Collation & ctype method table, and extension hooks

Started by Jeff Davis, over 1 year ago, 21 messages

pgsql@j-davis.com

over 1 year ago

8 attachment(s)

The attached patch series refactors the collation and ctype behavior
into method tables, and provides a way to hook the creation of a
pg_locale_t so that an extension can create any kind of method table it
wants.

In practice, the main use is to replace, for example, ICU with a
different version of ICU. But it can also be used to control libc
behavior, or to use a different set of methods that have nothing to do
with ICU or libc.
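To make the idea concrete, here is a minimal standalone sketch of a method table plus a creation hook. It is not the code from the patches: every `demo_*` and `nocase_*` name is hypothetical, the locale struct is reduced to a single pointer, and only a lowercase method is shown. It assumes the hook may return NULL to decline, in which case the built-in table is used.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the real PostgreSQL definitions. */
struct demo_locale;

struct demo_casemap_methods
{
	size_t		(*strlower) (char *dst, size_t dstsize,
							 const char *src, size_t srclen,
							 const struct demo_locale *locale);
};

struct demo_locale
{
	const struct demo_casemap_methods *casemap;
};

/* Built-in method: ASCII lowercasing. Returns the length needed (without NUL). */
static size_t
strlower_ascii(char *dst, size_t dstsize, const char *src, size_t srclen,
			   const struct demo_locale *locale)
{
	(void) locale;
	if (srclen + 1 > dstsize)
		return srclen;			/* too small: report the size, write nothing */
	for (size_t i = 0; i < srclen; i++)
		dst[i] = (char) tolower((unsigned char) src[i]);
	dst[srclen] = '\0';
	return srclen;
}

/* "Extension" method: a locale in which case mapping is a no-op. */
static size_t
strlower_nocase(char *dst, size_t dstsize, const char *src, size_t srclen,
				const struct demo_locale *locale)
{
	(void) locale;
	if (srclen + 1 > dstsize)
		return srclen;
	memcpy(dst, src, srclen);
	dst[srclen] = '\0';
	return srclen;
}

static const struct demo_casemap_methods casemap_methods_ascii = {
	.strlower = strlower_ascii,
};

static const struct demo_casemap_methods casemap_methods_nocase = {
	.strlower = strlower_nocase,
};

static struct demo_locale demo_builtin_locale = {.casemap = &casemap_methods_ascii};
static struct demo_locale demo_nocase_locale = {.casemap = &casemap_methods_nocase};

/* Hypothetical hook: an extension sets this to supply its own locale objects. */
typedef struct demo_locale *(*demo_create_locale_hook_type) (const char *name);
static demo_create_locale_hook_type demo_create_locale_hook = NULL;

/* Ask the hook first; fall back to the built-in locale if it declines. */
static struct demo_locale *
demo_create_locale(const char *name)
{
	if (demo_create_locale_hook != NULL)
	{
		struct demo_locale *loc = demo_create_locale_hook(name);

		if (loc != NULL)
			return loc;
	}
	return &demo_builtin_locale;
}

/* What an extension might install as the hook. */
static struct demo_locale *
nocase_create_locale(const char *name)
{
	return strcmp(name, "nocase") == 0 ? &demo_nocase_locale : NULL;
}

/* Callers never look at the provider; they just go through the table. */
static size_t
demo_strlower(char *dst, size_t dstsize, const char *src, size_t srclen,
			  const struct demo_locale *locale)
{
	assert(locale != NULL && locale->casemap != NULL);
	return locale->casemap->strlower(dst, dstsize, src, srclen, locale);
}
```

Once `demo_create_locale_hook` is set to `nocase_create_locale`, a locale named "nocase" dispatches to the extension's table, while every other name still gets the built-in one. The calling code does not change in either case.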

It also isolates code to some new files: ICU code goes in
pg_locale_icu.c, and libc code goes in pg_locale_libc.c. And it reduces
a lot of code that branches on the provider. That's easier to reason
about, in my opinion.
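With the branching gone, the callers in formatting.c reduce to one pattern: try a buffer the size of the input, and if the method reports that it needs more, grow the buffer and call again. Below is a rough standalone sketch of that pattern. It uses malloc/realloc instead of palloc/repalloc, and `toy_strupper` is a made-up mapper (Latin-1 input, with byte 0xDF expanding to "SS") standing in for whatever the method table would provide.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy mapper (hypothetical): treats input as Latin-1 and expands 0xDF
 * (sharp s) to "SS". Returns the bytes needed, not counting the NUL, and
 * writes to dst only when the result fits.
 */
static size_t
toy_strupper(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		needed = 0;
	size_t		j = 0;

	for (size_t i = 0; i < srclen; i++)
		needed += ((unsigned char) src[i] == 0xDF) ? 2 : 1;

	if (needed + 1 > dstsize)
		return needed;

	for (size_t i = 0; i < srclen; i++)
	{
		unsigned char c = (unsigned char) src[i];

		if (c == 0xDF)
		{
			dst[j++] = 'S';
			dst[j++] = 'S';
		}
		else
			dst[j++] = (char) toupper(c);
	}
	dst[j] = '\0';
	return needed;
}

/* Caller side: size the buffer optimistically, retry once if it was too small. */
static char *
upper_with_retry(const char *src, size_t srclen)
{
	size_t		dstsize = srclen + 1;	/* equal size plus terminating NUL */
	char	   *dst = malloc(dstsize);
	size_t		needed;

	if (dst == NULL)
		return NULL;

	needed = toy_strupper(dst, dstsize, src, srclen);
	if (needed + 1 > dstsize)
	{
		char	   *grown;

		/* grow buffer and retry */
		dstsize = needed + 1;
		grown = realloc(dst, dstsize);
		if (grown == NULL)
		{
			free(dst);
			return NULL;
		}
		dst = grown;
		needed = toy_strupper(dst, dstsize, src, srclen);
		assert(needed + 1 <= dstsize);
	}

	assert(dst[needed] == '\0');
	return dst;
}
```

The contract that makes this work is that a method returns the required length even when the destination is too small, so a single retry is always enough.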

With these patches, the collation provider becomes mainly a catalog
concept used to create the right pg_locale_t, rather than an execution-
time concept.

We could take this further and make providers a concept in the catalog,
like "CREATE LOCALE PROVIDER", and it would just provide an arbitrary
handler function to create the pg_locale_t. If we decide how we'd like
to handle versioning, that could potentially allow a much smoother
upgrade process that preserves the provider versions.

Regards,
Jeff Davis

Attachments:

v5-0006-Control-case-mapping-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From eec6c3eb165f85614780374cb83ec98f09d3bd3e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 12:12:51 -0700
Subject: [PATCH v5 6/8] Control case mapping behavior with a method table.

Previously, case mapping (LOWER(), INITCAP(), UPPER()) behavior
branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/formatting.c     | 445 ++++---------------------
 src/backend/utils/adt/pg_locale.c      | 102 ++++++
 src/backend/utils/adt/pg_locale_icu.c  | 142 +++++++-
 src/backend/utils/adt/pg_locale_libc.c | 219 ++++++++++++
 src/include/utils/pg_locale.h          |  29 +-
 5 files changed, 539 insertions(+), 398 deletions(-)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 85a7dd45619..6a0571f93e6 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1570,52 +1570,6 @@ str_numth(char *dest, char *num, int type)
  *			upper/lower/initcap functions
  *****************************************************************************/
 
-#ifdef USE_ICU
-
-typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
-									 const UChar *src, int32_t srcLength,
-									 const char *locale,
-									 UErrorCode *pErrorCode);
-
-static int32_t
-icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
-				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
-{
-	UErrorCode	status;
-	int32_t		len_dest;
-
-	len_dest = len_source;		/* try first with same length */
-	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-	status = U_ZERO_ERROR;
-	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
-	if (status == U_BUFFER_OVERFLOW_ERROR)
-	{
-		/* try again with adjusted length */
-		pfree(*buff_dest);
-		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-		status = U_ZERO_ERROR;
-		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
-	}
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
-	return len_dest;
-}
-
-static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
-{
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
-}
-
-#endif							/* USE_ICU */
-
 /*
  * If the system provides the needed functions for wide-character manipulation
  * (which are all standardized by C99), then we implement upper/lower/initcap
@@ -1663,101 +1617,28 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar;
-			int32_t		len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strlower(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strlower(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that tolower_l() will not be so broken as
-				 * to need an isupper_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
@@ -1800,147 +1681,33 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
-		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strupper(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strupper(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l() will not be so broken as
-				 * to need an islower_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
 }
 
-struct WordBoundaryState
-{
-	const char *str;
-	size_t		len;
-	size_t		offset;
-	bool		init;
-	bool		prev_alnum;
-};
-
-/*
- * Simple word boundary iterator that draws boundaries each time the result of
- * pg_u_isalnum() changes.
- */
-static size_t
-initcap_wbnext(void *state)
-{
-	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
-
-	while (wbstate->offset < wbstate->len &&
-		   wbstate->str[wbstate->offset] != '\0')
-	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
-										wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
-
-		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
-		{
-			size_t		prev_offset = wbstate->offset;
-
-			wbstate->init = true;
-			wbstate->offset += unicode_utf8len(u);
-			wbstate->prev_alnum = curr_alnum;
-			return prev_offset;
-		}
-
-		wbstate->offset += unicode_utf8len(u);
-	}
-
-	return wbstate->len;
-}
-
 /*
  * collation-aware, wide-character-aware initcap function
  *
@@ -1951,7 +1718,6 @@ char *
 str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
-	int			wasalnum = false;
 	pg_locale_t mylocale;
 
 	if (!buff)
@@ -1979,125 +1745,28 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-			struct WordBoundaryState wbstate = {
-				.str = src,
-				.len = srclen,
-				.offset = 0,
-				.init = false,
-				.prev_alnum = false,
-			};
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strtitle(dst, dstsize, src, srclen,
-									  initcap_wbnext, &wbstate);
-			if (needed + 1 > dstsize)
-			{
-				/* reset iterator */
-				wbstate.offset = 0;
-				wbstate.init = false;
-
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strtitle(dst, dstsize, src, srclen,
-										  initcap_wbnext, &wbstate);
-				Assert(needed + 1 == dstsize);
-			}
-
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-				{
-					if (wasalnum)
-						workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-					else
-						workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-					wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
-				}
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
 
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l()/tolower_l() will not be so
-				 * broken as to need guard tests.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (wasalnum)
-						*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-					else
-						*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-					wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index dfb9c3bd952..a106478b119 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -58,6 +58,8 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
 #include "common/hashfn.h"
+#include "common/unicode_case.h"
+#include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -170,6 +172,83 @@ static pg_locale_t last_collation_cache_locale = NULL;
 static char *IsoLocaleName(const char *);
 #endif
 
+struct WordBoundaryState
+{
+	const char *str;
+	size_t		len;
+	size_t		offset;
+	bool		init;
+	bool		prev_alnum;
+};
+
+/*
+ * Simple word boundary iterator that draws boundaries each time the result of
+ * pg_u_isalnum() changes.
+ */
+static size_t
+initcap_wbnext(void *state)
+{
+	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
+
+	while (wbstate->offset < wbstate->len &&
+		   wbstate->str[wbstate->offset] != '\0')
+	{
+		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+										wbstate->offset);
+		bool		curr_alnum = pg_u_isalnum(u, true);
+
+		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
+		{
+			size_t		prev_offset = wbstate->offset;
+
+			wbstate->init = true;
+			wbstate->offset += unicode_utf8len(u);
+			wbstate->prev_alnum = curr_alnum;
+			return prev_offset;
+		}
+
+		wbstate->offset += unicode_utf8len(u);
+	}
+
+	return wbstate->len;
+}
+
+static size_t
+strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strlower(dest, destsize, src, srclen);
+}
+
+static size_t
+strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	struct WordBoundaryState wbstate = {
+		.str = src,
+		.len = srclen,
+		.offset = 0,
+		.init = false,
+		.prev_alnum = false,
+	};
+
+	return unicode_strtitle(dest, destsize, src, srclen,
+							initcap_wbnext, &wbstate);
+}
+
+static size_t
+strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strupper(dest, destsize, src, srclen);
+}
+
+static struct casemap_methods casemap_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+};
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
@@ -1239,6 +1318,7 @@ dat_create_locale_builtin(HeapTuple dattuple)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	result->casemap = &casemap_methods_builtin;
 
 	return result;
 }
@@ -1265,6 +1345,7 @@ coll_create_locale_builtin(HeapTuple colltuple, MemoryContext context)
 	result->deterministic = collform->collisdeterministic;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	result->casemap = &casemap_methods_builtin;
 
 	return result;
 }
@@ -1537,6 +1618,27 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+size_t
+pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strlower(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strtitle(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strupper(dst, dstsize, src, srclen, locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e53bf2d4b33..97e96d5b9fb 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,11 @@ static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
+typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
+									 const UChar *src, int32_t srcLength,
+									 const char *locale,
+									 UErrorCode *pErrorCode);
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -60,9 +65,20 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
+
+static size_t strlower_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strtitle_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strupper_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
 static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 								 const char *arg2, ssize_t len2,
 								 pg_locale_t locale);
+
 static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 										  const char *src, ssize_t srclen,
 										  pg_locale_t locale);
@@ -72,8 +88,19 @@ static size_t uchar_length(UConverter *converter,
 static int32_t uchar_convert(UConverter *converter,
 							 UChar *dest, int32_t destlen,
 							 const char *src, int32_t srclen);
+static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff,
+							size_t nbytes);
+static size_t icu_from_uchar(char *dest, size_t destsize,
+							 const UChar *buff_uchar, int32_t len_uchar);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+static int32_t icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+								UChar **buff_dest, UChar *buff_source,
+								int32_t len_source);
+static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+									   const UChar *src, int32_t srcLength,
+									   const char *locale,
+									   UErrorCode *pErrorCode);
 
 static struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
@@ -82,6 +109,11 @@ static struct collate_methods collate_methods_icu = {
 	.strxfrm_is_safe = true,
 };
 
+static struct casemap_methods casemap_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+};
 #endif
 
 pg_locale_t
@@ -118,6 +150,7 @@ dat_create_locale_icu(HeapTuple dattuple)
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
 	result->collate = &collate_methods_icu;
+	result->casemap = &casemap_methods_icu;
 
 	return result;
 #else
@@ -163,6 +196,7 @@ coll_create_locale_icu(HeapTuple colltuple, MemoryContext context)
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
 	result->collate = &collate_methods_icu;
+	result->casemap = &casemap_methods_icu;
 
 	return result;
 #else
@@ -336,6 +370,66 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
+static size_t
+strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToLower, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToTitle_default_BI, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToUpper, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
 /*
  * strncoll_icu
  *
@@ -470,7 +564,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
  * The result string is nul-terminated, though most callers rely on the
  * result length instead.
  */
-int32_t
+static int32_t
 icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
 {
 	int32_t		len_uchar;
@@ -497,8 +591,8 @@ icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
  *
  * The result string is nul-terminated.
  */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+static size_t
+icu_from_uchar(char *dest, size_t destsize, const UChar *buff_uchar, int32_t len_uchar)
 {
 	UErrorCode	status;
 	int32_t		len_result;
@@ -513,10 +607,11 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
 
-	*result = palloc(len_result + 1);
+	if (len_result + 1 > destsize)
+		return len_result;
 
 	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+	len_result = ucnv_fromUChars(icu_converter, dest, len_result + 1,
 								 buff_uchar, len_uchar, &status);
 	if (U_FAILURE(status) ||
 		status == U_STRING_NOT_TERMINATED_WARNING)
@@ -527,6 +622,43 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	return len_result;
 }
 
+static int32_t
+icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
+{
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s", u_errorName(status))));
+	return len_dest;
+}
+
+static int32_t
+u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+						const UChar *src, int32_t srcLength,
+						const char *locale,
+						UErrorCode *pErrorCode)
+{
+	return u_strToTitle(dest, destCapacity, src, srcLength,
+						NULL, locale, pErrorCode);
+}
+
 /*
  * strncoll_icu_no_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index b8ccd24715d..79828ab3524 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,6 +11,9 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+#include <wctype.h>
+
 #include "access/htup_details.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
@@ -48,6 +51,16 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static size_t strlower_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+static size_t strtitle_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+static size_t strupper_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
 static struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -69,6 +82,208 @@ static struct collate_methods collate_methods_libc = {
 #endif
 };
 
+static struct casemap_methods casemap_methods_libc = {
+	.strlower = strlower_libc,
+	.strtitle = strtitle_libc,
+	.strupper = strupper_libc,
+};
+
+static size_t
+strlower_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			  pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+
+	if (pg_database_encoding_max_length() > 1)
+	{
+		wchar_t    *workspace;
+		size_t		curr_char;
+
+		/* Overflow paranoia */
+		if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+
+		/* Output workspace cannot have more codes than input bytes */
+		workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+		char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+		for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+
+		/*
+		 * Make result large enough; case change might change number of bytes
+		 */
+		result_size = curr_char * pg_database_encoding_max_length();
+		if (result_size + 1 > destsize)
+			return result_size;
+
+		wchar2char(dest, workspace, result_size + 1, locale);
+		pfree(workspace);
+	}
+	else
+	{
+		char	   *p;
+
+		result_size = srclen;
+		if (result_size + 1 > destsize)
+			return result_size;
+
+		strlcpy(dest, src, result_size + 1);
+
+		/*
+		 * Note: we assume that tolower_l() will not be so broken as to need
+		 * an isupper_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = tolower_l((unsigned char) *p, loc);
+	}
+
+	result_size = strlen(dest);
+	return result_size;
+}
+
+static size_t
+strtitle_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			  pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	int			wasalnum = false;
+	size_t		result_size;
+
+	if (pg_database_encoding_max_length() > 1)
+	{
+		wchar_t    *workspace;
+		size_t		curr_char;
+
+		/* Overflow paranoia */
+		if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+
+		/* Output workspace cannot have more codes than input bytes */
+		workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+		char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+		for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		{
+			if (wasalnum)
+				workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			else
+				workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+			wasalnum = iswalnum_l(workspace[curr_char], loc);
+		}
+
+		/*
+		 * Make result large enough; case change might change number of bytes
+		 */
+		result_size = curr_char * pg_database_encoding_max_length();
+
+		if (result_size + 1 > destsize)
+			return result_size;
+
+		wchar2char(dest, workspace, result_size + 1, locale);
+		pfree(workspace);
+	}
+	else
+	{
+		char	   *p;
+
+		if (srclen + 1 > destsize)
+			return srclen;
+
+		strlcpy(dest, src, srclen + 1);
+
+		/*
+		 * Note: we assume that toupper_l()/tolower_l() will not be so broken
+		 * as to need guard tests.  When using the default collation, we apply
+		 * the traditional Postgres behavior that forces ASCII-style treatment
+		 * of I/i, but in non-default collations you get exactly what the
+		 * collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (wasalnum)
+				*p = tolower_l((unsigned char) *p, loc);
+			else
+				*p = toupper_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, loc);
+		}
+	}
+
+	result_size = strlen(dest);
+	return result_size;
+}
+
+static size_t
+strupper_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			  pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+
+	if (pg_database_encoding_max_length() > 1)
+	{
+		wchar_t    *workspace;
+		size_t		curr_char;
+
+		/* Overflow paranoia */
+		if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+
+		/* Output workspace cannot have more codes than input bytes */
+		workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+		char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+		for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+
+		/*
+		 * Make result large enough; case change might change number of bytes
+		 */
+		result_size = curr_char * pg_database_encoding_max_length();
+		if (result_size + 1 > destsize)
+			return result_size;
+
+		wchar2char(dest, workspace, result_size + 1, locale);
+		pfree(workspace);
+	}
+	else
+	{
+		char	   *p;
+
+		result_size = srclen;
+		if (result_size + 1 > destsize)
+			return result_size;
+
+		strlcpy(dest, src, srclen + 1);
+
+		/*
+		 * Note: we assume that toupper_l() will not be so broken as to need
+		 * an islower_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = toupper_l((unsigned char) *p, loc);
+	}
+
+	result_size = strlen(dest);
+	return result_size;
+}
+
 pg_locale_t
 dat_create_locale_libc(HeapTuple dattuple)
 {
@@ -102,6 +317,8 @@ dat_create_locale_libc(HeapTuple dattuple)
 	result->info.lt = loc;
 	if (!result->collate_is_c)
 		result->collate = &collate_methods_libc;
+	if (!result->ctype_is_c)
+		result->casemap = &casemap_methods_libc;
 
 	return result;
 }
@@ -137,6 +354,8 @@ coll_create_locale_libc(HeapTuple colltuple, MemoryContext context)
 	result->info.lt = loc;
 	if (!result->collate_is_c)
 		result->collate = &collate_methods_libc;
+	if (!result->ctype_is_c)
+		result->casemap = &casemap_methods_libc;
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 6c2a0456f22..4bd9e6de7a3 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -93,6 +93,20 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+/* methods that define string case mapping behavior */
+struct casemap_methods
+{
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -117,6 +131,7 @@ struct pg_locale_struct
 	bool		ctype_is_c;
 
 	struct collate_methods *collate;	/* NULL if collate_is_c */
+	struct casemap_methods *casemap;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -141,6 +156,15 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern size_t pg_strlower(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strtitle(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strupper(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
@@ -160,11 +184,6 @@ extern const char *builtin_validate_locale(int encoding, const char *locale);
 extern void icu_validate_locale(const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
-#ifdef USE_ICU
-extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
-extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
-
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
 						 pg_locale_t locale);
-- 
2.34.1
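
The buffer-size convention used by the case mapping methods above can be sketched outside the backend. This is a minimal illustration, not the patch's code: `demo_locale`, `demo_casemap_methods`, `demo_strupper_ascii` and `demo_upper` are hypothetical names, the mapping is ASCII-only, and `malloc` stands in for `palloc`. It assumes, as the libc methods above suggest, that a method returns the required size (excluding the terminating NUL) and leaves `dest` unwritten when the buffer is too small.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for pg_locale_t and struct casemap_methods. */
struct demo_locale;

struct demo_casemap_methods
{
	size_t		(*strupper) (char *dest, size_t destsize,
							 const char *src, size_t srclen,
							 const struct demo_locale *locale);
};

struct demo_locale
{
	const struct demo_casemap_methods *casemap;
};

/* ASCII-only upper-casing; returns the required size, excluding the NUL. */
static size_t
demo_strupper_ascii(char *dest, size_t destsize,
					const char *src, size_t srclen,
					const struct demo_locale *locale)
{
	size_t		i;

	(void) locale;				/* unused in this ASCII-only sketch */

	if (srclen + 1 > destsize)
		return srclen;			/* too small: report the needed size only */

	for (i = 0; i < srclen; i++)
		dest[i] = (char) toupper((unsigned char) src[i]);
	dest[srclen] = '\0';
	return srclen;
}

static const struct demo_casemap_methods demo_casemap_ascii = {
	.strupper = demo_strupper_ascii,
};

/* Caller side: dispatch through the table, growing the buffer if needed. */
static char *
demo_upper(const char *src, const struct demo_locale *locale)
{
	size_t		srclen = strlen(src);
	size_t		destsize = 4;	/* deliberately small to force a retry */
	char	   *dest = malloc(destsize);
	size_t		needed;

	if (dest == NULL)
		return NULL;

	needed = locale->casemap->strupper(dest, destsize, src, srclen, locale);
	if (needed + 1 > destsize)
	{
		char	   *bigger = realloc(dest, needed + 1);

		if (bigger == NULL)
		{
			free(dest);
			return NULL;
		}
		dest = bigger;
		(void) locale->casemap->strupper(dest, needed + 1, src, srclen, locale);
	}
	return dest;
}
```

The caller never inspects a provider code. Swapping in a different table is the only change needed to get different case mapping behavior, which is the property the method table is meant to provide.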

v5-0005-Control-collation-behavior-with-a-method-table.patch

From 6613da1cb99dfa19d45fb11073ba82ebda2c6242 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 11:27:29 -0700
Subject: [PATCH v5 5/8] Control collation behavior with a method table.

Previously, behavior branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/pg_locale.c      | 147 +++++--------------------
 src/backend/utils/adt/pg_locale_icu.c  |  55 +++++----
 src/backend/utils/adt/pg_locale_libc.c |  64 +++++++----
 src/include/utils/pg_locale.h          |  33 ++++++
 4 files changed, 136 insertions(+), 163 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f49c89c833e..dfb9c3bd952 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -100,24 +100,8 @@ extern pg_locale_t coll_create_locale_libc(HeapTuple colltuple,
 
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int strncoll_icu(const char *arg1, ssize_t len1,
-						const char *arg2, ssize_t len2,
-						pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
-						   const char *src, ssize_t srclen,
-						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
-								  const char *src, ssize_t srclen,
-								  pg_locale_t locale);
 #endif
 
-extern int strncoll_libc(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1234,10 +1218,10 @@ IsoLocaleName(const char *winlocname)
 static pg_locale_t
 dat_create_locale_builtin(HeapTuple dattuple)
 {
-	Form_pg_database	 dbform;
-	Datum				 datum;
-	const char			*locstr;
-	pg_locale_t			 result;
+	Form_pg_database dbform;
+	Datum		datum;
+	const char *locstr;
+	pg_locale_t result;
 
 	dbform = (Form_pg_database) GETSTRUCT(dattuple);
 	datum = SysCacheGetAttrNotNull(DATABASEOID, dattuple,
@@ -1262,10 +1246,10 @@ dat_create_locale_builtin(HeapTuple dattuple)
 static pg_locale_t
 coll_create_locale_builtin(HeapTuple colltuple, MemoryContext context)
 {
-	Form_pg_collation	 collform;
-	Datum				 datum;
-	const char			*locstr;
-	pg_locale_t			 result;
+	Form_pg_collation collform;
+	Datum		datum;
+	const char *locstr;
+	pg_locale_t result;
 
 	collform = (Form_pg_collation) GETSTRUCT(colltuple);
 	datum = SysCacheGetAttrNotNull(COLLOID, colltuple,
@@ -1293,8 +1277,8 @@ create_pg_locale(Oid collid, MemoryContext context)
 {
 	/* We haven't computed this yet in this session, so do it */
 	HeapTuple	tp;
-	Datum				 datum;
-	bool				 isnull;
+	Datum		datum;
+	bool		isnull;
 	Form_pg_collation collform;
 	pg_locale_t	result;
 
@@ -1332,9 +1316,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 		if (!actual_versionstr)
 		{
 			/*
-			 * This could happen when specifying a version in CREATE
-			 * COLLATION but the provider does not support versioning, or
-			 * manually creating a mess in the catalogs.
+			 * This could happen when specifying a version in CREATE COLLATION
+			 * but the provider does not support versioning, or manually
+			 * creating a mess in the catalogs.
 			 */
 			ereport(ERROR,
 					(errmsg("collation \"%s\" has no actual version, but a version was recorded",
@@ -1561,19 +1545,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 int
 pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, -1, arg2, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, -1, arg2, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, -1, arg2, -1, locale);
 }
 
 /*
@@ -1594,51 +1566,25 @@ int
 pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 			pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, len1, arg2, len2, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, len1, arg2, len2, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
 }
 
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
  *
- * Unfortunately, it seems that strxfrm() for non-C collations is broken on
- * many common platforms; testing of multiple versions of glibc reveals that,
- * for many locales, strcoll() and strxfrm() do not return consistent
- * results. While no other libc other than Cygwin has so far been shown to
- * have a problem, we take the conservative course of action for right now and
- * disable this categorically.  (Users who are certain this isn't a problem on
- * their system can define TRUST_STRXFRM.)
  *
  * No similar problem is known for the ICU provider.
  */
 bool
 pg_strxfrm_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-#ifdef TRUST_STRXFRM
-		return true;
-#else
-		return false;
-#endif
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	/*
+	 * locale->collate->strnxfrm is still a required method, even if it may
+	 * have the wrong behavior, because the planner uses it for estimates in
+	 * some cases.
+	 */
+	return locale->collate->strxfrm_is_safe;
 }
 
 /*
@@ -1649,19 +1595,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 size_t
 pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1687,19 +1621,7 @@ size_t
 pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, srclen, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, srclen, locale);
 }
 
 /*
@@ -1709,15 +1631,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 bool
 pg_strxfrm_prefix_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-		return false;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	return (locale->collate->strnxfrm_prefix != NULL);
 }
 
 /*
@@ -1729,7 +1643,7 @@ size_t
 pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 				  pg_locale_t locale)
 {
-	return pg_strnxfrm_prefix(dest, destsize, src, -1, locale);
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1754,16 +1668,7 @@ size_t
 pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-#ifdef USE_ICU
-	if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_prefix_icu(dest, destsize, src, -1, locale);
-	else
-#endif
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2df1a8226e6..e53bf2d4b33 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -40,13 +40,14 @@ extern pg_locale_t coll_create_locale_icu(HeapTuple colltuple,
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int strncoll_icu(const char *arg1, ssize_t len1,
-						const char *arg2, ssize_t len2,
-						pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
+
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
@@ -59,9 +60,9 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
-static int strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-								const char *arg2, ssize_t len2,
-								pg_locale_t locale);
+static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
+								 const char *arg2, ssize_t len2,
+								 pg_locale_t locale);
 static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 										  const char *src, ssize_t srclen,
 										  pg_locale_t locale);
@@ -73,19 +74,27 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+
+static struct collate_methods collate_methods_icu = {
+	.strncoll = strncoll_icu,
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu,
+	.strxfrm_is_safe = true,
+};
+
 #endif
 
 pg_locale_t
 dat_create_locale_icu(HeapTuple dattuple)
 {
 #ifdef USE_ICU
-	Form_pg_database	 dbform;
-	Datum				 datum;
-	bool				 isnull;
-	const char			*iculocstr;
-	const char			*icurules = NULL;
-	UCollator			*collator;
-	pg_locale_t			 result;
+	Form_pg_database dbform;
+	Datum		datum;
+	bool		isnull;
+	const char *iculocstr;
+	const char *icurules = NULL;
+	UCollator  *collator;
+	pg_locale_t result;
 
 	dbform = (Form_pg_database) GETSTRUCT(dattuple);
 
@@ -108,6 +117,7 @@ dat_create_locale_icu(HeapTuple dattuple)
 	result->deterministic = true;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	result->collate = &collate_methods_icu;
 
 	return result;
 #else
@@ -124,13 +134,13 @@ pg_locale_t
 coll_create_locale_icu(HeapTuple colltuple, MemoryContext context)
 {
 #ifdef USE_ICU
-	Form_pg_collation	 collform;
-	Datum				 datum;
-	bool				 isnull;
-	const char			*iculocstr;
-	const char			*icurules = NULL;
-	UCollator			*collator;
-	pg_locale_t			 result;
+	Form_pg_collation collform;
+	Datum		datum;
+	bool		isnull;
+	const char *iculocstr;
+	const char *icurules = NULL;
+	UCollator  *collator;
+	pg_locale_t result;
 
 	collform = (Form_pg_collation) GETSTRUCT(colltuple);
 
@@ -152,6 +162,7 @@ coll_create_locale_icu(HeapTuple colltuple, MemoryContext context)
 	result->deterministic = collform->collisdeterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	result->collate = &collate_methods_icu;
 
 	return result;
 #else
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 5481fd3b802..b8ccd24715d 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -31,10 +31,10 @@ extern pg_locale_t dat_create_locale_libc(HeapTuple dattuple);
 extern pg_locale_t coll_create_locale_libc(HeapTuple colltuple,
 										   MemoryContext context);
 
-extern int strncoll_libc(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
+static int	strncoll_libc(const char *arg1, ssize_t len1,
+						  const char *arg2, ssize_t len2,
+						  pg_locale_t locale);
+static size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 
@@ -43,20 +43,41 @@ static locale_t make_libc_collator(const char *collate,
 static void report_newlocale_failure(const char *localename);
 
 #ifdef WIN32
-static int strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
-									const char *arg2, ssize_t len2,
-									pg_locale_t locale);
+static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
+									 const char *arg2, ssize_t len2,
+									 pg_locale_t locale);
 #endif
 
+static struct collate_methods collate_methods_libc = {
+	.strncoll = strncoll_libc,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+
+	/*
+	 * Unfortunately, it seems that strxfrm() for non-C collations is broken
+	 * on many common platforms; testing of multiple versions of glibc reveals
+	 * that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results. While no libc other than Cygwin has so far
+	 * been shown to have a problem, we take the conservative course of action
+	 * for right now and disable this categorically.  (Users who are certain
+	 * this isn't a problem on their system can define TRUST_STRXFRM.)
+	 */
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+
 pg_locale_t
 dat_create_locale_libc(HeapTuple dattuple)
 {
-	Form_pg_database	 dbform;
-	Datum				 datum;
-	const char			*datcollate;
-	const char			*datctype;
-	locale_t			 loc;
-	pg_locale_t			 result;
+	Form_pg_database dbform;
+	Datum		datum;
+	const char *datcollate;
+	const char *datctype;
+	locale_t	loc;
+	pg_locale_t result;
 
 	dbform = (Form_pg_database) GETSTRUCT(dattuple);
 
@@ -79,6 +100,8 @@ dat_create_locale_libc(HeapTuple dattuple)
 	result->ctype_is_c = (strcmp(datctype, "C") == 0) ||
 		(strcmp(datctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+		result->collate = &collate_methods_libc;
 
 	return result;
 }
@@ -86,12 +109,12 @@ dat_create_locale_libc(HeapTuple dattuple)
 pg_locale_t
 coll_create_locale_libc(HeapTuple colltuple, MemoryContext context)
 {
-	Form_pg_collation	 collform;
-	Datum				 datum;
-	const char			*collcollate;
-	const char			*collctype;
-	locale_t			 loc;
-	pg_locale_t			 result;
+	Form_pg_collation collform;
+	Datum		datum;
+	const char *collcollate;
+	const char *collctype;
+	locale_t	loc;
+	pg_locale_t result;
 
 	collform = (Form_pg_collation) GETSTRUCT(colltuple);
 
@@ -112,6 +135,8 @@ coll_create_locale_libc(HeapTuple colltuple, MemoryContext context)
 	result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
 		(strcmp(collctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+		result->collate = &collate_methods_libc;
 
 	return result;
 }
@@ -400,4 +425,3 @@ report_newlocale_failure(const char *localename)
 			  errdetail("The operating system could not find any locale data for the locale name \"%s\".",
 						localename) : 0)));
 }
-
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 3b443df8014..6c2a0456f22 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -63,6 +63,36 @@ extern struct lconv *PGLC_localeconv(void);
 extern void cache_locale_time(void);
 
 
+struct pg_locale_struct;
+typedef struct pg_locale_struct *pg_locale_t;
+
+/* methods that define collation behavior */
+struct collate_methods
+{
+	/* required */
+	int			(*strncoll) (const char *arg1, ssize_t len1,
+							 const char *arg2, ssize_t len2,
+							 pg_locale_t locale);
+
+	/* required */
+	size_t		(*strnxfrm) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* optional */
+	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
+									const char *src, ssize_t srclen,
+									pg_locale_t locale);
+
+	/*
+	 * If the strnxfrm method is not trusted to return the correct results,
+	 * set strxfrm_is_safe to false. If set to false, the method will not be
+	 * used in most cases, but the planner still expects it to be there for
+	 * estimation purposes (where incorrect results are acceptable).
+	 */
+	bool		strxfrm_is_safe;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -85,6 +115,9 @@ struct pg_locale_struct
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
+
+	struct collate_methods *collate;	/* NULL if collate_is_c */
+
 	union
 	{
 		struct
-- 
2.34.1
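
As a rough illustration of how the collation method table replaces the provider branches, here is a minimal standalone sketch. It is not the patch's code: `demo_locale`, `demo_collate_methods` and the bytewise methods are hypothetical names, and the comparison is plain `memcmp` ordering rather than anything libc or ICU would do. It shows a required `strncoll`, a required `strnxfrm`, an optional `strnxfrm_prefix` left NULL, and the `strxfrm_is_safe` flag driving the "enabled" checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for pg_locale_t and struct collate_methods. */
struct demo_locale;

struct demo_collate_methods
{
	/* required */
	int			(*strncoll) (const char *arg1, size_t len1,
							 const char *arg2, size_t len2,
							 const struct demo_locale *locale);
	/* required */
	size_t		(*strnxfrm) (char *dest, size_t destsize,
							 const char *src, size_t srclen,
							 const struct demo_locale *locale);
	/* optional: NULL means "prefix transform not supported" */
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src, size_t srclen,
									const struct demo_locale *locale);
	bool		strxfrm_is_safe;
};

struct demo_locale
{
	const struct demo_collate_methods *collate;
};

/* Bytewise comparison: shorter string sorts first on a common prefix. */
static int
demo_strncoll_bytes(const char *arg1, size_t len1,
					const char *arg2, size_t len2,
					const struct demo_locale *locale)
{
	size_t		n = (len1 < len2) ? len1 : len2;
	int			cmp = memcmp(arg1, arg2, n);

	(void) locale;
	if (cmp != 0)
		return (cmp < 0) ? -1 : 1;
	if (len1 == len2)
		return 0;
	return (len1 < len2) ? -1 : 1;
}

/* Identity transform; returns the needed size, excluding the NUL. */
static size_t
demo_strnxfrm_bytes(char *dest, size_t destsize,
					const char *src, size_t srclen,
					const struct demo_locale *locale)
{
	(void) locale;
	if (srclen + 1 <= destsize)
	{
		memcpy(dest, src, srclen);
		dest[srclen] = '\0';
	}
	return srclen;
}

static const struct demo_collate_methods demo_collate_bytes = {
	.strncoll = demo_strncoll_bytes,
	.strnxfrm = demo_strnxfrm_bytes,
	.strnxfrm_prefix = NULL,
	.strxfrm_is_safe = true,
};

/* Generic entry points: no branching on a provider code. */
static int
demo_strncoll(const char *arg1, size_t len1, const char *arg2, size_t len2,
			  const struct demo_locale *locale)
{
	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
}

static bool
demo_strxfrm_enabled(const struct demo_locale *locale)
{
	return locale->collate->strxfrm_is_safe;
}

static bool
demo_strxfrm_prefix_enabled(const struct demo_locale *locale)
{
	return locale->collate->strnxfrm_prefix != NULL;
}
```

Keeping the optional method as a nullable pointer means the capability check and the implementation cannot drift apart, which was possible when each generic function carried its own provider branch.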

v5-0004-Perform-provider-specific-initialization-code-in-.patch

From 49e43d4e24af1757425b6fd46cf2d4c6afcbce63 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 15:49:32 -0700
Subject: [PATCH v5 4/8] Perform provider-specific initialization code in new
 functions.

---
 src/backend/utils/adt/pg_locale.c      | 215 +++++++++----------------
 src/backend/utils/adt/pg_locale_icu.c  | 112 ++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c |  83 +++++++++-
 3 files changed, 266 insertions(+), 144 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 94cd8d132f7..f49c89c833e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -88,10 +88,18 @@
 
 #define		MAX_L10N_DATA		80
 
+extern pg_locale_t dat_create_locale_icu(HeapTuple dattuple);
+
+extern pg_locale_t coll_create_locale_icu(HeapTuple colltuple,
+										  MemoryContext context);
+
+extern pg_locale_t dat_create_locale_libc(HeapTuple dattuple);
+
+extern pg_locale_t coll_create_locale_libc(HeapTuple colltuple,
+										   MemoryContext context);
+
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int strncoll_icu(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2,
 						pg_locale_t locale);
@@ -103,8 +111,6 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  pg_locale_t locale);
 #endif
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
 extern int strncoll_libc(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -135,7 +141,7 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
-static struct pg_locale_struct default_locale;
+static pg_locale_t default_locale = NULL;
 
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
@@ -1225,6 +1231,59 @@ IsoLocaleName(const char *winlocname)
 
 #endif							/* WIN32 && LC_MESSAGES */
 
+static pg_locale_t
+dat_create_locale_builtin(HeapTuple dattuple)
+{
+	Form_pg_database	 dbform;
+	Datum				 datum;
+	const char			*locstr;
+	pg_locale_t			 result;
+
+	dbform = (Form_pg_database) GETSTRUCT(dattuple);
+	datum = SysCacheGetAttrNotNull(DATABASEOID, dattuple,
+								   Anum_pg_database_datlocale);
+	locstr = TextDatumGetCString(datum);
+
+	builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+	result = MemoryContextAllocZero(TopMemoryContext,
+									sizeof(struct pg_locale_struct));
+
+	result->info.builtin.locale = MemoryContextStrdup(TopMemoryContext,
+													  locstr);
+	result->provider = dbform->datlocprovider;
+	result->deterministic = true;
+	result->collate_is_c = true;
+	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+	return result;
+}
+
+static pg_locale_t
+coll_create_locale_builtin(HeapTuple colltuple, MemoryContext context)
+{
+	Form_pg_collation	 collform;
+	Datum				 datum;
+	const char			*locstr;
+	pg_locale_t			 result;
+
+	collform = (Form_pg_collation) GETSTRUCT(colltuple);
+	datum = SysCacheGetAttrNotNull(COLLOID, colltuple,
+								   Anum_pg_collation_colllocale);
+	locstr = TextDatumGetCString(datum);
+
+	builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	result->provider = collform->collprovider;
+	result->deterministic = collform->collisdeterministic;
+	result->collate_is_c = true;
+	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+	return result;
+}
 
 /*
  * Create a new pg_locale_t struct for the given collation oid.
@@ -1234,80 +1293,22 @@ create_pg_locale(Oid collid, MemoryContext context)
 {
 	/* We haven't computed this yet in this session, so do it */
 	HeapTuple	tp;
+	Datum				 datum;
+	bool				 isnull;
 	Form_pg_collation collform;
 	pg_locale_t	result;
-	Datum		datum;
-	bool		isnull;
-
-	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	result->provider = collform->collprovider;
-	result->deterministic = collform->collisdeterministic;
-
 	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-	{
-		const char *locstr;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		locstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = true;
-		result->ctype_is_c = (strcmp(locstr, "C") == 0);
-
-		builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-		result->info.builtin.locale = MemoryContextStrdup(context,
-														  locstr);
-	}
+		result = coll_create_locale_builtin(tp, context);
 	else if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		const char *iculocstr;
-		const char *icurules;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		iculocstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = false;
-		result->ctype_is_c = false;
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = coll_create_locale_icu(tp, context);
 	else if (collform->collprovider == COLLPROVIDER_LIBC)
-	{
-		const char *collcollate;
-		const char *collctype;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-		collcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-		collctype = TextDatumGetCString(datum);
-
-		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
-			(strcmp(collcollate, "POSIX") == 0);
-		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
-			(strcmp(collctype, "POSIX") == 0);
-
-		result->info.lt = make_libc_collator(collcollate, collctype);
-	}
+		result = coll_create_locale_libc(tp, context);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
@@ -1367,7 +1368,9 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	Datum		datum;
+	pg_locale_t	result;
+
+	Assert(default_locale == NULL);
 
 	/* Fetch our pg_database row normally, via syscache */
 	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
@@ -1376,80 +1379,18 @@ init_database_collation(void)
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
 	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-	{
-		char	   *datlocale;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		builtin_validate_locale(dbform->encoding, datlocale);
-
-		default_locale.collate_is_c = true;
-		default_locale.ctype_is_c = (strcmp(datlocale, "C") == 0);
-
-		default_locale.info.builtin.locale = MemoryContextStrdup(
-																 TopMemoryContext, datlocale);
-	}
+		result = dat_create_locale_builtin(tup);
 	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		char	   *datlocale;
-		char	   *icurules;
-		bool		isnull;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = false;
-		default_locale.ctype_is_c = false;
-
-		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		default_locale.info.icu.locale = MemoryContextStrdup(TopMemoryContext, datlocale);
-		default_locale.info.icu.ucol = make_icu_collator(datlocale, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = dat_create_locale_icu(tup);
 	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-	{
-		const char *datcollate;
-		const char *datctype;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datcollate);
-		datcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
-		datctype = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = (strcmp(datcollate, "C") == 0) ||
-			(strcmp(datcollate, "POSIX") == 0);
-		default_locale.ctype_is_c = (strcmp(datctype, "C") == 0) ||
-			(strcmp(datctype, "POSIX") == 0);
-
-		default_locale.info.lt = make_libc_collator(datcollate, datctype);
-	}
+		result = dat_create_locale_libc(tup);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
 
-
-	default_locale.provider = dbform->datlocprovider;
-
-	/*
-	 * Default locale is currently always deterministic.  Nondeterministic
-	 * locales currently don't support pattern matching, which would break a
-	 * lot of things if applied globally.
-	 */
-	default_locale.deterministic = true;
-
 	ReleaseSysCache(tup);
+
+	default_locale = result;
 }
 
 /*
@@ -1467,7 +1408,7 @@ pg_newlocale_from_collation(Oid collid)
 	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
-		return &default_locale;
+		return default_locale;
 
 	if (!OidIsValid(collid))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2ffd98ececa..2df1a8226e6 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -12,14 +12,19 @@
 #include "postgres.h"
 
 #ifdef USE_ICU
-
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+#endif
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * This should be large enough that most strings will fit, but small enough
@@ -27,9 +32,14 @@
  */
 #define		TEXTBUFLEN			1024
 
+extern pg_locale_t dat_create_locale_icu(HeapTuple dattuple);
+
+extern pg_locale_t coll_create_locale_icu(HeapTuple colltuple,
+										  MemoryContext context);
+
+#ifdef USE_ICU
+
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int strncoll_icu(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2,
 						pg_locale_t locale);
@@ -47,6 +57,8 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
  */
 static UConverter *icu_converter = NULL;
 
+static UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
 static int strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 								const char *arg2, ssize_t len2,
 								pg_locale_t locale);
@@ -61,6 +73,98 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+#endif
+
+pg_locale_t
+dat_create_locale_icu(HeapTuple dattuple)
+{
+#ifdef USE_ICU
+	Form_pg_database	 dbform;
+	Datum				 datum;
+	bool				 isnull;
+	const char			*iculocstr;
+	const char			*icurules = NULL;
+	UCollator			*collator;
+	pg_locale_t			 result;
+
+	dbform = (Form_pg_database) GETSTRUCT(dattuple);
+
+	datum = SysCacheGetAttrNotNull(DATABASEOID, dattuple,
+								   Anum_pg_database_datlocale);
+	iculocstr = TextDatumGetCString(datum);
+
+	datum = SysCacheGetAttr(DATABASEOID, dattuple,
+							Anum_pg_database_daticurules, &isnull);
+	if (!isnull)
+		icurules = TextDatumGetCString(datum);
+
+	collator = make_icu_collator(iculocstr, icurules);
+
+	result = MemoryContextAllocZero(TopMemoryContext,
+									sizeof(struct pg_locale_struct));
+	result->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
+	result->info.icu.ucol = collator;
+	result->provider = dbform->datlocprovider;
+	result->deterministic = true;
+	result->collate_is_c = false;
+	result->ctype_is_c = false;
+
+	return result;
+#else
+	/* could get here if a collation was created by a build with ICU */
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("ICU is not supported in this build")));
+
+	return NULL;
+#endif
+}
+
+pg_locale_t
+coll_create_locale_icu(HeapTuple colltuple, MemoryContext context)
+{
+#ifdef USE_ICU
+	Form_pg_collation	 collform;
+	Datum				 datum;
+	bool				 isnull;
+	const char			*iculocstr;
+	const char			*icurules = NULL;
+	UCollator			*collator;
+	pg_locale_t			 result;
+
+	collform = (Form_pg_collation) GETSTRUCT(colltuple);
+
+	datum = SysCacheGetAttrNotNull(COLLOID, colltuple,
+								   Anum_pg_collation_colllocale);
+	iculocstr = TextDatumGetCString(datum);
+
+	datum = SysCacheGetAttr(COLLOID, colltuple,
+							Anum_pg_collation_collicurules, &isnull);
+	if (!isnull)
+		icurules = TextDatumGetCString(datum);
+
+	collator = make_icu_collator(iculocstr, icurules);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+	result->info.icu.ucol = collator;
+	result->provider = collform->collprovider;
+	result->deterministic = collform->collisdeterministic;
+	result->collate_is_c = false;
+	result->ctype_is_c = false;
+
+	return result;
+#else
+	/* could get here if a collation was created by a build with ICU */
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("ICU is not supported in this build")));
+
+	return NULL;
+#endif
+}
+
+#ifdef USE_ICU
 
 /*
  * Wrapper around ucol_open() to handle API differences for older ICU
@@ -158,7 +262,7 @@ pg_ucol_open(const char *loc_str)
  *
  * Ensure that no path leaks a UCollator.
  */
-UCollator *
+static UCollator *
 make_icu_collator(const char *iculocstr, const char *icurules)
 {
 	if (!icurules)
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index ab53995b786..5481fd3b802 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,10 +11,15 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * This should be large enough that most strings will fit, but small enough
@@ -22,8 +27,10 @@
  */
 #define		TEXTBUFLEN			1024
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t dat_create_locale_libc(HeapTuple dattuple);
+extern pg_locale_t coll_create_locale_libc(HeapTuple colltuple,
+										   MemoryContext context);
+
 extern int strncoll_libc(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -31,6 +38,8 @@ extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 
+static locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
 
 #ifdef WIN32
@@ -39,6 +48,74 @@ static int strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									pg_locale_t locale);
 #endif
 
+pg_locale_t
+dat_create_locale_libc(HeapTuple dattuple)
+{
+	Form_pg_database	 dbform;
+	Datum				 datum;
+	const char			*datcollate;
+	const char			*datctype;
+	locale_t			 loc;
+	pg_locale_t			 result;
+
+	dbform = (Form_pg_database) GETSTRUCT(dattuple);
+
+	datum = SysCacheGetAttrNotNull(DATABASEOID, dattuple,
+								   Anum_pg_database_datcollate);
+	datcollate = TextDatumGetCString(datum);
+
+	datum = SysCacheGetAttrNotNull(DATABASEOID, dattuple,
+								   Anum_pg_database_datctype);
+	datctype = TextDatumGetCString(datum);
+
+	loc = make_libc_collator(datcollate, datctype);
+
+	result = MemoryContextAllocZero(TopMemoryContext,
+									sizeof(struct pg_locale_struct));
+	result->provider = dbform->datlocprovider;
+	result->deterministic = true;
+	result->collate_is_c = (strcmp(datcollate, "C") == 0) ||
+		(strcmp(datcollate, "POSIX") == 0);
+	result->ctype_is_c = (strcmp(datctype, "C") == 0) ||
+		(strcmp(datctype, "POSIX") == 0);
+	result->info.lt = loc;
+
+	return result;
+}
+
+pg_locale_t
+coll_create_locale_libc(HeapTuple colltuple, MemoryContext context)
+{
+	Form_pg_collation	 collform;
+	Datum				 datum;
+	const char			*collcollate;
+	const char			*collctype;
+	locale_t			 loc;
+	pg_locale_t			 result;
+
+	collform = (Form_pg_collation) GETSTRUCT(colltuple);
+
+	datum = SysCacheGetAttrNotNull(COLLOID, colltuple,
+								   Anum_pg_collation_collcollate);
+	collcollate = TextDatumGetCString(datum);
+	datum = SysCacheGetAttrNotNull(COLLOID, colltuple,
+								   Anum_pg_collation_collctype);
+	collctype = TextDatumGetCString(datum);
+
+	loc = make_libc_collator(collcollate, collctype);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->provider = collform->collprovider;
+	result->deterministic = collform->collisdeterministic;
+	result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
+		(strcmp(collcollate, "POSIX") == 0);
+	result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
+		(strcmp(collctype, "POSIX") == 0);
+	result->info.lt = loc;
+
+	return result;
+}
+
 /*
  * Create a locale_t with the given collation and ctype.
  *
@@ -47,7 +124,7 @@ static int strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
  *
  * Ensure that no path leaks a locale_t.
  */
-locale_t
+static locale_t
 make_libc_collator(const char *collate, const char *ctype)
 {
 	locale_t	loc = 0;
-- 
2.34.1

v5-0008-Introduce-hooks-for-creating-custom-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From f20966d84bcdc8edc7fe2d93ded092e4a6ef7252 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 16:10:28 -0700
Subject: [PATCH v5 8/8] Introduce hooks for creating custom pg_locale_t.

Now that collation, case mapping, and ctype behavior are controlled
with a method table, we can hook the behavior.

The hooks can provide their own arbitrary method table, which may be
based on a different version of ICU than what Postgres was built with,
or entirely unrelated to ICU/libc.
---
 src/backend/utils/adt/pg_locale.c | 55 ++++++++++++++++++++-----------
 src/include/utils/pg_locale.h     | 16 +++++++++
 2 files changed, 51 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 12f8987065c..2409190fd84 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -100,6 +100,9 @@ extern pg_locale_t dat_create_locale_libc(HeapTuple dattuple);
 extern pg_locale_t coll_create_locale_libc(HeapTuple colltuple,
 										   MemoryContext context);
 
+default_pg_locale_hook_type default_pg_locale_hook = NULL;
+create_pg_locale_hook_type	create_pg_locale_hook  = NULL;
+
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
 #endif
@@ -1409,22 +1412,28 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Datum		datum;
 	bool		isnull;
 	Form_pg_collation collform;
-	pg_locale_t	result;
+	pg_locale_t	result = NULL;
 
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		result = coll_create_locale_builtin(tp, context);
-	else if (collform->collprovider == COLLPROVIDER_ICU)
-		result = coll_create_locale_icu(tp, context);
-	else if (collform->collprovider == COLLPROVIDER_LIBC)
-		result = coll_create_locale_libc(tp, context);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(tp, context);
+
+	if (result == NULL)
+	{
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+			result = coll_create_locale_builtin(tp, context);
+		else if (collform->collprovider == COLLPROVIDER_ICU)
+			result = coll_create_locale_icu(tp, context);
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
+			result = coll_create_locale_libc(tp, context);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	}
 
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
@@ -1481,7 +1490,7 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	pg_locale_t	result;
+	pg_locale_t	result = NULL;
 
 	Assert(default_locale == NULL);
 
@@ -1491,15 +1500,21 @@ init_database_collation(void)
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-		result = dat_create_locale_builtin(tup);
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-		result = dat_create_locale_icu(tup);
-	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-		result = dat_create_locale_libc(tup);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	if (default_pg_locale_hook != NULL)
+		result = default_pg_locale_hook(tup);
+
+	if (result == NULL)
+	{
+		if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+			result = dat_create_locale_builtin(tup);
+		else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			result = dat_create_locale_icu(tup);
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			result = dat_create_locale_libc(tup);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	}
 
 	ReleaseSysCache(tup);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 3e5f625f661..65ae2dbd078 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -174,6 +174,22 @@ struct pg_locale_struct
 
 typedef struct pg_locale_struct *pg_locale_t;
 
+/*
+ * Hooks to allow creating a custom pg_locale_t.
+ *
+ * default_pg_locale_hook should allocate the object in TopMemoryContext, and
+ * create_pg_locale_hook should allocate in the provided context.
+ *
+ * Accept a HeapTuple to avoid an extra catalog lookup.
+ */
+struct HeapTupleData;
+typedef pg_locale_t (*default_pg_locale_hook_type)(struct HeapTupleData *dattuple);
+typedef pg_locale_t (*create_pg_locale_hook_type)(struct HeapTupleData *colltuple,
+												  MemoryContext context);
+
+extern PGDLLIMPORT default_pg_locale_hook_type default_pg_locale_hook;
+extern PGDLLIMPORT create_pg_locale_hook_type create_pg_locale_hook;
+
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.34.1
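
To illustrate the control flow this patch adds, here is a minimal standalone sketch of the "try the hook, fall back to the in-core provider when it returns NULL" pattern. It does not use the real PostgreSQL headers: mock_locale, mock_tuple, create_locale(), my_create_locale() and install_hook() are hypothetical stand-ins for pg_locale_t, HeapTuple, create_pg_locale(), an extension's hook function and its _PG_init() respectively. Saving and chaining to the previous hook value is the usual PostgreSQL extension convention and is an assumption here; the patch itself does not require it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for pg_locale_t and HeapTuple; not the real types. */
typedef struct mock_locale
{
	char		provider;		/* 'b' = builtin, 'i' = ICU, 'c' = libc */
	const char *origin;			/* records who built the object */
} mock_locale;

typedef struct mock_tuple
{
	char		provider;
	const char *locale_name;
} mock_tuple;

typedef mock_locale *(*create_locale_hook_type) (const mock_tuple *tup);

/* Plays the role of create_pg_locale_hook: NULL until an extension sets it. */
static create_locale_hook_type create_locale_hook = NULL;

static mock_locale *
make_locale(char provider, const char *origin)
{
	mock_locale *loc = calloc(1, sizeof(mock_locale));

	assert(loc != NULL);
	loc->provider = provider;
	loc->origin = origin;
	return loc;
}

/* Core side: give the hook first refusal, then fall back if it declines. */
static mock_locale *
create_locale(const mock_tuple *tup)
{
	mock_locale *result = NULL;

	if (create_locale_hook != NULL)
		result = create_locale_hook(tup);

	if (result == NULL)
		result = make_locale(tup->provider, "core");

	return result;
}

/* Extension side: claim only ICU collations, decline everything else. */
static create_locale_hook_type prev_create_locale_hook = NULL;

static mock_locale *
my_create_locale(const mock_tuple *tup)
{
	if (tup->provider == 'i')
		return make_locale('i', "extension");

	/* Not ours: let an earlier hook try, or return NULL so core handles it. */
	if (prev_create_locale_hook != NULL)
		return prev_create_locale_hook(tup);
	return NULL;
}

/* What an extension would do at load time (assumed convention). */
static void
install_hook(void)
{
	prev_create_locale_hook = create_locale_hook;
	create_locale_hook = my_create_locale;
}
```

Because a NULL return means "not handled", an extension written this way can override a single provider (for example, a different ICU version) and leave builtin and libc collations to the in-core code.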

v5-0007-Control-ctype-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From dfdc402eac48c248f2a70edea91d57989a1af6f1 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 14:30:07 -0700
Subject: [PATCH v5 7/8] Control ctype behavior with a method table.

Previously, ctype behavior (pattern matching) branched based on the
provider.

A method table is less error-prone and easier to hook.
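
As a rough illustration of the method-table approach, the standalone sketch below shows a single char_properties callback that takes a mask of requested properties and returns the subset that holds for the character. The sketch_* types, the ASCII-only provider and sketch_isalnum() are hypothetical and exist only for this example. The PG_IS* values are copied from the definitions this patch removes from regc_pg_locale.c, on the assumption that they keep the same values wherever they are moved.

```c
#include <assert.h>
#include <stddef.h>

/* Property bits; values assumed unchanged from the removed definitions. */
#define PG_ISDIGIT	0x01
#define PG_ISALPHA	0x02
#define PG_ISUPPER	0x04
#define PG_ISLOWER	0x08

typedef unsigned int sketch_wchar;	/* stand-in for pg_wchar */

struct sketch_locale;

/* Stand-in for struct ctype_methods: one table of callbacks per provider. */
struct sketch_ctype_methods
{
	int			(*char_properties) (sketch_wchar wc, int mask,
									const struct sketch_locale *locale);
	sketch_wchar (*wc_toupper) (sketch_wchar wc,
								const struct sketch_locale *locale);
};

struct sketch_locale
{
	const struct sketch_ctype_methods *ctype;
};

static int
in_range(sketch_wchar wc, sketch_wchar lo, sketch_wchar hi)
{
	return wc >= lo && wc <= hi;
}

/* A toy ASCII-only provider: test only the properties named in the mask. */
static int
char_properties_ascii(sketch_wchar wc, int mask,
					  const struct sketch_locale *locale)
{
	int			result = 0;

	(void) locale;				/* unused by this provider */

	if ((mask & PG_ISDIGIT) && in_range(wc, '0', '9'))
		result |= PG_ISDIGIT;
	if ((mask & PG_ISUPPER) && in_range(wc, 'A', 'Z'))
		result |= PG_ISUPPER;
	if ((mask & PG_ISLOWER) && in_range(wc, 'a', 'z'))
		result |= PG_ISLOWER;
	if ((mask & PG_ISALPHA) &&
		(in_range(wc, 'A', 'Z') || in_range(wc, 'a', 'z')))
		result |= PG_ISALPHA;

	return result;
}

static sketch_wchar
wc_toupper_ascii(sketch_wchar wc, const struct sketch_locale *locale)
{
	(void) locale;
	if (in_range(wc, 'a', 'z'))
		return wc - ('a' - 'A');
	return wc;
}

static const struct sketch_ctype_methods ctype_methods_ascii = {
	.char_properties = char_properties_ascii,
	.wc_toupper = wc_toupper_ascii,
};

static const struct sketch_locale ascii_locale = {
	.ctype = &ctype_methods_ascii,
};

/* The caller dispatches through the table instead of branching on provider. */
static int
sketch_char_properties(sketch_wchar wc, int mask,
					   const struct sketch_locale *locale)
{
	assert(locale->ctype != NULL);
	return locale->ctype->char_properties(wc, mask, locale);
}

/* A combined class needs one call with an OR'ed mask. */
static int
sketch_isalnum(sketch_wchar wc, const struct sketch_locale *locale)
{
	return sketch_char_properties(wc, PG_ISDIGIT | PG_ISALPHA, locale) != 0;
}
```

With this shape, the calling code never inspects the provider: swapping in a different table changes the behavior of every classification call.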
---
 src/backend/regex/regc_pg_locale.c     | 378 +++++--------------------
 src/backend/utils/adt/pg_locale.c      |  62 ++++
 src/backend/utils/adt/pg_locale_icu.c  |  45 +++
 src/backend/utils/adt/pg_locale_libc.c | 169 +++++++++++
 src/include/utils/pg_locale.h          |  23 ++
 5 files changed, 373 insertions(+), 304 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index b75784b6ce5..d256e7be660 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,33 +63,18 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 static Oid	pg_regex_collation;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
-#define PG_ISDIGIT	0x01
-#define PG_ISALPHA	0x02
-#define PG_ISALNUM	(PG_ISDIGIT | PG_ISALPHA)
-#define PG_ISUPPER	0x04
-#define PG_ISLOWER	0x08
-#define PG_ISGRAPH	0x10
-#define PG_ISPRINT	0x20
-#define PG_ISPUNCT	0x40
-#define PG_ISSPACE	0x80
-
-static const unsigned char pg_char_properties[128] = {
+static const unsigned char char_properties_tbl[128] = {
 	 /* NUL */ 0,
 	 /* ^A */ 0,
 	 /* ^B */ 0,
@@ -232,7 +217,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -253,8 +237,8 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
 		collation = C_COLLATION_OID;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -271,32 +255,11 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
+			locale = &dummy_c_locale;
 			collation = C_COLLATION_OID;
 		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
-		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 	pg_regex_collation = collation;
 }
@@ -304,82 +267,31 @@ pg_set_regex_collation(Oid collation)
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISDIGIT));
+	else
+		return char_properties(c, PG_ISDIGIT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALPHA));
+	else
+		return char_properties(c, PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALNUM));
+	else
+		return char_properties(c, PG_ISDIGIT|PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
@@ -394,219 +306,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISUPPER));
+	else
+		return char_properties(c, PG_ISUPPER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISLOWER));
+	else
+		return char_properties(c, PG_ISLOWER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISGRAPH));
+	else
+		return char_properties(c, PG_ISGRAPH, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPRINT));
+	else
+		return char_properties(c, PG_ISPRINT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPUNCT));
+	else
+		return char_properties(c, PG_ISPUNCT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISSPACE));
+	else
+		return char_properties(c, PG_ISSPACE, pg_regex_locale) != 0;
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -732,37 +512,27 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
 			max_chr = (pg_wchar) 127;
 			pcc->cv.cclasscode = -1;
 #else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
+	}
+	else
+	{
 #if MAX_SIMPLE_CHR >= UCHAR_MAX
+		if (pg_regex_locale->provider == COLLPROVIDER_LIBC &&
+			GetDatabaseEncoding() != PG_UTF8)
+		{
 			max_chr = (pg_wchar) UCHAR_MAX;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		}
+		else
 #endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index a106478b119..12f8987065c 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -249,6 +249,50 @@ static struct casemap_methods casemap_methods_builtin = {
 	.strupper = strupper_builtin,
 };
 
+static int
+char_properties_builtin(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int result = 0;
+
+	if ((mask & PG_ISDIGIT) && pg_u_isdigit(wc, true))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && pg_u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && pg_u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && pg_u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && pg_u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && pg_u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && pg_u_ispunct(wc, true))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && pg_u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static struct ctype_methods ctype_methods_builtin = {
+	.char_properties = char_properties_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
@@ -1319,6 +1363,8 @@ dat_create_locale_builtin(HeapTuple dattuple)
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
 	result->casemap = &casemap_methods_builtin;
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
@@ -1346,6 +1392,8 @@ coll_create_locale_builtin(HeapTuple colltuple, MemoryContext context)
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
 	result->casemap = &casemap_methods_builtin;
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
@@ -1773,6 +1821,20 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_properties()
+ *
+ * Out of the properties specified in the given mask, return a new mask of the
+ * properties true for the given character.
+ *
+ * XXX: add caching?
+ */
+int
+char_properties(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	return locale->ctype->char_properties(wc, mask, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 97e96d5b9fb..3951262486e 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -102,6 +102,43 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
 
+static int
+char_properties_icu(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int result = 0;
+
+	if ((mask & PG_ISDIGIT) && u_isdigit(wc))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && u_ispunct(wc))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -114,6 +151,12 @@ static struct casemap_methods casemap_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 };
+
+static struct ctype_methods ctype_methods_icu = {
+	.char_properties = char_properties_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -151,6 +194,7 @@ dat_create_locale_icu(HeapTuple dattuple)
 	result->ctype_is_c = false;
 	result->collate = &collate_methods_icu;
 	result->casemap = &casemap_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -197,6 +241,7 @@ coll_create_locale_icu(HeapTuple colltuple, MemoryContext context)
 	result->ctype_is_c = false;
 	result->collate = &collate_methods_icu;
 	result->casemap = &casemap_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 79828ab3524..6de87d6b948 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -61,6 +61,10 @@ static size_t strupper_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 
+static int char_properties_libc(pg_wchar wc, int mask, pg_locale_t locale);
+static pg_wchar toupper_libc(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc(pg_wchar wc, pg_locale_t locale);
+
 static struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -88,6 +92,12 @@ static struct casemap_methods casemap_methods_libc = {
 	.strupper = strupper_libc,
 };
 
+static struct ctype_methods ctype_methods_libc = {
+	.char_properties = char_properties_libc,
+	.wc_toupper = toupper_libc,
+	.wc_tolower = tolower_libc,
+};
+
 static size_t
 strlower_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			  pg_locale_t locale)
@@ -319,6 +329,8 @@ dat_create_locale_libc(HeapTuple dattuple)
 		result->collate = &collate_methods_libc;
 	if (!result->ctype_is_c)
 		result->casemap = &casemap_methods_libc;
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_libc;
 
 	return result;
 }
@@ -356,6 +368,8 @@ coll_create_locale_libc(HeapTuple colltuple, MemoryContext context)
 		result->collate = &collate_methods_libc;
 	if (!result->ctype_is_c)
 		result->casemap = &casemap_methods_libc;
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_libc;
 
 	return result;
 }
@@ -644,3 +658,158 @@ report_newlocale_failure(const char *localename)
 			  errdetail("The operating system could not find any locale data for the locale name \"%s\".",
 						localename) : 0)));
 }
+
+static int
+char_properties_libc(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int result = 0;
+
+	Assert(!locale->ctype_is_c);
+
+	if (mask & PG_ISDIGIT)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswdigit_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISDIGIT;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				isdigit_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISDIGIT;
+		}
+	}
+	if (mask & PG_ISALPHA)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswalpha_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISALPHA;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				isalpha_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISALPHA;
+		}
+	}
+	if (mask & PG_ISUPPER)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswupper_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISUPPER;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				isupper_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISUPPER;
+		}
+	}
+	if (mask & PG_ISLOWER)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswlower_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISLOWER;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				islower_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISLOWER;
+		}
+	}
+	if (mask & PG_ISGRAPH)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswgraph_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISGRAPH;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				isgraph_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISGRAPH;
+		}
+	}
+	if (mask & PG_ISPRINT)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswprint_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISPRINT;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				isprint_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISPRINT;
+		}
+	}
+	if (mask & PG_ISPUNCT)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswpunct_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISPUNCT;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				ispunct_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISPUNCT;
+		}
+	}
+	if (mask & PG_ISSPACE)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		{
+			if (iswspace_l((wint_t) wc, locale->info.lt))
+				result |= PG_ISSPACE;
+		}
+		else
+		{
+			if (wc <= (pg_wchar) UCHAR_MAX &&
+				isspace_l((unsigned char) wc, locale->info.lt))
+				result |= PG_ISSPACE;
+		}
+	}
+
+	return result;
+}
+
+static pg_wchar
+toupper_libc(pg_wchar wc, pg_locale_t locale)
+{
+	if (GetDatabaseEncoding() == PG_UTF8 &&
+		(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc(pg_wchar wc, pg_locale_t locale)
+{
+	if (GetDatabaseEncoding() == PG_UTF8 &&
+		(sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF))
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 4bd9e6de7a3..3e5f625f661 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,6 +12,8 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #if defined(LOCALE_T_IN_XLOCALE) || defined(WCSTOMBS_L_IN_XLOCALE)
 #include <xlocale.h>
 #endif
@@ -19,6 +21,19 @@
 #include <unicode/ucol.h>
 #endif
 
+/*
+ * Character properties for regular expressions.
+ */
+#define PG_ISDIGIT     0x01
+#define PG_ISALPHA     0x02
+#define PG_ISALNUM     (PG_ISDIGIT | PG_ISALPHA)
+#define PG_ISUPPER     0x04
+#define PG_ISLOWER     0x08
+#define PG_ISGRAPH     0x10
+#define PG_ISPRINT     0x20
+#define PG_ISPUNCT     0x40
+#define PG_ISSPACE     0x80
+
 #ifdef USE_ICU
 /*
  * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
@@ -107,6 +122,12 @@ struct casemap_methods
 							 pg_locale_t locale);
 };
 
+struct ctype_methods {
+	int (*char_properties) (pg_wchar wc, int mask, pg_locale_t locale);
+	pg_wchar (*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar (*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -132,6 +153,7 @@ struct pg_locale_struct
 
 	struct collate_methods *collate;	/* NULL if collate_is_c */
 	struct casemap_methods *casemap;	/* NULL if ctype_is_c */
+	struct ctype_methods *ctype;		/* NULL if ctype_is_c */
 
 	union
 	{
@@ -156,6 +178,7 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern int char_properties(pg_wchar wc, int mask, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.34.1
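
The ctype method table in the patch above can be hard to picture from the diff alone, so here is a minimal standalone sketch of the same pattern: the caller passes a mask of the properties it cares about, and the provider returns the subset that holds for the character. Everything prefixed with demo_ or DEMO_ is hypothetical and exists only for illustration; the ASCII-only provider stands in for the builtin, ICU and libc implementations and is not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical property bits, laid out like the PG_IS* flags in the patch. */
#define DEMO_ISDIGIT 0x01
#define DEMO_ISALPHA 0x02
#define DEMO_ISALNUM (DEMO_ISDIGIT | DEMO_ISALPHA)
#define DEMO_ISUPPER 0x04
#define DEMO_ISSPACE 0x80

typedef unsigned int demo_wchar;    /* stand-in for pg_wchar */

struct demo_locale;

/* One function pointer per operation; each provider fills in its own table. */
struct demo_ctype_methods
{
    int         (*char_properties) (demo_wchar wc, int mask,
                                    const struct demo_locale *locale);
    demo_wchar  (*wc_toupper) (demo_wchar wc,
                               const struct demo_locale *locale);
};

struct demo_locale
{
    const struct demo_ctype_methods *ctype;     /* NULL would mean "C" ctype */
};

/* ASCII-only provider: tests only the properties requested in the mask. */
static int
demo_char_properties_ascii(demo_wchar wc, int mask,
                           const struct demo_locale *locale)
{
    int         result = 0;

    (void) locale;
    if (wc > 0x7F)
        return 0;
    if ((mask & DEMO_ISDIGIT) && isdigit((int) wc))
        result |= DEMO_ISDIGIT;
    if ((mask & DEMO_ISALPHA) && isalpha((int) wc))
        result |= DEMO_ISALPHA;
    if ((mask & DEMO_ISUPPER) && isupper((int) wc))
        result |= DEMO_ISUPPER;
    if ((mask & DEMO_ISSPACE) && isspace((int) wc))
        result |= DEMO_ISSPACE;
    return result;
}

static demo_wchar
demo_toupper_ascii(demo_wchar wc, const struct demo_locale *locale)
{
    (void) locale;
    /* Characters outside ASCII are returned unchanged. */
    return (wc <= 0x7F) ? (demo_wchar) toupper((int) wc) : wc;
}

static const struct demo_ctype_methods demo_ctype_methods_ascii = {
    .char_properties = demo_char_properties_ascii,
    .wc_toupper = demo_toupper_ascii,
};

const struct demo_locale demo_ascii_locale = {
    .ctype = &demo_ctype_methods_ascii,
};

/* Callers dispatch through the table and never branch on the provider. */
int
demo_char_properties(demo_wchar wc, int mask, const struct demo_locale *locale)
{
    assert(locale->ctype != NULL);
    return locale->ctype->char_properties(wc, mask, locale);
}

demo_wchar
demo_wc_toupper(demo_wchar wc, const struct demo_locale *locale)
{
    assert(locale->ctype != NULL);
    return locale->ctype->wc_toupper(wc, locale);
}
```

A caller that wants to know whether a character is alphanumeric would pass DEMO_ISALNUM as the mask and check whether the result is nonzero. A single call can answer several property questions at once, which is presumably why the patch's char_properties() takes a mask rather than exposing one function per property.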

v5-0003-Refactor-the-code-to-create-a-pg_locale_t-into-ne.patch (text/x-patch; charset=UTF-8)

From 33f6a8ec2cb97b00d85436b1cff5e887fed4c4cd Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 14:58:52 -0700
Subject: [PATCH v5 3/8] Refactor the code to create a pg_locale_t into new
 function.

---
 src/backend/utils/adt/pg_locale.c | 297 ++++++++++++++----------------
 1 file changed, 140 insertions(+), 157 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index ce80ead86dd..94cd8d132f7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1227,42 +1227,136 @@ IsoLocaleName(const char *winlocname)
 
 
 /*
- * Cache mechanism for collation information.
- *
- * Note that we currently lack any way to flush the cache.  Since we don't
- * support ALTER COLLATION, this is OK.  The worst case is that someone
- * drops a collation, and a useless cache entry hangs around in existing
- * backends.
+ * Create a new pg_locale_t struct for the given collation oid.
  */
-static collation_cache_entry *
-lookup_collation_cache(Oid collation)
+static pg_locale_t
+create_pg_locale(Oid collid, MemoryContext context)
 {
-	collation_cache_entry *cache_entry;
-	bool		found;
+	/* We haven't computed this yet in this session, so do it */
+	HeapTuple	tp;
+	Form_pg_collation collform;
+	pg_locale_t	result;
+	Datum		datum;
+	bool		isnull;
 
-	Assert(OidIsValid(collation));
-	Assert(collation != DEFAULT_COLLATION_OID);
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	if (CollationCache == NULL)
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for collation %u", collid);
+	collform = (Form_pg_collation) GETSTRUCT(tp);
+
+	result->provider = collform->collprovider;
+	result->deterministic = collform->collisdeterministic;
+
+	if (collform->collprovider == COLLPROVIDER_BUILTIN)
 	{
-		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
-													  "collation cache",
-													  ALLOCSET_DEFAULT_SIZES);
-		CollationCache = collation_cache_create(CollationCacheContext,
-												16, NULL);
+		const char *locstr;
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+
+		result->collate_is_c = true;
+		result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+		builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+		result->info.builtin.locale = MemoryContextStrdup(context,
+														  locstr);
 	}
+	else if (collform->collprovider == COLLPROVIDER_ICU)
+	{
+#ifdef USE_ICU
+		const char *iculocstr;
+		const char *icurules;
 
-	cache_entry = collation_cache_insert(CollationCache, collation, &found);
-	if (!found)
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+
+		result->collate_is_c = false;
+		result->ctype_is_c = false;
+
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
+#else
+		/* could get here if a collation was created by a build with ICU */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#endif
+	}
+	else if (collform->collprovider == COLLPROVIDER_LIBC)
 	{
-		/*
-		 * Make sure cache entry is marked invalid, in case we fail before
-		 * setting things.
-		 */
-		cache_entry->locale = 0;
+		const char *collcollate;
+		const char *collctype;
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
+		collcollate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
+		collctype = TextDatumGetCString(datum);
+
+		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
+			(strcmp(collcollate, "POSIX") == 0);
+		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
+			(strcmp(collctype, "POSIX") == 0);
+
+		result->info.lt = make_libc_collator(collcollate, collctype);
+	}
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+
+	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
+							&isnull);
+	if (!isnull)
+	{
+		char	   *actual_versionstr;
+		char	   *collversionstr;
+
+		collversionstr = TextDatumGetCString(datum);
+
+		if (collform->collprovider == COLLPROVIDER_LIBC)
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
+		else
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+
+		actual_versionstr = get_collation_actual_version(collform->collprovider,
+														 TextDatumGetCString(datum));
+		if (!actual_versionstr)
+		{
+			/*
+			 * This could happen when specifying a version in CREATE
+			 * COLLATION but the provider does not support versioning, or
+			 * manually creating a mess in the catalogs.
+			 */
+			ereport(ERROR,
+					(errmsg("collation \"%s\" has no actual version, but a version was recorded",
+							NameStr(collform->collname))));
+		}
+
+		if (strcmp(actual_versionstr, collversionstr) != 0)
+			ereport(WARNING,
+					(errmsg("collation \"%s\" has version mismatch",
+							NameStr(collform->collname)),
+					 errdetail("The collation in the database was created using version %s, "
+							   "but the operating system provides version %s.",
+							   collversionstr, actual_versionstr),
+					 errhint("Rebuild all objects affected by this collation and run "
+							 "ALTER COLLATION %s REFRESH VERSION, "
+							 "or build PostgreSQL with the right library version.",
+							 quote_qualified_identifier(get_namespace_name(collform->collnamespace),
+														NameStr(collform->collname)))));
 	}
 
-	return cache_entry;
+	ReleaseSysCache(tp);
+
+	return result;
 }
 
 /*
@@ -1370,6 +1464,7 @@ pg_locale_t
 pg_newlocale_from_collation(Oid collid)
 {
 	collation_cache_entry *cache_entry;
+	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
 		return &default_locale;
@@ -1380,140 +1475,28 @@ pg_newlocale_from_collation(Oid collid)
 	if (last_collation_cache_oid == collid)
 		return last_collation_cache_locale;
 
-	cache_entry = lookup_collation_cache(collid);
-
-	if (cache_entry->locale == 0)
+	if (CollationCache == NULL)
 	{
-		/* We haven't computed this yet in this session, so do it */
-		HeapTuple	tp;
-		Form_pg_collation collform;
-		struct pg_locale_struct result;
-		pg_locale_t resultp;
-		Datum		datum;
-		bool		isnull;
-
-		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
-		if (!HeapTupleIsValid(tp))
-			elog(ERROR, "cache lookup failed for collation %u", collid);
-		collform = (Form_pg_collation) GETSTRUCT(tp);
-
-		/* We'll fill in the result struct locally before allocating memory */
-		memset(&result, 0, sizeof(result));
-		result.provider = collform->collprovider;
-		result.deterministic = collform->collisdeterministic;
-
-		if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		{
-			const char *locstr;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-			locstr = TextDatumGetCString(datum);
-
-			result.collate_is_c = true;
-			result.ctype_is_c = (strcmp(locstr, "C") == 0);
-
-			builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-			result.info.builtin.locale = MemoryContextStrdup(TopMemoryContext,
-															 locstr);
-		}
-		else if (collform->collprovider == COLLPROVIDER_ICU)
-		{
-#ifdef USE_ICU
-			const char *iculocstr;
-			const char *icurules;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-			iculocstr = TextDatumGetCString(datum);
-
-			result.collate_is_c = false;
-			result.ctype_is_c = false;
-
-			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-			if (!isnull)
-				icurules = TextDatumGetCString(datum);
-			else
-				icurules = NULL;
-
-			result.info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
-			result.info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-			/* could get here if a collation was created by a build with ICU */
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("ICU is not supported in this build")));
-#endif
-		}
-		else if (collform->collprovider == COLLPROVIDER_LIBC)
-		{
-			const char *collcollate;
-			const char *collctype;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-			collcollate = TextDatumGetCString(datum);
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-			collctype = TextDatumGetCString(datum);
-
-			result.collate_is_c = (strcmp(collcollate, "C") == 0) ||
-				(strcmp(collcollate, "POSIX") == 0);
-			result.ctype_is_c = (strcmp(collctype, "C") == 0) ||
-				(strcmp(collctype, "POSIX") == 0);
-
-			result.info.lt = make_libc_collator(collcollate, collctype);
-		}
-		else
-			/* shouldn't happen */
-			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
-								&isnull);
-		if (!isnull)
-		{
-			char	   *actual_versionstr;
-			char	   *collversionstr;
-
-			collversionstr = TextDatumGetCString(datum);
-
-			if (collform->collprovider == COLLPROVIDER_LIBC)
-				datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-			else
-				datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-
-			actual_versionstr = get_collation_actual_version(collform->collprovider,
-															 TextDatumGetCString(datum));
-			if (!actual_versionstr)
-			{
-				/*
-				 * This could happen when specifying a version in CREATE
-				 * COLLATION but the provider does not support versioning, or
-				 * manually creating a mess in the catalogs.
-				 */
-				ereport(ERROR,
-						(errmsg("collation \"%s\" has no actual version, but a version was recorded",
-								NameStr(collform->collname))));
-			}
-
-			if (strcmp(actual_versionstr, collversionstr) != 0)
-				ereport(WARNING,
-						(errmsg("collation \"%s\" has version mismatch",
-								NameStr(collform->collname)),
-						 errdetail("The collation in the database was created using version %s, "
-								   "but the operating system provides version %s.",
-								   collversionstr, actual_versionstr),
-						 errhint("Rebuild all objects affected by this collation and run "
-								 "ALTER COLLATION %s REFRESH VERSION, "
-								 "or build PostgreSQL with the right library version.",
-								 quote_qualified_identifier(get_namespace_name(collform->collnamespace),
-															NameStr(collform->collname)))));
-		}
-
-		ReleaseSysCache(tp);
+		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
+													  "collation cache",
+													  ALLOCSET_DEFAULT_SIZES);
+		CollationCache = collation_cache_create(CollationCacheContext,
+												16, NULL);
+	}
 
-		/* We'll keep the pg_locale_t structures in TopMemoryContext */
-		resultp = MemoryContextAlloc(TopMemoryContext, sizeof(*resultp));
-		*resultp = result;
+	cache_entry = collation_cache_insert(CollationCache, collid, &found);
+	if (!found)
+	{
+		/*
+		 * Make sure cache entry is marked invalid, in case we fail before
+		 * setting things.
+		 */
+		cache_entry->locale = 0;
+	}
 
-		cache_entry->locale = resultp;
+	if (cache_entry->locale == 0)
+	{
+		cache_entry->locale = create_pg_locale(collid, CollationCacheContext);
 	}
 
 	last_collation_cache_oid = collid;
-- 
2.34.1
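
The split made by the patch above, where the cache lookup stays in the caller and the construction of the object moves into its own function, can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: a fixed-size array replaces the real hash table, calloc() replaces the memory context, and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CACHE_SLOTS 16

struct demo_cached_locale
{
    unsigned int collid;
};

struct demo_cache_entry
{
    int         used;
    unsigned int collid;
    struct demo_cached_locale *locale;  /* NULL until fully built */
};

static struct demo_cache_entry demo_cache[DEMO_CACHE_SLOTS];

/* Counts constructions so a caller can observe cache hits. */
int         demo_create_calls = 0;

/* Construction only: knows nothing about the cache (like create_pg_locale). */
static struct demo_cached_locale *
demo_create_locale(unsigned int collid)
{
    struct demo_cached_locale *result = calloc(1, sizeof(*result));

    if (result == NULL)
        abort();
    result->collid = collid;
    demo_create_calls++;
    return result;
}

/* Lookup only: finds or claims an entry, then delegates construction. */
struct demo_cached_locale *
demo_locale_from_collation(unsigned int collid)
{
    struct demo_cache_entry *entry = NULL;

    for (int i = 0; i < DEMO_CACHE_SLOTS; i++)
    {
        if (demo_cache[i].used && demo_cache[i].collid == collid)
        {
            entry = &demo_cache[i];
            break;
        }
    }

    if (entry == NULL)
    {
        for (int i = 0; i < DEMO_CACHE_SLOTS; i++)
        {
            if (!demo_cache[i].used)
            {
                entry = &demo_cache[i];
                entry->used = 1;
                entry->collid = collid;
                /* Mark invalid first, in case construction fails midway. */
                entry->locale = NULL;
                break;
            }
        }
    }

    if (entry == NULL)
        return NULL;            /* cache full; this sketch does not evict */

    if (entry->locale == NULL)
        entry->locale = demo_create_locale(collid);

    assert(entry->locale != NULL);
    return entry->locale;
}
```

Calling demo_locale_from_collation() twice with the same id returns the same pointer and constructs the object only once. Keeping construction free of cache details is what would let a hook or a different provider supply its own constructor without touching the lookup path.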

v5-0002-Move-libc-specific-code-from-pg_locale.c-into-pg_.patch (text/x-patch; charset=UTF-8)

From 477cbb438c44ef241599f07b4d8b427c96b57832 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 14:35:11 -0700
Subject: [PATCH v5 2/8] Move libc-specific code from pg_locale.c into
 pg_locale_libc.c.

---
 src/backend/utils/adt/Makefile         |   1 +
 src/backend/utils/adt/meson.build      |   1 +
 src/backend/utils/adt/pg_locale.c      | 294 +---------------------
 src/backend/utils/adt/pg_locale_libc.c | 326 +++++++++++++++++++++++++
 4 files changed, 337 insertions(+), 285 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_locale_libc.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index bb416c86744..85e5eaf32eb 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -80,6 +80,7 @@ OBJS = \
 	partitionfuncs.o \
 	pg_locale.o \
 	pg_locale_icu.o \
+	pg_locale_libc.o \
 	pg_lsn.o \
 	pg_upgrade_support.o \
 	pgstatfuncs.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 19a27465a29..f73f294b8f5 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -67,6 +67,7 @@ backend_sources += files(
   'partitionfuncs.c',
   'pg_locale.c',
   'pg_locale_icu.c',
+  'pg_locale_libc.c',
   'pg_lsn.c',
   'pg_upgrade_support.c',
   'pgstatfuncs.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9ca8cacffbd..ce80ead86dd 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -103,6 +103,15 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  pg_locale_t locale);
 #endif
 
+extern locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
+extern int strncoll_libc(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+extern size_t strnxfrm_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1256,108 +1265,6 @@ lookup_collation_cache(Oid collation)
 	return cache_entry;
 }
 
-/* simple subroutine for reporting errors from newlocale() */
-static void
-report_newlocale_failure(const char *localename)
-{
-	int			save_errno;
-
-	/*
-	 * Windows doesn't provide any useful error indication from
-	 * _create_locale(), and BSD-derived platforms don't seem to feel they
-	 * need to set errno either (even though POSIX is pretty clear that
-	 * newlocale should do so).  So, if errno hasn't been set, assume ENOENT
-	 * is what to report.
-	 */
-	if (errno == 0)
-		errno = ENOENT;
-
-	/*
-	 * ENOENT means "no such locale", not "no such file", so clarify that
-	 * errno with an errdetail message.
-	 */
-	save_errno = errno;			/* auxiliary funcs might change errno */
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-			 errmsg("could not create locale \"%s\": %m",
-					localename),
-			 (save_errno == ENOENT ?
-			  errdetail("The operating system could not find any locale data for the locale name \"%s\".",
-						localename) : 0)));
-}
-
-/*
- * Create a locale_t with the given collation and ctype.
- *
- * The "C" and "POSIX" locales are not actually handled by libc, so return
- * NULL.
- *
- * Ensure that no path leaks a locale_t.
- */
-static locale_t
-make_libc_collator(const char *collate, const char *ctype)
-{
-	locale_t	loc = 0;
-
-	if (strcmp(collate, ctype) == 0)
-	{
-		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
-		{
-			/* Normal case where they're the same */
-			errno = 0;
-#ifndef WIN32
-			loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate,
-							NULL);
-#else
-			loc = _create_locale(LC_ALL, collate);
-#endif
-			if (!loc)
-				report_newlocale_failure(collate);
-		}
-	}
-	else
-	{
-#ifndef WIN32
-		/* We need two newlocale() steps */
-		locale_t	loc1 = 0;
-
-		if (strcmp(collate, "C") != 0 && strcmp(collate, "POSIX") != 0)
-		{
-			errno = 0;
-			loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
-			if (!loc1)
-				report_newlocale_failure(collate);
-		}
-
-		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
-		{
-			errno = 0;
-			loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
-			if (!loc)
-			{
-				if (loc1)
-					freelocale(loc1);
-				report_newlocale_failure(ctype);
-			}
-		}
-		else
-			loc = loc1;
-#else
-
-		/*
-		 * XXX The _create_locale() API doesn't appear to support this. Could
-		 * perhaps be worked around by changing pg_locale_t to contain two
-		 * separate fields.
-		 */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("collations with different collate and ctype values are not supported on this platform")));
-#endif
-	}
-
-	return loc;
-}
-
 /*
  * Initialize default_locale with database locale settings.
  */
@@ -1722,150 +1629,6 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
-/*
- * strncoll_libc_win32_utf8
- *
- * Win32 does not have UTF-8. Convert UTF8 arguments to wide characters and
- * invoke wcscoll_l().
- *
- * An input string length of -1 means that it's NUL-terminated.
- */
-#ifdef WIN32
-static int
-strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
-						 ssize_t len2, pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	char	   *a1p,
-			   *a2p;
-	int			a1len;
-	int			a2len;
-	int			r;
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-	Assert(GetDatabaseEncoding() == PG_UTF8);
-
-	if (len1 == -1)
-		len1 = strlen(arg1);
-	if (len2 == -1)
-		len2 = strlen(arg2);
-
-	a1len = len1 * 2 + 2;
-	a2len = len2 * 2 + 2;
-
-	if (a1len + a2len > TEXTBUFLEN)
-		buf = palloc(a1len + a2len);
-
-	a1p = buf;
-	a2p = buf + a1len;
-
-	/* API does not work for zero-length input */
-	if (len1 == 0)
-		r = 0;
-	else
-	{
-		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
-								(LPWSTR) a1p, a1len / 2);
-		if (!r)
-			ereport(ERROR,
-					(errmsg("could not convert string to UTF-16: error code %lu",
-							GetLastError())));
-	}
-	((LPWSTR) a1p)[r] = 0;
-
-	if (len2 == 0)
-		r = 0;
-	else
-	{
-		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
-								(LPWSTR) a2p, a2len / 2);
-		if (!r)
-			ereport(ERROR,
-					(errmsg("could not convert string to UTF-16: error code %lu",
-							GetLastError())));
-	}
-	((LPWSTR) a2p)[r] = 0;
-
-	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
-	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
-		ereport(ERROR,
-				(errmsg("could not compare Unicode strings: %m")));
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	return result;
-}
-#endif							/* WIN32 */
-
-/*
- * strncoll_libc
- *
- * NUL-terminate arguments, if necessary, and pass to strcoll_l().
- *
- * An input string length of -1 means that it's already NUL-terminated.
- */
-static int
-strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			  pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	size_t		bufsize1 = (len1 == -1) ? 0 : len1 + 1;
-	size_t		bufsize2 = (len2 == -1) ? 0 : len2 + 1;
-	const char *arg1n;
-	const char *arg2n;
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
-	if (bufsize1 + bufsize2 > TEXTBUFLEN)
-		buf = palloc(bufsize1 + bufsize2);
-
-	/* nul-terminate arguments if necessary */
-	if (len1 == -1)
-	{
-		arg1n = arg1;
-	}
-	else
-	{
-		char	   *buf1 = buf;
-
-		memcpy(buf1, arg1, len1);
-		buf1[len1] = '\0';
-		arg1n = buf1;
-	}
-
-	if (len2 == -1)
-	{
-		arg2n = arg2;
-	}
-	else
-	{
-		char	   *buf2 = buf + bufsize1;
-
-		memcpy(buf2, arg2, len2);
-		buf2[len2] = '\0';
-		arg2n = buf2;
-	}
-
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	return result;
-}
-
 /*
  * pg_strcoll
  *
@@ -1922,45 +1685,6 @@ pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	return result;
 }
 
-/*
- * strnxfrm_libc
- *
- * NUL-terminate src, if necessary, and pass to strxfrm_l().
- *
- * A source length of -1 means that it's already NUL-terminated.
- */
-static size_t
-strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
-			  pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	size_t		bufsize = srclen + 1;
-	size_t		result;
-
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
-	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
-
-	if (bufsize > TEXTBUFLEN)
-		buf = palloc(bufsize);
-
-	/* nul-terminate argument */
-	memcpy(buf, src, srclen);
-	buf[srclen] = '\0';
-
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	/* if dest is defined, it should be nul-terminated */
-	Assert(result >= destsize || dest[result] == '\0');
-
-	return result;
-}
-
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
new file mode 100644
index 00000000000..ab53995b786
--- /dev/null
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -0,0 +1,326 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL locale utilities for libc
+ *
+ * Portions Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_locale_libc.c
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_collation.h"
+#include "mb/pg_wchar.h"
+#include "utils/formatting.h"
+#include "utils/pg_locale.h"
+
+/*
+ * This should be large enough that most strings will fit, but small enough
+ * that we feel comfortable putting it on the stack
+ */
+#define		TEXTBUFLEN			1024
+
+extern locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
+extern int strncoll_libc(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+extern size_t strnxfrm_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
+static void report_newlocale_failure(const char *localename);
+
+#ifdef WIN32
+static int strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
+									const char *arg2, ssize_t len2,
+									pg_locale_t locale);
+#endif
+
+/*
+ * Create a locale_t with the given collation and ctype.
+ *
+ * The "C" and "POSIX" locales are not actually handled by libc, so return
+ * NULL.
+ *
+ * Ensure that no path leaks a locale_t.
+ */
+locale_t
+make_libc_collator(const char *collate, const char *ctype)
+{
+	locale_t	loc = 0;
+
+	if (strcmp(collate, ctype) == 0)
+	{
+		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
+		{
+			/* Normal case where they're the same */
+			errno = 0;
+#ifndef WIN32
+			loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate,
+							NULL);
+#else
+			loc = _create_locale(LC_ALL, collate);
+#endif
+			if (!loc)
+				report_newlocale_failure(collate);
+		}
+	}
+	else
+	{
+#ifndef WIN32
+		/* We need two newlocale() steps */
+		locale_t	loc1 = 0;
+
+		if (strcmp(collate, "C") != 0 && strcmp(collate, "POSIX") != 0)
+		{
+			errno = 0;
+			loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
+			if (!loc1)
+				report_newlocale_failure(collate);
+		}
+
+		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
+		{
+			errno = 0;
+			loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
+			if (!loc)
+			{
+				if (loc1)
+					freelocale(loc1);
+				report_newlocale_failure(ctype);
+			}
+		}
+		else
+			loc = loc1;
+#else
+
+		/*
+		 * XXX The _create_locale() API doesn't appear to support this. Could
+		 * perhaps be worked around by changing pg_locale_t to contain two
+		 * separate fields.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("collations with different collate and ctype values are not supported on this platform")));
+#endif
+	}
+
+	return loc;
+}
+
+/*
+ * strncoll_libc
+ *
+ * NUL-terminate arguments, if necessary, and pass to strcoll_l().
+ *
+ * An input string length of -1 means that it's already NUL-terminated.
+ */
+int
+strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+			  pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	size_t		bufsize1 = (len1 == -1) ? 0 : len1 + 1;
+	size_t		bufsize2 = (len2 == -1) ? 0 : len2 + 1;
+	const char *arg1n;
+	const char *arg2n;
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_LIBC);
+
+#ifdef WIN32
+	/* check for this case before doing the work for nul-termination */
+	if (GetDatabaseEncoding() == PG_UTF8)
+		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
+#endif							/* WIN32 */
+
+	if (bufsize1 + bufsize2 > TEXTBUFLEN)
+		buf = palloc(bufsize1 + bufsize2);
+
+	/* nul-terminate arguments if necessary */
+	if (len1 == -1)
+	{
+		arg1n = arg1;
+	}
+	else
+	{
+		char	   *buf1 = buf;
+
+		memcpy(buf1, arg1, len1);
+		buf1[len1] = '\0';
+		arg1n = buf1;
+	}
+
+	if (len2 == -1)
+	{
+		arg2n = arg2;
+	}
+	else
+	{
+		char	   *buf2 = buf + bufsize1;
+
+		memcpy(buf2, arg2, len2);
+		buf2[len2] = '\0';
+		arg2n = buf2;
+	}
+
+	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	return result;
+}
+
+/*
+ * strnxfrm_libc
+ *
+ * NUL-terminate src, if necessary, and pass to strxfrm_l().
+ *
+ * A source length of -1 means that it's already NUL-terminated.
+ */
+size_t
+strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			  pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	size_t		bufsize = srclen + 1;
+	size_t		result;
+
+	Assert(locale->provider == COLLPROVIDER_LIBC);
+
+	if (srclen == -1)
+		return strxfrm_l(dest, src, destsize, locale->info.lt);
+
+	if (bufsize > TEXTBUFLEN)
+		buf = palloc(bufsize);
+
+	/* nul-terminate argument */
+	memcpy(buf, src, srclen);
+	buf[srclen] = '\0';
+
+	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	/* if dest is defined, it should be nul-terminated */
+	Assert(result >= destsize || dest[result] == '\0');
+
+	return result;
+}
+
+/*
+ * strncoll_libc_win32_utf8
+ *
+ * Win32 does not have UTF-8. Convert UTF8 arguments to wide characters and
+ * invoke wcscoll_l().
+ *
+ * An input string length of -1 means that it's NUL-terminated.
+ */
+#ifdef WIN32
+static int
+strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
+						 ssize_t len2, pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	char	   *a1p,
+			   *a2p;
+	int			a1len;
+	int			a2len;
+	int			r;
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_LIBC);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (len1 == -1)
+		len1 = strlen(arg1);
+	if (len2 == -1)
+		len2 = strlen(arg2);
+
+	a1len = len1 * 2 + 2;
+	a2len = len2 * 2 + 2;
+
+	if (a1len + a2len > TEXTBUFLEN)
+		buf = palloc(a1len + a2len);
+
+	a1p = buf;
+	a2p = buf + a1len;
+
+	/* API does not work for zero-length input */
+	if (len1 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
+								(LPWSTR) a1p, a1len / 2);
+		if (!r)
+			ereport(ERROR,
+					(errmsg("could not convert string to UTF-16: error code %lu",
+							GetLastError())));
+	}
+	((LPWSTR) a1p)[r] = 0;
+
+	if (len2 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
+								(LPWSTR) a2p, a2len / 2);
+		if (!r)
+			ereport(ERROR,
+					(errmsg("could not convert string to UTF-16: error code %lu",
+							GetLastError())));
+	}
+	((LPWSTR) a2p)[r] = 0;
+
+	errno = 0;
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
+		ereport(ERROR,
+				(errmsg("could not compare Unicode strings: %m")));
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	return result;
+}
+#endif							/* WIN32 */
+
+/* simple subroutine for reporting errors from newlocale() */
+static void
+report_newlocale_failure(const char *localename)
+{
+	int			save_errno;
+
+	/*
+	 * Windows doesn't provide any useful error indication from
+	 * _create_locale(), and BSD-derived platforms don't seem to feel they
+	 * need to set errno either (even though POSIX is pretty clear that
+	 * newlocale should do so).  So, if errno hasn't been set, assume ENOENT
+	 * is what to report.
+	 */
+	if (errno == 0)
+		errno = ENOENT;
+
+	/*
+	 * ENOENT means "no such locale", not "no such file", so clarify that
+	 * errno with an errdetail message.
+	 */
+	save_errno = errno;			/* auxiliary funcs might change errno */
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			 errmsg("could not create locale \"%s\": %m",
+					localename),
+			 (save_errno == ENOENT ?
+			  errdetail("The operating system could not find any locale data for the locale name \"%s\".",
+						localename) : 0)));
+}
+
-- 
2.34.1
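
Reviewer note on make_libc_collator() in the new pg_locale_libc.c above: when
the collate and ctype names differ, the non-Windows path builds the locale_t
in two newlocale() steps, passing the first result as the base of the second.
Below is a minimal standalone sketch of that control flow, not the patch's
code. The name make_collator_sketch(), the int status return, and the "out"
parameter are hypothetical and stand in for the ereport() calls used in the
backend.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for newlocale() and locale_t */
#endif

#include <assert.h>
#include <locale.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/* "C" and "POSIX" are treated as "no libc locale object needed" */
static int
is_c_locale(const char *name)
{
	return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/*
 * Hypothetical helper: returns 0 on success and -1 on failure.  On success
 * *out holds the new locale_t, or (locale_t) 0 when both names are C/POSIX.
 */
int
make_collator_sketch(const char *collate, const char *ctype, locale_t *out)
{
	locale_t	base = (locale_t) 0;
	locale_t	loc;

	assert(collate != NULL && ctype != NULL && out != NULL);
	*out = (locale_t) 0;

	if (strcmp(collate, ctype) == 0)
	{
		/* same name for both categories: a single newlocale() call */
		if (is_c_locale(ctype))
			return 0;
		loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate,
						(locale_t) 0);
		if (loc == (locale_t) 0)
			return -1;
		*out = loc;
		return 0;
	}

	/* step 1: LC_COLLATE only, unless it is C/POSIX */
	if (!is_c_locale(collate))
	{
		base = newlocale(LC_COLLATE_MASK, collate, (locale_t) 0);
		if (base == (locale_t) 0)
			return -1;
	}

	/* step 2: layer LC_CTYPE on top of the step-1 result */
	if (!is_c_locale(ctype))
	{
		loc = newlocale(LC_CTYPE_MASK, ctype, base);
		if (loc == (locale_t) 0)
		{
			/* on failure the base is untouched, so release it here */
			if (base != (locale_t) 0)
				freelocale(base);
			return -1;
		}
		*out = loc;
	}
	else
		*out = base;

	return 0;
}
```

On success the caller owns only the final locale_t and releases it with
freelocale(); the base passed to the second newlocale() call must not be
freed separately.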

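Reviewer note on strncoll_libc() in the same file: both length-delimited
arguments are copied into one scratch buffer, which lives on the stack when
the combined size fits in TEXTBUFLEN and on the heap otherwise. A rough
standalone sketch of that pattern follows. It assumes plain strcoll() and
malloc() in place of strcoll_l() and palloc(); strncoll_sketch() and
SCRATCH_LEN are illustrative names that do not appear in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>			/* ssize_t */

#define SCRATCH_LEN 1024		/* plays the role of TEXTBUFLEN */

/*
 * Compare two strings with strcoll().  A length of -1 means the argument is
 * already NUL-terminated and is used in place; otherwise it is copied and
 * terminated first.
 */
int
strncoll_sketch(const char *arg1, ssize_t len1,
				const char *arg2, ssize_t len2)
{
	char		sbuf[SCRATCH_LEN];
	char	   *buf = sbuf;
	size_t		bufsize1 = (len1 == -1) ? 0 : (size_t) len1 + 1;
	size_t		bufsize2 = (len2 == -1) ? 0 : (size_t) len2 + 1;
	const char *arg1n = arg1;
	const char *arg2n = arg2;
	int			result;

	assert(len1 >= -1 && len2 >= -1);

	/* one buffer serves both copies; use the heap only when it is too big */
	if (bufsize1 + bufsize2 > SCRATCH_LEN)
	{
		buf = malloc(bufsize1 + bufsize2);
		if (buf == NULL)
			abort();			/* palloc() would raise an ERROR instead */
	}

	if (len1 != -1)
	{
		memcpy(buf, arg1, (size_t) len1);
		buf[len1] = '\0';
		arg1n = buf;
	}

	if (len2 != -1)
	{
		/* second copy starts right after the first one's terminator */
		char	   *buf2 = buf + bufsize1;

		memcpy(buf2, arg2, (size_t) len2);
		buf2[len2] = '\0';
		arg2n = buf2;
	}

	result = strcoll(arg1n, arg2n);

	if (buf != sbuf)
		free(buf);

	return result;
}
```

In the default "C" locale, strncoll_sketch("abcXYZ", 3, "abc", -1) compares
"abc" with "abc". Sizing the buffer as the sum of both lengths keeps the
cleanup to a single free on the way out.
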
v5-0001-Move-ICU-specific-code-from-pg_locale.c-into-pg_l.patch (text/x-patch; charset=UTF-8)

From 0e907cbc1a4c510b437a6ad055c822900b12fb28 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 24 Sep 2024 17:15:06 -0700
Subject: [PATCH v5 1/8] Move ICU-specific code from pg_locale.c into
 pg_locale_icu.c.

---
 src/backend/utils/adt/Makefile        |   1 +
 src/backend/utils/adt/meson.build     |   1 +
 src/backend/utils/adt/pg_locale.c     | 691 +------------------------
 src/backend/utils/adt/pg_locale_icu.c | 706 ++++++++++++++++++++++++++
 4 files changed, 722 insertions(+), 677 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_locale_icu.c
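
Reviewer note: pg_ucol_open(), which this patch moves into pg_locale_icu.c,
rewrites a leading "und" language to "root" before calling ucol_open() when
built against ICU older than version 55. The string handling can be sketched
without ICU as below. This is a simplified illustration: the patch detects
the language with uloc_getLanguage(), whereas this sketch assumes a plain
prefix check followed by a subtag separator, and fix_und_locale_sketch() is a
hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'd copy of loc_str with a leading "und" replaced by "root",
 * or NULL when no rewrite applies (or on allocation failure).
 */
char *
fix_und_locale_sketch(const char *loc_str)
{
	static const char und[] = "und";
	static const char root[] = "root";
	const size_t undlen = sizeof(und) - 1;
	const char *remainder;
	char	   *fixed;

	assert(loc_str != NULL);

	if (strncmp(loc_str, und, undlen) != 0)
		return NULL;

	/* "und" must be the whole language subtag, not a prefix of a longer one */
	remainder = loc_str + undlen;
	if (*remainder != '\0' && *remainder != '-' &&
		*remainder != '_' && *remainder != '@')
		return NULL;

	fixed = malloc(strlen(root) + strlen(remainder) + 1);
	if (fixed == NULL)
		return NULL;

	strcpy(fixed, root);
	strcat(fixed, remainder);

	return fixed;
}
```

The caller frees the returned string after opening the collator, and should
keep the original string around for error messages, as pg_ucol_open() does
with orig_str.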

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index edb09d4e356..bb416c86744 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -79,6 +79,7 @@ OBJS = \
 	orderedsetaggs.o \
 	partitionfuncs.o \
 	pg_locale.o \
+	pg_locale_icu.o \
 	pg_lsn.o \
 	pg_upgrade_support.o \
 	pgstatfuncs.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 8c6fc80c373..19a27465a29 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -66,6 +66,7 @@ backend_sources += files(
   'orderedsetaggs.c',
   'partitionfuncs.c',
   'pg_locale.c',
+  'pg_locale_icu.c',
   'pg_lsn.c',
   'pg_upgrade_support.c',
   'pgstatfuncs.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f2a28d5ef5a..9ca8cacffbd 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -68,11 +68,6 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
-#ifdef USE_ICU
-#include <unicode/ucnv.h>
-#include <unicode/ustring.h>
-#endif
-
 #ifdef __GLIBC__
 #include <gnu/libc-version.h>
 #endif
@@ -93,6 +88,20 @@
 
 #define		MAX_L10N_DATA		80
 
+#ifdef USE_ICU
+extern UCollator *pg_ucol_open(const char *loc_str);
+extern UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
+extern int strncoll_icu(const char *arg1, ssize_t len1,
+						const char *arg2, ssize_t len2,
+						pg_locale_t locale);
+extern size_t strnxfrm_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#endif
 
 /* GUC settings */
 char	   *locale_messages;
@@ -162,25 +171,6 @@ static pg_locale_t last_collation_cache_locale = NULL;
 static char *IsoLocaleName(const char *);
 #endif
 
-#ifdef USE_ICU
-/*
- * Converter object for converting between ICU's UChar strings and C strings
- * in database encoding.  Since the database encoding doesn't change, we only
- * need one of these per session.
- */
-static UConverter *icu_converter = NULL;
-
-static UCollator *pg_ucol_open(const char *loc_str);
-static void init_icu_converter(void);
-static size_t uchar_length(UConverter *converter,
-						   const char *str, int32_t len);
-static int32_t uchar_convert(UConverter *converter,
-							 UChar *dest, int32_t destlen,
-							 const char *src, int32_t srclen);
-static void icu_set_collation_attributes(UCollator *collator, const char *loc,
-										 UErrorCode *status);
-#endif
-
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
@@ -1368,76 +1358,6 @@ make_libc_collator(const char *collate, const char *ctype)
 	return loc;
 }
 
-/*
- * Create a UCollator with the given locale string and rules.
- *
- * Ensure that no path leaks a UCollator.
- */
-#ifdef USE_ICU
-static UCollator *
-make_icu_collator(const char *iculocstr, const char *icurules)
-{
-	if (!icurules)
-	{
-		/* simple case without rules */
-		return pg_ucol_open(iculocstr);
-	}
-	else
-	{
-		UCollator  *collator_std_rules;
-		UCollator  *collator_all_rules;
-		const UChar *std_rules;
-		UChar	   *my_rules;
-		UChar	   *all_rules;
-		int32_t		length;
-		int32_t		total;
-		UErrorCode	status;
-
-		/*
-		 * If rules are specified, we extract the rules of the standard
-		 * collation, add our own rules, and make a new collator with the
-		 * combined rules.
-		 */
-		icu_to_uchar(&my_rules, icurules, strlen(icurules));
-
-		collator_std_rules = pg_ucol_open(iculocstr);
-
-		std_rules = ucol_getRules(collator_std_rules, &length);
-
-		total = u_strlen(std_rules) + u_strlen(my_rules) + 1;
-
-		/* avoid leaking collator on OOM */
-		all_rules = palloc_extended(sizeof(UChar) * total, MCXT_ALLOC_NO_OOM);
-		if (!all_rules)
-		{
-			ucol_close(collator_std_rules);
-			ereport(ERROR,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory")));
-		}
-
-		u_strcpy(all_rules, std_rules);
-		u_strcat(all_rules, my_rules);
-
-		ucol_close(collator_std_rules);
-
-		status = U_ZERO_ERROR;
-		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
-											UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
-											NULL, &status);
-		if (U_FAILURE(status))
-		{
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
-							iculocstr, icurules, u_errorName(status))));
-		}
-
-		return collator_all_rules;
-	}
-}
-#endif							/* not USE_ICU */
-
 /*
  * Initialize default_locale with database locale settings.
  */
@@ -1946,104 +1866,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	return result;
 }
 
-#ifdef USE_ICU
-
-/*
- * strncoll_icu_no_utf8
- *
- * Convert the arguments from the database encoding to UChar strings, then
- * call ucol_strcoll(). An argument length of -1 means that the string is
- * NUL-terminated.
- *
- * When the database encoding is UTF-8, and ICU supports ucol_strcollUTF8(),
- * caller should call that instead.
- */
-static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	int32_t		ulen1;
-	int32_t		ulen2;
-	size_t		bufsize1;
-	size_t		bufsize2;
-	UChar	   *uchar1,
-			   *uchar2;
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	Assert(GetDatabaseEncoding() != PG_UTF8);
-#endif
-
-	init_icu_converter();
-
-	ulen1 = uchar_length(icu_converter, arg1, len1);
-	ulen2 = uchar_length(icu_converter, arg2, len2);
-
-	bufsize1 = (ulen1 + 1) * sizeof(UChar);
-	bufsize2 = (ulen2 + 1) * sizeof(UChar);
-
-	if (bufsize1 + bufsize2 > TEXTBUFLEN)
-		buf = palloc(bufsize1 + bufsize2);
-
-	uchar1 = (UChar *) buf;
-	uchar2 = (UChar *) (buf + bufsize1);
-
-	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
-	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
-
-	result = ucol_strcoll(locale->info.icu.ucol,
-						  uchar1, ulen1,
-						  uchar2, ulen2);
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	return result;
-}
-
-/*
- * strncoll_icu
- *
- * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
- * database encoding. An argument length of -1 means the string is
- * NUL-terminated.
- */
-static int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
-{
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
-
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
-
-	return result;
-}
-
-#endif							/* USE_ICU */
-
 /*
  * pg_strcoll
  *
@@ -2139,143 +1961,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result;
 }
 
-#ifdef USE_ICU
-
-/* 'srclen' of -1 means the strings are NUL-terminated */
-static size_t
-strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
-			 pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	UChar	   *uchar;
-	int32_t		ulen;
-	size_t		uchar_bsize;
-	Size		result_bsize;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
-	init_icu_converter();
-
-	ulen = uchar_length(icu_converter, src, srclen);
-
-	uchar_bsize = (ulen + 1) * sizeof(UChar);
-
-	if (uchar_bsize > TEXTBUFLEN)
-		buf = palloc(uchar_bsize);
-
-	uchar = (UChar *) buf;
-
-	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
-
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
-								   uchar, ulen,
-								   (uint8_t *) dest, destsize);
-
-	/*
-	 * ucol_getSortKey() counts the nul-terminator in the result length, but
-	 * this function should not.
-	 */
-	Assert(result_bsize > 0);
-	result_bsize--;
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	/* if dest is defined, it should be nul-terminated */
-	Assert(result_bsize >= destsize || dest[result_bsize] == '\0');
-
-	return result_bsize;
-}
-
-/* 'srclen' of -1 means the strings are NUL-terminated */
-static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	UCharIterator iter;
-	uint32_t	state[2];
-	UErrorCode	status;
-	int32_t		ulen = -1;
-	UChar	   *uchar = NULL;
-	size_t		uchar_bsize;
-	Size		result_bsize;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-	Assert(GetDatabaseEncoding() != PG_UTF8);
-
-	init_icu_converter();
-
-	ulen = uchar_length(icu_converter, src, srclen);
-
-	uchar_bsize = (ulen + 1) * sizeof(UChar);
-
-	if (uchar_bsize > TEXTBUFLEN)
-		buf = palloc(uchar_bsize);
-
-	uchar = (UChar *) buf;
-
-	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
-
-	uiter_setString(&iter, uchar, ulen);
-	state[0] = state[1] = 0;	/* won't need that again */
-	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
-										&iter,
-										state,
-										(uint8_t *) dest,
-										destsize,
-										&status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("sort key generation failed: %s",
-						u_errorName(status))));
-
-	return result_bsize;
-}
-
-/* 'srclen' of -1 means the strings are NUL-terminated */
-static size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
-{
-	size_t		result;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
-
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
-
-	return result;
-}
-
-#endif
-
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
@@ -2486,354 +2171,6 @@ builtin_validate_locale(int encoding, const char *locale)
 }
 
 
-#ifdef USE_ICU
-
-/*
- * Wrapper around ucol_open() to handle API differences for older ICU
- * versions.
- *
- * Ensure that no path leaks a UCollator.
- */
-static UCollator *
-pg_ucol_open(const char *loc_str)
-{
-	UCollator  *collator;
-	UErrorCode	status;
-	const char *orig_str = loc_str;
-	char	   *fixed_str = NULL;
-
-	/*
-	 * Must never open default collator, because it depends on the environment
-	 * and may change at any time. Should not happen, but check here to catch
-	 * bugs that might be hard to catch otherwise.
-	 *
-	 * NB: the default collator is not the same as the collator for the root
-	 * locale. The root locale may be specified as the empty string, "und", or
-	 * "root". The default collator is opened by passing NULL to ucol_open().
-	 */
-	if (loc_str == NULL)
-		elog(ERROR, "opening default collator is not supported");
-
-	/*
-	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
-	 * the root locale. If the first component of the locale is "und", replace
-	 * with "root" before opening.
-	 */
-	if (U_ICU_VERSION_MAJOR_NUM < 55)
-	{
-		char		lang[ULOC_LANG_CAPACITY];
-
-		status = U_ZERO_ERROR;
-		uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
-		{
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		}
-
-		if (strcmp(lang, "und") == 0)
-		{
-			const char *remainder = loc_str + strlen("und");
-
-			fixed_str = palloc(strlen("root") + strlen(remainder) + 1);
-			strcpy(fixed_str, "root");
-			strcat(fixed_str, remainder);
-
-			loc_str = fixed_str;
-		}
-	}
-
-	status = U_ZERO_ERROR;
-	collator = ucol_open(loc_str, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-		/* use original string for error report */
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("could not open collator for locale \"%s\": %s",
-						orig_str, u_errorName(status))));
-
-	if (U_ICU_VERSION_MAJOR_NUM < 54)
-	{
-		status = U_ZERO_ERROR;
-		icu_set_collation_attributes(collator, loc_str, &status);
-
-		/*
-		 * Pretend the error came from ucol_open(), for consistent error
-		 * message across ICU versions.
-		 */
-		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
-		{
-			ucol_close(collator);
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not open collator for locale \"%s\": %s",
-							orig_str, u_errorName(status))));
-		}
-	}
-
-	if (fixed_str != NULL)
-		pfree(fixed_str);
-
-	return collator;
-}
-
-static void
-init_icu_converter(void)
-{
-	const char *icu_encoding_name;
-	UErrorCode	status;
-	UConverter *conv;
-
-	if (icu_converter)
-		return;					/* already done */
-
-	icu_encoding_name = get_encoding_name_for_icu(GetDatabaseEncoding());
-	if (!icu_encoding_name)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("encoding \"%s\" not supported by ICU",
-						pg_encoding_to_char(GetDatabaseEncoding()))));
-
-	status = U_ZERO_ERROR;
-	conv = ucnv_open(icu_encoding_name, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("could not open ICU converter for encoding \"%s\": %s",
-						icu_encoding_name, u_errorName(status))));
-
-	icu_converter = conv;
-}
-
-/*
- * Find length, in UChars, of given string if converted to UChar string.
- *
- * A length of -1 indicates that the input string is NUL-terminated.
- */
-static size_t
-uchar_length(UConverter *converter, const char *str, int32_t len)
-{
-	UErrorCode	status = U_ZERO_ERROR;
-	int32_t		ulen;
-
-	ulen = ucnv_toUChars(converter, NULL, 0, str, len, &status);
-	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
-	return ulen;
-}
-
-/*
- * Convert the given source string into a UChar string, stored in dest, and
- * return the length (in UChars).
- *
- * A srclen of -1 indicates that the input string is NUL-terminated.
- */
-static int32_t
-uchar_convert(UConverter *converter, UChar *dest, int32_t destlen,
-			  const char *src, int32_t srclen)
-{
-	UErrorCode	status = U_ZERO_ERROR;
-	int32_t		ulen;
-
-	status = U_ZERO_ERROR;
-	ulen = ucnv_toUChars(converter, dest, destlen, src, srclen, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
-	return ulen;
-}
-
-/*
- * Convert a string in the database encoding into a string of UChars.
- *
- * The source string at buff is of length nbytes
- * (it needn't be nul-terminated)
- *
- * *buff_uchar receives a pointer to the palloc'd result string, and
- * the function's result is the number of UChars generated.
- *
- * The result string is nul-terminated, though most callers rely on the
- * result length instead.
- */
-int32_t
-icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
-{
-	int32_t		len_uchar;
-
-	init_icu_converter();
-
-	len_uchar = uchar_length(icu_converter, buff, nbytes);
-
-	*buff_uchar = palloc((len_uchar + 1) * sizeof(**buff_uchar));
-	len_uchar = uchar_convert(icu_converter,
-							  *buff_uchar, len_uchar + 1, buff, nbytes);
-
-	return len_uchar;
-}
-
-/*
- * Convert a string of UChars into the database encoding.
- *
- * The source string at buff_uchar is of length len_uchar
- * (it needn't be nul-terminated)
- *
- * *result receives a pointer to the palloc'd result string, and the
- * function's result is the number of bytes generated (not counting nul).
- *
- * The result string is nul-terminated.
- */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
-{
-	UErrorCode	status;
-	int32_t		len_result;
-
-	init_icu_converter();
-
-	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, NULL, 0,
-								 buff_uchar, len_uchar, &status);
-	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_fromUChars",
-						u_errorName(status))));
-
-	*result = palloc(len_result + 1);
-
-	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
-								 buff_uchar, len_uchar, &status);
-	if (U_FAILURE(status) ||
-		status == U_STRING_NOT_TERMINATED_WARNING)
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_fromUChars",
-						u_errorName(status))));
-
-	return len_result;
-}
-
-/*
- * Parse collation attributes from the given locale string and apply them to
- * the open collator.
- *
- * First, the locale string is canonicalized to an ICU format locale ID such
- * as "und@colStrength=primary;colCaseLevel=yes". Then, it parses and applies
- * the key-value arguments.
- *
- * Starting with ICU version 54, the attributes are processed automatically by
- * ucol_open(), so this is only necessary for emulating this behavior on older
- * versions.
- */
-pg_attribute_unused()
-static void
-icu_set_collation_attributes(UCollator *collator, const char *loc,
-							 UErrorCode *status)
-{
-	int32_t		len;
-	char	   *icu_locale_id;
-	char	   *lower_str;
-	char	   *str;
-	char	   *token;
-
-	/*
-	 * The input locale may be a BCP 47 language tag, e.g.
-	 * "und-u-kc-ks-level1", which expresses the same attributes in a
-	 * different form. It will be converted to the equivalent ICU format
-	 * locale ID, e.g. "und@colcaselevel=yes;colstrength=primary", by
-	 * uloc_canonicalize().
-	 */
-	*status = U_ZERO_ERROR;
-	len = uloc_canonicalize(loc, NULL, 0, status);
-	icu_locale_id = palloc(len + 1);
-	*status = U_ZERO_ERROR;
-	len = uloc_canonicalize(loc, icu_locale_id, len + 1, status);
-	if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING)
-		return;
-
-	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
-
-	pfree(icu_locale_id);
-
-	str = strchr(lower_str, '@');
-	if (!str)
-		return;
-	str++;
-
-	while ((token = strsep(&str, ";")))
-	{
-		char	   *e = strchr(token, '=');
-
-		if (e)
-		{
-			char	   *name;
-			char	   *value;
-			UColAttribute uattr;
-			UColAttributeValue uvalue;
-
-			*status = U_ZERO_ERROR;
-
-			*e = '\0';
-			name = token;
-			value = e + 1;
-
-			/*
-			 * See attribute name and value lists in ICU i18n/coll.cpp
-			 */
-			if (strcmp(name, "colstrength") == 0)
-				uattr = UCOL_STRENGTH;
-			else if (strcmp(name, "colbackwards") == 0)
-				uattr = UCOL_FRENCH_COLLATION;
-			else if (strcmp(name, "colcaselevel") == 0)
-				uattr = UCOL_CASE_LEVEL;
-			else if (strcmp(name, "colcasefirst") == 0)
-				uattr = UCOL_CASE_FIRST;
-			else if (strcmp(name, "colalternate") == 0)
-				uattr = UCOL_ALTERNATE_HANDLING;
-			else if (strcmp(name, "colnormalization") == 0)
-				uattr = UCOL_NORMALIZATION_MODE;
-			else if (strcmp(name, "colnumeric") == 0)
-				uattr = UCOL_NUMERIC_COLLATION;
-			else
-				/* ignore if unknown */
-				continue;
-
-			if (strcmp(value, "primary") == 0)
-				uvalue = UCOL_PRIMARY;
-			else if (strcmp(value, "secondary") == 0)
-				uvalue = UCOL_SECONDARY;
-			else if (strcmp(value, "tertiary") == 0)
-				uvalue = UCOL_TERTIARY;
-			else if (strcmp(value, "quaternary") == 0)
-				uvalue = UCOL_QUATERNARY;
-			else if (strcmp(value, "identical") == 0)
-				uvalue = UCOL_IDENTICAL;
-			else if (strcmp(value, "no") == 0)
-				uvalue = UCOL_OFF;
-			else if (strcmp(value, "yes") == 0)
-				uvalue = UCOL_ON;
-			else if (strcmp(value, "shifted") == 0)
-				uvalue = UCOL_SHIFTED;
-			else if (strcmp(value, "non-ignorable") == 0)
-				uvalue = UCOL_NON_IGNORABLE;
-			else if (strcmp(value, "lower") == 0)
-				uvalue = UCOL_LOWER_FIRST;
-			else if (strcmp(value, "upper") == 0)
-				uvalue = UCOL_UPPER_FIRST;
-			else
-			{
-				*status = U_ILLEGAL_ARGUMENT_ERROR;
-				break;
-			}
-
-			ucol_setAttribute(collator, uattr, uvalue, status);
-		}
-	}
-
-	pfree(lower_str);
-}
-#endif
 
 /*
  * Return the BCP47 language tag representation of the requested locale.
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
new file mode 100644
index 00000000000..2ffd98ececa
--- /dev/null
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -0,0 +1,706 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL locale utilities for ICU
+ *
+ * Portions Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_locale_icu.c
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#ifdef USE_ICU
+
+#include <unicode/ucnv.h>
+#include <unicode/ustring.h>
+
+#include "catalog/pg_collation.h"
+#include "mb/pg_wchar.h"
+#include "utils/formatting.h"
+#include "utils/pg_locale.h"
+
+/*
+ * This should be large enough that most strings will fit, but small enough
+ * that we feel comfortable putting it on the stack
+ */
+#define		TEXTBUFLEN			1024
+
+extern UCollator *pg_ucol_open(const char *loc_str);
+extern UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
+extern int strncoll_icu(const char *arg1, ssize_t len1,
+						const char *arg2, ssize_t len2,
+						pg_locale_t locale);
+extern size_t strnxfrm_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+
+/*
+ * Converter object for converting between ICU's UChar strings and C strings
+ * in database encoding.  Since the database encoding doesn't change, we only
+ * need one of these per session.
+ */
+static UConverter *icu_converter = NULL;
+
+static int strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
+								const char *arg2, ssize_t len2,
+								pg_locale_t locale);
+static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
+										  const char *src, ssize_t srclen,
+										  pg_locale_t locale);
+static void init_icu_converter(void);
+static size_t uchar_length(UConverter *converter,
+						   const char *str, int32_t len);
+static int32_t uchar_convert(UConverter *converter,
+							 UChar *dest, int32_t destlen,
+							 const char *src, int32_t srclen);
+static void icu_set_collation_attributes(UCollator *collator, const char *loc,
+										 UErrorCode *status);
+
+/*
+ * Wrapper around ucol_open() to handle API differences for older ICU
+ * versions.
+ *
+ * Ensure that no path leaks a UCollator.
+ */
+UCollator *
+pg_ucol_open(const char *loc_str)
+{
+	UCollator  *collator;
+	UErrorCode	status;
+	const char *orig_str = loc_str;
+	char	   *fixed_str = NULL;
+
+	/*
+	 * Must never open default collator, because it depends on the environment
+	 * and may change at any time. Should not happen, but check here to catch
+	 * bugs that might be hard to catch otherwise.
+	 *
+	 * NB: the default collator is not the same as the collator for the root
+	 * locale. The root locale may be specified as the empty string, "und", or
+	 * "root". The default collator is opened by passing NULL to ucol_open().
+	 */
+	if (loc_str == NULL)
+		elog(ERROR, "opening default collator is not supported");
+
+	/*
+	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
+	 * the root locale. If the first component of the locale is "und", replace
+	 * with "root" before opening.
+	 */
+	if (U_ICU_VERSION_MAJOR_NUM < 55)
+	{
+		char		lang[ULOC_LANG_CAPACITY];
+
+		status = U_ZERO_ERROR;
+		uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("could not get language from locale \"%s\": %s",
+							loc_str, u_errorName(status))));
+		}
+
+		if (strcmp(lang, "und") == 0)
+		{
+			const char *remainder = loc_str + strlen("und");
+
+			fixed_str = palloc(strlen("root") + strlen(remainder) + 1);
+			strcpy(fixed_str, "root");
+			strcat(fixed_str, remainder);
+
+			loc_str = fixed_str;
+		}
+	}
+
+	status = U_ZERO_ERROR;
+	collator = ucol_open(loc_str, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+		/* use original string for error report */
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not open collator for locale \"%s\": %s",
+						orig_str, u_errorName(status))));
+
+	if (U_ICU_VERSION_MAJOR_NUM < 54)
+	{
+		status = U_ZERO_ERROR;
+		icu_set_collation_attributes(collator, loc_str, &status);
+
+		/*
+		 * Pretend the error came from ucol_open(), for consistent error
+		 * message across ICU versions.
+		 */
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
+		{
+			ucol_close(collator);
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("could not open collator for locale \"%s\": %s",
+							orig_str, u_errorName(status))));
+		}
+	}
+
+	if (fixed_str != NULL)
+		pfree(fixed_str);
+
+	return collator;
+}
+
+/*
+ * Create a UCollator with the given locale string and rules.
+ *
+ * Ensure that no path leaks a UCollator.
+ */
+UCollator *
+make_icu_collator(const char *iculocstr, const char *icurules)
+{
+	if (!icurules)
+	{
+		/* simple case without rules */
+		return pg_ucol_open(iculocstr);
+	}
+	else
+	{
+		UCollator  *collator_std_rules;
+		UCollator  *collator_all_rules;
+		const UChar *std_rules;
+		UChar	   *my_rules;
+		UChar	   *all_rules;
+		int32_t		length;
+		int32_t		total;
+		UErrorCode	status;
+
+		/*
+		 * If rules are specified, we extract the rules of the standard
+		 * collation, add our own rules, and make a new collator with the
+		 * combined rules.
+		 */
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		collator_std_rules = pg_ucol_open(iculocstr);
+
+		std_rules = ucol_getRules(collator_std_rules, &length);
+
+		total = u_strlen(std_rules) + u_strlen(my_rules) + 1;
+
+		/* avoid leaking collator on OOM */
+		all_rules = palloc_extended(sizeof(UChar) * total, MCXT_ALLOC_NO_OOM);
+		if (!all_rules)
+		{
+			ucol_close(collator_std_rules);
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+		}
+
+		u_strcpy(all_rules, std_rules);
+		u_strcat(all_rules, my_rules);
+
+		ucol_close(collator_std_rules);
+
+		status = U_ZERO_ERROR;
+		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
+											UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
+											NULL, &status);
+		if (U_FAILURE(status))
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+		}
+
+		return collator_all_rules;
+	}
+}
+
+/*
+ * strncoll_icu
+ *
+ * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
+ * database encoding. An argument length of -1 means the string is
+ * NUL-terminated.
+ */
+int
+strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+			 pg_locale_t locale)
+{
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		UErrorCode	status;
+
+		status = U_ZERO_ERROR;
+		result = ucol_strcollUTF8(locale->info.icu.ucol,
+								  arg1, len1,
+								  arg2, len2,
+								  &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("collation failed: %s", u_errorName(status))));
+	}
+	else
+#endif
+	{
+		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
+	}
+
+	return result;
+}
+
+/* 'srclen' of -1 means the strings are NUL-terminated */
+size_t
+strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	UChar	   *uchar;
+	int32_t		ulen;
+	size_t		uchar_bsize;
+	Size		result_bsize;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	init_icu_converter();
+
+	ulen = uchar_length(icu_converter, src, srclen);
+
+	uchar_bsize = (ulen + 1) * sizeof(UChar);
+
+	if (uchar_bsize > TEXTBUFLEN)
+		buf = palloc(uchar_bsize);
+
+	uchar = (UChar *) buf;
+
+	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
+
+	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+								   uchar, ulen,
+								   (uint8_t *) dest, destsize);
+
+	/*
+	 * ucol_getSortKey() counts the nul-terminator in the result length, but
+	 * this function should not.
+	 */
+	Assert(result_bsize > 0);
+	result_bsize--;
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	/* if dest is defined, it should be nul-terminated */
+	Assert(result_bsize >= destsize || dest[result_bsize] == '\0');
+
+	return result_bsize;
+}
+
+/* 'srclen' of -1 means the strings are NUL-terminated */
+size_t
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
+{
+	size_t		result;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		UCharIterator iter;
+		uint32_t	state[2];
+		UErrorCode	status;
+
+		uiter_setUTF8(&iter, src, srclen);
+		state[0] = state[1] = 0;	/* won't need that again */
+		status = U_ZERO_ERROR;
+		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+									  &iter,
+									  state,
+									  (uint8_t *) dest,
+									  destsize,
+									  &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("sort key generation failed: %s",
+							u_errorName(status))));
+	}
+	else
+		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
+											 locale);
+
+	return result;
+}
+
+/*
+ * Convert a string in the database encoding into a string of UChars.
+ *
+ * The source string at buff is of length nbytes
+ * (it needn't be nul-terminated)
+ *
+ * *buff_uchar receives a pointer to the palloc'd result string, and
+ * the function's result is the number of UChars generated.
+ *
+ * The result string is nul-terminated, though most callers rely on the
+ * result length instead.
+ */
+int32_t
+icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
+{
+	int32_t		len_uchar;
+
+	init_icu_converter();
+
+	len_uchar = uchar_length(icu_converter, buff, nbytes);
+
+	*buff_uchar = palloc((len_uchar + 1) * sizeof(**buff_uchar));
+	len_uchar = uchar_convert(icu_converter,
+							  *buff_uchar, len_uchar + 1, buff, nbytes);
+
+	return len_uchar;
+}
+
+/*
+ * Convert a string of UChars into the database encoding.
+ *
+ * The source string at buff_uchar is of length len_uchar
+ * (it needn't be nul-terminated)
+ *
+ * *result receives a pointer to the palloc'd result string, and the
+ * function's result is the number of bytes generated (not counting nul).
+ *
+ * The result string is nul-terminated.
+ */
+int32_t
+icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+{
+	UErrorCode	status;
+	int32_t		len_result;
+
+	init_icu_converter();
+
+	status = U_ZERO_ERROR;
+	len_result = ucnv_fromUChars(icu_converter, NULL, 0,
+								 buff_uchar, len_uchar, &status);
+	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_fromUChars",
+						u_errorName(status))));
+
+	*result = palloc(len_result + 1);
+
+	status = U_ZERO_ERROR;
+	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+								 buff_uchar, len_uchar, &status);
+	if (U_FAILURE(status) ||
+		status == U_STRING_NOT_TERMINATED_WARNING)
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_fromUChars",
+						u_errorName(status))));
+
+	return len_result;
+}
+
+/*
+ * strncoll_icu_no_utf8
+ *
+ * Convert the arguments from the database encoding to UChar strings, then
+ * call ucol_strcoll(). An argument length of -1 means that the string is
+ * NUL-terminated.
+ *
+ * When the database encoding is UTF-8, and ICU supports ucol_strcollUTF8(),
+ * caller should call that instead.
+ */
+static int
+strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
+					 const char *arg2, ssize_t len2, pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	int32_t		ulen1;
+	int32_t		ulen2;
+	size_t		bufsize1;
+	size_t		bufsize2;
+	UChar	   *uchar1,
+			   *uchar2;
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+#endif
+
+	init_icu_converter();
+
+	ulen1 = uchar_length(icu_converter, arg1, len1);
+	ulen2 = uchar_length(icu_converter, arg2, len2);
+
+	bufsize1 = (ulen1 + 1) * sizeof(UChar);
+	bufsize2 = (ulen2 + 1) * sizeof(UChar);
+
+	if (bufsize1 + bufsize2 > TEXTBUFLEN)
+		buf = palloc(bufsize1 + bufsize2);
+
+	uchar1 = (UChar *) buf;
+	uchar2 = (UChar *) (buf + bufsize1);
+
+	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
+	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
+
+	result = ucol_strcoll(locale->info.icu.ucol,
+						  uchar1, ulen1,
+						  uchar2, ulen2);
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	return result;
+}
+
+/* 'srclen' of -1 means the strings are NUL-terminated */
+static size_t
+strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
+	int32_t		ulen = -1;
+	UChar	   *uchar = NULL;
+	size_t		uchar_bsize;
+	Size		result_bsize;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	init_icu_converter();
+
+	ulen = uchar_length(icu_converter, src, srclen);
+
+	uchar_bsize = (ulen + 1) * sizeof(UChar);
+
+	if (uchar_bsize > TEXTBUFLEN)
+		buf = palloc(uchar_bsize);
+
+	uchar = (UChar *) buf;
+
+	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
+
+	uiter_setString(&iter, uchar, ulen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+										&iter,
+										state,
+										(uint8_t *) dest,
+										destsize,
+										&status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
+
+	return result_bsize;
+}
+
+static void
+init_icu_converter(void)
+{
+	const char *icu_encoding_name;
+	UErrorCode	status;
+	UConverter *conv;
+
+	if (icu_converter)
+		return;					/* already done */
+
+	icu_encoding_name = get_encoding_name_for_icu(GetDatabaseEncoding());
+	if (!icu_encoding_name)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("encoding \"%s\" not supported by ICU",
+						pg_encoding_to_char(GetDatabaseEncoding()))));
+
+	status = U_ZERO_ERROR;
+	conv = ucnv_open(icu_encoding_name, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("could not open ICU converter for encoding \"%s\": %s",
+						icu_encoding_name, u_errorName(status))));
+
+	icu_converter = conv;
+}
+
+/*
+ * Find length, in UChars, of given string if converted to UChar string.
+ *
+ * A length of -1 indicates that the input string is NUL-terminated.
+ */
+static size_t
+uchar_length(UConverter *converter, const char *str, int32_t len)
+{
+	UErrorCode	status = U_ZERO_ERROR;
+	int32_t		ulen;
+
+	ulen = ucnv_toUChars(converter, NULL, 0, str, len, &status);
+	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
+	return ulen;
+}
+
+/*
+ * Convert the given source string into a UChar string, stored in dest, and
+ * return the length (in UChars).
+ *
+ * A srclen of -1 indicates that the input string is NUL-terminated.
+ */
+static int32_t
+uchar_convert(UConverter *converter, UChar *dest, int32_t destlen,
+			  const char *src, int32_t srclen)
+{
+	UErrorCode	status = U_ZERO_ERROR;
+	int32_t		ulen;
+
+	status = U_ZERO_ERROR;
+	ulen = ucnv_toUChars(converter, dest, destlen, src, srclen, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
+	return ulen;
+}
+
+/*
+ * Parse collation attributes from the given locale string and apply them to
+ * the open collator.
+ *
+ * First, the locale string is canonicalized to an ICU format locale ID such
+ * as "und@colStrength=primary;colCaseLevel=yes". Then, it parses and applies
+ * the key-value arguments.
+ *
+ * Starting with ICU version 54, the attributes are processed automatically by
+ * ucol_open(), so this is only necessary for emulating this behavior on older
+ * versions.
+ */
+pg_attribute_unused()
+static void
+icu_set_collation_attributes(UCollator *collator, const char *loc,
+							 UErrorCode *status)
+{
+	int32_t		len;
+	char	   *icu_locale_id;
+	char	   *lower_str;
+	char	   *str;
+	char	   *token;
+
+	/*
+	 * The input locale may be a BCP 47 language tag, e.g.
+	 * "und-u-kc-ks-level1", which expresses the same attributes in a
+	 * different form. It will be converted to the equivalent ICU format
+	 * locale ID, e.g. "und@colcaselevel=yes;colstrength=primary", by
+	 * uloc_canonicalize().
+	 */
+	*status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, NULL, 0, status);
+	icu_locale_id = palloc(len + 1);
+	*status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, icu_locale_id, len + 1, status);
+	if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING)
+		return;
+
+	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
+
+	pfree(icu_locale_id);
+
+	str = strchr(lower_str, '@');
+	if (!str)
+		return;
+	str++;
+
+	while ((token = strsep(&str, ";")))
+	{
+		char	   *e = strchr(token, '=');
+
+		if (e)
+		{
+			char	   *name;
+			char	   *value;
+			UColAttribute uattr;
+			UColAttributeValue uvalue;
+
+			*status = U_ZERO_ERROR;
+
+			*e = '\0';
+			name = token;
+			value = e + 1;
+
+			/*
+			 * See attribute name and value lists in ICU i18n/coll.cpp
+			 */
+			if (strcmp(name, "colstrength") == 0)
+				uattr = UCOL_STRENGTH;
+			else if (strcmp(name, "colbackwards") == 0)
+				uattr = UCOL_FRENCH_COLLATION;
+			else if (strcmp(name, "colcaselevel") == 0)
+				uattr = UCOL_CASE_LEVEL;
+			else if (strcmp(name, "colcasefirst") == 0)
+				uattr = UCOL_CASE_FIRST;
+			else if (strcmp(name, "colalternate") == 0)
+				uattr = UCOL_ALTERNATE_HANDLING;
+			else if (strcmp(name, "colnormalization") == 0)
+				uattr = UCOL_NORMALIZATION_MODE;
+			else if (strcmp(name, "colnumeric") == 0)
+				uattr = UCOL_NUMERIC_COLLATION;
+			else
+				/* ignore if unknown */
+				continue;
+
+			if (strcmp(value, "primary") == 0)
+				uvalue = UCOL_PRIMARY;
+			else if (strcmp(value, "secondary") == 0)
+				uvalue = UCOL_SECONDARY;
+			else if (strcmp(value, "tertiary") == 0)
+				uvalue = UCOL_TERTIARY;
+			else if (strcmp(value, "quaternary") == 0)
+				uvalue = UCOL_QUATERNARY;
+			else if (strcmp(value, "identical") == 0)
+				uvalue = UCOL_IDENTICAL;
+			else if (strcmp(value, "no") == 0)
+				uvalue = UCOL_OFF;
+			else if (strcmp(value, "yes") == 0)
+				uvalue = UCOL_ON;
+			else if (strcmp(value, "shifted") == 0)
+				uvalue = UCOL_SHIFTED;
+			else if (strcmp(value, "non-ignorable") == 0)
+				uvalue = UCOL_NON_IGNORABLE;
+			else if (strcmp(value, "lower") == 0)
+				uvalue = UCOL_LOWER_FIRST;
+			else if (strcmp(value, "upper") == 0)
+				uvalue = UCOL_UPPER_FIRST;
+			else
+			{
+				*status = U_ILLEGAL_ARGUMENT_ERROR;
+				break;
+			}
+
+			ucol_setAttribute(collator, uattr, uvalue, status);
+		}
+	}
+
+	pfree(lower_str);
+}
+
+#endif							/* USE_ICU */
-- 
2.34.1

Andreas Karlsson

andreas@proxel.se

over 1 year ago

In reply to: Jeff Davis (#1)

Re: Collation & ctype method table, and extension hooks

On 9/27/24 12:30 AM, Jeff Davis wrote:

The attached patch series refactors the collation and ctype behavior
into method tables, and provides a way to hook the creation of a
pg_locale_t so that an extension can create any kind of method table it
wants.

Great! I had been planning to do this myself so great to see that you
already did it before me. Will take a look at this work later.

Andreas

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Andreas Karlsson (#2)

Re: Collation & ctype method table, and extension hooks

On Fri, 2024-10-04 at 15:24 +0200, Andreas Karlsson wrote:

Great! I had been planning to do this myself so great to see that you
already did it before me. Will take a look at this work later.

Great! We'll need to test whether the indirection causes any performance
regressions in the regex & pattern matching code.

What would be a good test for that? Just running it over long strings?
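
One possible starting point for such a test is a standalone micro-benchmark that classifies every byte of a long string twice: once with a direct call, and once through a function pointer stored in a method table. The sketch below is only an illustration under stated assumptions. The names demo_ctype_methods, isalpha_fn and run_ctype_benchmark are hypothetical and are not part of the patch, and it exercises plain isalpha() rather than the PostgreSQL regex code, so it can at most hint at the raw cost of the indirection.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical ctype method table; the field name is illustrative only. */
typedef struct demo_ctype_methods
{
	int			(*isalpha_fn) (int c);
} demo_ctype_methods;

static int
demo_isalpha(int c)
{
	return isalpha((unsigned char) c) != 0;
}

static const demo_ctype_methods demo_methods = {demo_isalpha};

/* Baseline: classify every byte with a direct call. */
static size_t
count_alpha_direct(const char *buf, size_t len)
{
	size_t		n = 0;
	size_t		i;

	for (i = 0; i < len; i++)
		n += (size_t) demo_isalpha(buf[i]);
	return n;
}

/* Same work, but each call goes through the method table. */
static size_t
count_alpha_indirect(const demo_ctype_methods *m, const char *buf, size_t len)
{
	size_t		n = 0;
	size_t		i;

	for (i = 0; i < len; i++)
		n += (size_t) m->isalpha_fn(buf[i]);
	return n;
}

/*
 * Run both variants 'reps' times over a 'len'-byte string, print the CPU
 * time of each, and return the number of alphabetic bytes in one pass.
 * Returns 0 on bad arguments or allocation failure.
 */
size_t
run_ctype_benchmark(size_t len, int reps)
{
	static const char pattern[] = "abc123XYZ_";	/* 6 letters per 10 bytes */
	char	   *buf;
	const char *volatile vbuf;	/* re-read each pass to discourage hoisting */
	const demo_ctype_methods *volatile m = &demo_methods;
	size_t		direct = 0;
	size_t		indirect = 0;
	size_t		i;
	int			r;
	clock_t		t0, t1, t2;

	if (len == 0 || reps <= 0 || (buf = malloc(len)) == NULL)
		return 0;
	for (i = 0; i < len; i++)
		buf[i] = pattern[i % (sizeof(pattern) - 1)];
	vbuf = buf;

	t0 = clock();
	for (r = 0; r < reps; r++)
		direct += count_alpha_direct(vbuf, len);
	t1 = clock();
	for (r = 0; r < reps; r++)
		indirect += count_alpha_indirect(m, vbuf, len);
	t2 = clock();

	assert(direct == indirect);	/* both paths must agree */
	printf("direct:   %.3f s\nindirect: %.3f s\n",
		   (double) (t1 - t0) / CLOCKS_PER_SEC,
		   (double) (t2 - t1) / CLOCKS_PER_SEC);
	free(buf);
	return direct / (size_t) reps;
}
```

Calling run_ctype_benchmark() from a small main() with a large length (tens of megabytes) and several repetitions gives timings that can be compared across builds. An optimizing compiler may inline the direct path, which is the point of the comparison. A test against the real regex and pattern matching code would still be needed to draw conclusions.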

Regards,
Jeff Davis

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Andreas Karlsson (#2)

11 attachment(s)

Re: Collation & ctype method table, and extension hooks

On Fri, 2024-10-04 at 15:24 +0200, Andreas Karlsson wrote:

On 9/27/24 12:30 AM, Jeff Davis wrote:

The attached patch series refactors the collation and ctype behavior
into method tables, and provides a way to hook the creation of a
pg_locale_t so that an extension can create any kind of method table it
wants.

Great! I had been planning to do this myself so great to see that you
already did it before me. Will take a look at this work later.

Attached is v6, which has significant improvements and should be easier to
review.

This removes all runtime branching for collation & ctype operations; I
even removed the "provider" field of pg_locale_t to be sure.
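
To make the idea concrete, here is a minimal sketch of method-table dispatch, assuming a simplified locale object. All names (demo_locale, demo_collate_methods, demo_strncoll and the two toy comparison functions) are hypothetical stand-ins, not the structures in the patch. The point it illustrates is that the method table is chosen once when the locale is created, so the entry point contains no branch on a provider and the locale needs no provider field.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

/* Hypothetical method table: one function pointer per collation operation. */
typedef struct demo_collate_methods
{
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct demo_locale *locale);
} demo_collate_methods;

/* Hypothetical locale object; note that it has no "provider" field. */
typedef struct demo_locale
{
	const demo_collate_methods *collate;	/* chosen once, at creation */
} demo_locale;

/* Toy "provider" 1: plain bytewise comparison. */
static int
strncoll_bytewise(const char *a, size_t alen, const char *b, size_t blen,
				  const demo_locale *locale)
{
	size_t		minlen = alen < blen ? alen : blen;
	int			cmp = memcmp(a, b, minlen);

	(void) locale;				/* unused in this toy method */
	if (cmp != 0)
		return cmp < 0 ? -1 : 1;
	if (alen == blen)
		return 0;
	return alen < blen ? -1 : 1;
}

/* Toy "provider" 2: ASCII case-insensitive comparison. */
static int
strncoll_ascii_nocase(const char *a, size_t alen, const char *b, size_t blen,
					  const demo_locale *locale)
{
	size_t		minlen = alen < blen ? alen : blen;
	size_t		i;

	(void) locale;				/* unused in this toy method */
	for (i = 0; i < minlen; i++)
	{
		int			ca = tolower((unsigned char) a[i]);
		int			cb = tolower((unsigned char) b[i]);

		if (ca != cb)
			return ca < cb ? -1 : 1;
	}
	if (alen == blen)
		return 0;
	return alen < blen ? -1 : 1;
}

static const demo_collate_methods bytewise_methods = {strncoll_bytewise};
static const demo_collate_methods nocase_methods = {strncoll_ascii_nocase};

const demo_locale demo_bytewise_locale = {&bytewise_methods};
const demo_locale demo_nocase_locale = {&nocase_methods};

/* Caller-facing entry point: dispatches without branching on a provider. */
int
demo_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			  const demo_locale *locale)
{
	assert(locale != NULL && locale->collate != NULL);
	return locale->collate->strncoll(a, alen, b, blen, locale);
}
```

With this shape, demo_strncoll("abc", 3, "ABC", 3, &demo_nocase_locale) reports equality while the same call with &demo_bytewise_locale does not, and adding a third behavior means adding a table rather than another branch in every caller.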

This series gets us to the point where it's possible (though not easy)
to completely replace the provider at runtime without missing any
capabilities.
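
As a rough illustration of what replacing the provider through a creation hook could look like, the sketch below follows the usual PostgreSQL hook convention of a global function pointer that an extension saves and then overrides from its initialization function (the counterpart of _PG_init()). The names demo_create_locale_hook, demo_create_locale and demo_extension_init, and the "ext-" naming rule, are assumptions made for this example; the hook in the patch series may have a different name and signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical locale object; a real one would carry the method tables. */
typedef struct demo_locale
{
	const char *origin;			/* who built this locale, for illustration */
} demo_locale;

/* Hypothetical hook type: return NULL to decline and let core handle it. */
typedef const demo_locale *(*demo_create_locale_hook_type) (const char *locname);

demo_create_locale_hook_type demo_create_locale_hook = NULL;

static const demo_locale core_locale = {"core"};
static const demo_locale ext_locale = {"extension"};

/* Core creation path: give the hook the first chance, then fall back. */
const demo_locale *
demo_create_locale(const char *locname)
{
	const demo_locale *result = NULL;

	assert(locname != NULL);
	if (demo_create_locale_hook != NULL)
		result = demo_create_locale_hook(locname);
	if (result == NULL)
		result = &core_locale;
	return result;
}

/* ---- what an extension might do ---- */

static demo_create_locale_hook_type prev_hook = NULL;

/* Claim only locale names starting with "ext-"; defer everything else. */
static const demo_locale *
ext_create_locale(const char *locname)
{
	if (strncmp(locname, "ext-", 4) == 0)
		return &ext_locale;
	return prev_hook != NULL ? prev_hook(locname) : NULL;
}

/* Install the hook once, chaining to any previously installed hook. */
void
demo_extension_init(void)
{
	prev_hook = demo_create_locale_hook;
	demo_create_locale_hook = ext_create_locale;
}
```

Before demo_extension_init() runs, every locale comes from the core path. Afterwards, names the extension recognizes get the extension's object and everything else still falls through to core. The initialization function is meant to be called exactly once, as is typical for extension load-time setup.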

There are many things that would be nice to improve further, such as:

* Have a CREATE LOCALE PROVIDER command and make "provider" an Oid
rather than a char ('b'/'i'/'c'). The v6 patches bring us close to
this point, but I'm not sure if we want to go this far in v18.

* Need an actual extension to prove that it works.

* Clean up the way versions are handled.

* Do we want to provide support for changing the provider at initdb
time?

* The catalog representation is not very clean or general. The libc
provider allows collation and ctype to be set separately, but they
control the environment variables, too. ICU has rules, which are
specific to ICU.

* I've tested the performance for collation and case mapping, and there
does not appear to be any overhead. I didn't observe any performance
overhead for ctype either, but I think I need a more strenuous test to
be sure.

Regards,
Jeff Davis

Attachments:

v6-0001-Move-ICU-specific-code-from-pg_locale.c-into-pg_l.patch (text/x-patch; charset=UTF-8)

From 5e1577f0d466ca112aa9b437aa557a8a4210dca0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 24 Sep 2024 17:15:06 -0700
Subject: [PATCH v6 01/11] Move ICU-specific code from pg_locale.c into
 pg_locale_icu.c.

---
 src/backend/utils/adt/Makefile        |   1 +
 src/backend/utils/adt/meson.build     |   1 +
 src/backend/utils/adt/pg_locale.c     | 691 +------------------------
 src/backend/utils/adt/pg_locale_icu.c | 706 ++++++++++++++++++++++++++
 4 files changed, 722 insertions(+), 677 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_locale_icu.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index edb09d4e35..bb416c8674 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -79,6 +79,7 @@ OBJS = \
 	orderedsetaggs.o \
 	partitionfuncs.o \
 	pg_locale.o \
+	pg_locale_icu.o \
 	pg_lsn.o \
 	pg_upgrade_support.o \
 	pgstatfuncs.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 8c6fc80c37..19a27465a2 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -66,6 +66,7 @@ backend_sources += files(
   'orderedsetaggs.c',
   'partitionfuncs.c',
   'pg_locale.c',
+  'pg_locale_icu.c',
   'pg_lsn.c',
   'pg_upgrade_support.c',
   'pgstatfuncs.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b4954959f9..a13fd5fad6 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,11 +69,6 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
-#ifdef USE_ICU
-#include <unicode/ucnv.h>
-#include <unicode/ustring.h>
-#endif
-
 #ifdef __GLIBC__
 #include <gnu/libc-version.h>
 #endif
@@ -94,6 +89,20 @@
 
 #define		MAX_L10N_DATA		80
 
+#ifdef USE_ICU
+extern UCollator *pg_ucol_open(const char *loc_str);
+extern UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
+extern int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+extern size_t strnxfrm_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#endif
 
 /* GUC settings */
 char	   *locale_messages;
@@ -163,25 +172,6 @@ static pg_locale_t last_collation_cache_locale = NULL;
 static char *IsoLocaleName(const char *);
 #endif
 
-#ifdef USE_ICU
-/*
- * Converter object for converting between ICU's UChar strings and C strings
- * in database encoding.  Since the database encoding doesn't change, we only
- * need one of these per session.
- */
-static UConverter *icu_converter = NULL;
-
-static UCollator *pg_ucol_open(const char *loc_str);
-static void init_icu_converter(void);
-static size_t uchar_length(UConverter *converter,
-						   const char *str, int32_t len);
-static int32_t uchar_convert(UConverter *converter,
-							 UChar *dest, int32_t destlen,
-							 const char *src, int32_t srclen);
-static void icu_set_collation_attributes(UCollator *collator, const char *loc,
-										 UErrorCode *status);
-#endif
-
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
@@ -1391,76 +1381,6 @@ make_libc_collator(const char *collate, const char *ctype)
 	return loc;
 }
 
-/*
- * Create a UCollator with the given locale string and rules.
- *
- * Ensure that no path leaks a UCollator.
- */
-#ifdef USE_ICU
-static UCollator *
-make_icu_collator(const char *iculocstr, const char *icurules)
-{
-	if (!icurules)
-	{
-		/* simple case without rules */
-		return pg_ucol_open(iculocstr);
-	}
-	else
-	{
-		UCollator  *collator_std_rules;
-		UCollator  *collator_all_rules;
-		const UChar *std_rules;
-		UChar	   *my_rules;
-		UChar	   *all_rules;
-		int32_t		length;
-		int32_t		total;
-		UErrorCode	status;
-
-		/*
-		 * If rules are specified, we extract the rules of the standard
-		 * collation, add our own rules, and make a new collator with the
-		 * combined rules.
-		 */
-		icu_to_uchar(&my_rules, icurules, strlen(icurules));
-
-		collator_std_rules = pg_ucol_open(iculocstr);
-
-		std_rules = ucol_getRules(collator_std_rules, &length);
-
-		total = u_strlen(std_rules) + u_strlen(my_rules) + 1;
-
-		/* avoid leaking collator on OOM */
-		all_rules = palloc_extended(sizeof(UChar) * total, MCXT_ALLOC_NO_OOM);
-		if (!all_rules)
-		{
-			ucol_close(collator_std_rules);
-			ereport(ERROR,
-					(errcode(ERRCODE_OUT_OF_MEMORY),
-					 errmsg("out of memory")));
-		}
-
-		u_strcpy(all_rules, std_rules);
-		u_strcat(all_rules, my_rules);
-
-		ucol_close(collator_std_rules);
-
-		status = U_ZERO_ERROR;
-		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
-											UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
-											NULL, &status);
-		if (U_FAILURE(status))
-		{
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
-							iculocstr, icurules, u_errorName(status))));
-		}
-
-		return collator_all_rules;
-	}
-}
-#endif							/* not USE_ICU */
-
 /*
  * Initialize default_locale with database locale settings.
  */
@@ -1969,104 +1889,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	return result;
 }
 
-#ifdef USE_ICU
-
-/*
- * strncoll_icu_no_utf8
- *
- * Convert the arguments from the database encoding to UChar strings, then
- * call ucol_strcoll(). An argument length of -1 means that the string is
- * NUL-terminated.
- *
- * When the database encoding is UTF-8, and ICU supports ucol_strcollUTF8(),
- * caller should call that instead.
- */
-static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	int32_t		ulen1;
-	int32_t		ulen2;
-	size_t		bufsize1;
-	size_t		bufsize2;
-	UChar	   *uchar1,
-			   *uchar2;
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	Assert(GetDatabaseEncoding() != PG_UTF8);
-#endif
-
-	init_icu_converter();
-
-	ulen1 = uchar_length(icu_converter, arg1, len1);
-	ulen2 = uchar_length(icu_converter, arg2, len2);
-
-	bufsize1 = (ulen1 + 1) * sizeof(UChar);
-	bufsize2 = (ulen2 + 1) * sizeof(UChar);
-
-	if (bufsize1 + bufsize2 > TEXTBUFLEN)
-		buf = palloc(bufsize1 + bufsize2);
-
-	uchar1 = (UChar *) buf;
-	uchar2 = (UChar *) (buf + bufsize1);
-
-	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
-	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
-
-	result = ucol_strcoll(locale->info.icu.ucol,
-						  uchar1, ulen1,
-						  uchar2, ulen2);
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	return result;
-}
-
-/*
- * strncoll_icu
- *
- * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
- * database encoding. An argument length of -1 means the string is
- * NUL-terminated.
- */
-static int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
-{
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
-
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
-
-	return result;
-}
-
-#endif							/* USE_ICU */
-
 /*
  * pg_strcoll
  *
@@ -2162,143 +1984,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result;
 }
 
-#ifdef USE_ICU
-
-/* 'srclen' of -1 means the strings are NUL-terminated */
-static size_t
-strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
-			 pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	UChar	   *uchar;
-	int32_t		ulen;
-	size_t		uchar_bsize;
-	Size		result_bsize;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
-	init_icu_converter();
-
-	ulen = uchar_length(icu_converter, src, srclen);
-
-	uchar_bsize = (ulen + 1) * sizeof(UChar);
-
-	if (uchar_bsize > TEXTBUFLEN)
-		buf = palloc(uchar_bsize);
-
-	uchar = (UChar *) buf;
-
-	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
-
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
-								   uchar, ulen,
-								   (uint8_t *) dest, destsize);
-
-	/*
-	 * ucol_getSortKey() counts the nul-terminator in the result length, but
-	 * this function should not.
-	 */
-	Assert(result_bsize > 0);
-	result_bsize--;
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	/* if dest is defined, it should be nul-terminated */
-	Assert(result_bsize >= destsize || dest[result_bsize] == '\0');
-
-	return result_bsize;
-}
-
-/* 'srclen' of -1 means the strings are NUL-terminated */
-static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	UCharIterator iter;
-	uint32_t	state[2];
-	UErrorCode	status;
-	int32_t		ulen = -1;
-	UChar	   *uchar = NULL;
-	size_t		uchar_bsize;
-	Size		result_bsize;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-	Assert(GetDatabaseEncoding() != PG_UTF8);
-
-	init_icu_converter();
-
-	ulen = uchar_length(icu_converter, src, srclen);
-
-	uchar_bsize = (ulen + 1) * sizeof(UChar);
-
-	if (uchar_bsize > TEXTBUFLEN)
-		buf = palloc(uchar_bsize);
-
-	uchar = (UChar *) buf;
-
-	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
-
-	uiter_setString(&iter, uchar, ulen);
-	state[0] = state[1] = 0;	/* won't need that again */
-	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
-										&iter,
-										state,
-										(uint8_t *) dest,
-										destsize,
-										&status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("sort key generation failed: %s",
-						u_errorName(status))));
-
-	return result_bsize;
-}
-
-/* 'srclen' of -1 means the strings are NUL-terminated */
-static size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
-{
-	size_t		result;
-
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
-
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
-
-	return result;
-}
-
-#endif
-
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
@@ -2509,354 +2194,6 @@ builtin_validate_locale(int encoding, const char *locale)
 }
 
 
-#ifdef USE_ICU
-
-/*
- * Wrapper around ucol_open() to handle API differences for older ICU
- * versions.
- *
- * Ensure that no path leaks a UCollator.
- */
-static UCollator *
-pg_ucol_open(const char *loc_str)
-{
-	UCollator  *collator;
-	UErrorCode	status;
-	const char *orig_str = loc_str;
-	char	   *fixed_str = NULL;
-
-	/*
-	 * Must never open default collator, because it depends on the environment
-	 * and may change at any time. Should not happen, but check here to catch
-	 * bugs that might be hard to catch otherwise.
-	 *
-	 * NB: the default collator is not the same as the collator for the root
-	 * locale. The root locale may be specified as the empty string, "und", or
-	 * "root". The default collator is opened by passing NULL to ucol_open().
-	 */
-	if (loc_str == NULL)
-		elog(ERROR, "opening default collator is not supported");
-
-	/*
-	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
-	 * the root locale. If the first component of the locale is "und", replace
-	 * with "root" before opening.
-	 */
-	if (U_ICU_VERSION_MAJOR_NUM < 55)
-	{
-		char		lang[ULOC_LANG_CAPACITY];
-
-		status = U_ZERO_ERROR;
-		uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
-		{
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		}
-
-		if (strcmp(lang, "und") == 0)
-		{
-			const char *remainder = loc_str + strlen("und");
-
-			fixed_str = palloc(strlen("root") + strlen(remainder) + 1);
-			strcpy(fixed_str, "root");
-			strcat(fixed_str, remainder);
-
-			loc_str = fixed_str;
-		}
-	}
-
-	status = U_ZERO_ERROR;
-	collator = ucol_open(loc_str, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-		/* use original string for error report */
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("could not open collator for locale \"%s\": %s",
-						orig_str, u_errorName(status))));
-
-	if (U_ICU_VERSION_MAJOR_NUM < 54)
-	{
-		status = U_ZERO_ERROR;
-		icu_set_collation_attributes(collator, loc_str, &status);
-
-		/*
-		 * Pretend the error came from ucol_open(), for consistent error
-		 * message across ICU versions.
-		 */
-		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
-		{
-			ucol_close(collator);
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("could not open collator for locale \"%s\": %s",
-							orig_str, u_errorName(status))));
-		}
-	}
-
-	if (fixed_str != NULL)
-		pfree(fixed_str);
-
-	return collator;
-}
-
-static void
-init_icu_converter(void)
-{
-	const char *icu_encoding_name;
-	UErrorCode	status;
-	UConverter *conv;
-
-	if (icu_converter)
-		return;					/* already done */
-
-	icu_encoding_name = get_encoding_name_for_icu(GetDatabaseEncoding());
-	if (!icu_encoding_name)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("encoding \"%s\" not supported by ICU",
-						pg_encoding_to_char(GetDatabaseEncoding()))));
-
-	status = U_ZERO_ERROR;
-	conv = ucnv_open(icu_encoding_name, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("could not open ICU converter for encoding \"%s\": %s",
-						icu_encoding_name, u_errorName(status))));
-
-	icu_converter = conv;
-}
-
-/*
- * Find length, in UChars, of given string if converted to UChar string.
- *
- * A length of -1 indicates that the input string is NUL-terminated.
- */
-static size_t
-uchar_length(UConverter *converter, const char *str, int32_t len)
-{
-	UErrorCode	status = U_ZERO_ERROR;
-	int32_t		ulen;
-
-	ulen = ucnv_toUChars(converter, NULL, 0, str, len, &status);
-	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
-	return ulen;
-}
-
-/*
- * Convert the given source string into a UChar string, stored in dest, and
- * return the length (in UChars).
- *
- * A srclen of -1 indicates that the input string is NUL-terminated.
- */
-static int32_t
-uchar_convert(UConverter *converter, UChar *dest, int32_t destlen,
-			  const char *src, int32_t srclen)
-{
-	UErrorCode	status = U_ZERO_ERROR;
-	int32_t		ulen;
-
-	status = U_ZERO_ERROR;
-	ulen = ucnv_toUChars(converter, dest, destlen, src, srclen, &status);
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
-	return ulen;
-}
-
-/*
- * Convert a string in the database encoding into a string of UChars.
- *
- * The source string at buff is of length nbytes
- * (it needn't be nul-terminated)
- *
- * *buff_uchar receives a pointer to the palloc'd result string, and
- * the function's result is the number of UChars generated.
- *
- * The result string is nul-terminated, though most callers rely on the
- * result length instead.
- */
-int32_t
-icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
-{
-	int32_t		len_uchar;
-
-	init_icu_converter();
-
-	len_uchar = uchar_length(icu_converter, buff, nbytes);
-
-	*buff_uchar = palloc((len_uchar + 1) * sizeof(**buff_uchar));
-	len_uchar = uchar_convert(icu_converter,
-							  *buff_uchar, len_uchar + 1, buff, nbytes);
-
-	return len_uchar;
-}
-
-/*
- * Convert a string of UChars into the database encoding.
- *
- * The source string at buff_uchar is of length len_uchar
- * (it needn't be nul-terminated)
- *
- * *result receives a pointer to the palloc'd result string, and the
- * function's result is the number of bytes generated (not counting nul).
- *
- * The result string is nul-terminated.
- */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
-{
-	UErrorCode	status;
-	int32_t		len_result;
-
-	init_icu_converter();
-
-	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, NULL, 0,
-								 buff_uchar, len_uchar, &status);
-	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_fromUChars",
-						u_errorName(status))));
-
-	*result = palloc(len_result + 1);
-
-	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
-								 buff_uchar, len_uchar, &status);
-	if (U_FAILURE(status) ||
-		status == U_STRING_NOT_TERMINATED_WARNING)
-		ereport(ERROR,
-				(errmsg("%s failed: %s", "ucnv_fromUChars",
-						u_errorName(status))));
-
-	return len_result;
-}
-
-/*
- * Parse collation attributes from the given locale string and apply them to
- * the open collator.
- *
- * First, the locale string is canonicalized to an ICU format locale ID such
- * as "und@colStrength=primary;colCaseLevel=yes". Then, it parses and applies
- * the key-value arguments.
- *
- * Starting with ICU version 54, the attributes are processed automatically by
- * ucol_open(), so this is only necessary for emulating this behavior on older
- * versions.
- */
-pg_attribute_unused()
-static void
-icu_set_collation_attributes(UCollator *collator, const char *loc,
-							 UErrorCode *status)
-{
-	int32_t		len;
-	char	   *icu_locale_id;
-	char	   *lower_str;
-	char	   *str;
-	char	   *token;
-
-	/*
-	 * The input locale may be a BCP 47 language tag, e.g.
-	 * "und-u-kc-ks-level1", which expresses the same attributes in a
-	 * different form. It will be converted to the equivalent ICU format
-	 * locale ID, e.g. "und@colcaselevel=yes;colstrength=primary", by
-	 * uloc_canonicalize().
-	 */
-	*status = U_ZERO_ERROR;
-	len = uloc_canonicalize(loc, NULL, 0, status);
-	icu_locale_id = palloc(len + 1);
-	*status = U_ZERO_ERROR;
-	len = uloc_canonicalize(loc, icu_locale_id, len + 1, status);
-	if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING)
-		return;
-
-	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
-
-	pfree(icu_locale_id);
-
-	str = strchr(lower_str, '@');
-	if (!str)
-		return;
-	str++;
-
-	while ((token = strsep(&str, ";")))
-	{
-		char	   *e = strchr(token, '=');
-
-		if (e)
-		{
-			char	   *name;
-			char	   *value;
-			UColAttribute uattr;
-			UColAttributeValue uvalue;
-
-			*status = U_ZERO_ERROR;
-
-			*e = '\0';
-			name = token;
-			value = e + 1;
-
-			/*
-			 * See attribute name and value lists in ICU i18n/coll.cpp
-			 */
-			if (strcmp(name, "colstrength") == 0)
-				uattr = UCOL_STRENGTH;
-			else if (strcmp(name, "colbackwards") == 0)
-				uattr = UCOL_FRENCH_COLLATION;
-			else if (strcmp(name, "colcaselevel") == 0)
-				uattr = UCOL_CASE_LEVEL;
-			else if (strcmp(name, "colcasefirst") == 0)
-				uattr = UCOL_CASE_FIRST;
-			else if (strcmp(name, "colalternate") == 0)
-				uattr = UCOL_ALTERNATE_HANDLING;
-			else if (strcmp(name, "colnormalization") == 0)
-				uattr = UCOL_NORMALIZATION_MODE;
-			else if (strcmp(name, "colnumeric") == 0)
-				uattr = UCOL_NUMERIC_COLLATION;
-			else
-				/* ignore if unknown */
-				continue;
-
-			if (strcmp(value, "primary") == 0)
-				uvalue = UCOL_PRIMARY;
-			else if (strcmp(value, "secondary") == 0)
-				uvalue = UCOL_SECONDARY;
-			else if (strcmp(value, "tertiary") == 0)
-				uvalue = UCOL_TERTIARY;
-			else if (strcmp(value, "quaternary") == 0)
-				uvalue = UCOL_QUATERNARY;
-			else if (strcmp(value, "identical") == 0)
-				uvalue = UCOL_IDENTICAL;
-			else if (strcmp(value, "no") == 0)
-				uvalue = UCOL_OFF;
-			else if (strcmp(value, "yes") == 0)
-				uvalue = UCOL_ON;
-			else if (strcmp(value, "shifted") == 0)
-				uvalue = UCOL_SHIFTED;
-			else if (strcmp(value, "non-ignorable") == 0)
-				uvalue = UCOL_NON_IGNORABLE;
-			else if (strcmp(value, "lower") == 0)
-				uvalue = UCOL_LOWER_FIRST;
-			else if (strcmp(value, "upper") == 0)
-				uvalue = UCOL_UPPER_FIRST;
-			else
-			{
-				*status = U_ILLEGAL_ARGUMENT_ERROR;
-				break;
-			}
-
-			ucol_setAttribute(collator, uattr, uvalue, status);
-		}
-	}
-
-	pfree(lower_str);
-}
-#endif
 
 /*
  * Return the BCP47 language tag representation of the requested locale.
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
new file mode 100644
index 0000000000..c91954787d
--- /dev/null
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -0,0 +1,706 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL locale utilities for ICU
+ *
+ * Portions Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_locale_icu.c
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#ifdef USE_ICU
+
+#include <unicode/ucnv.h>
+#include <unicode/ustring.h>
+
+#include "catalog/pg_collation.h"
+#include "mb/pg_wchar.h"
+#include "utils/formatting.h"
+#include "utils/pg_locale.h"
+
+/*
+ * This should be large enough that most strings will fit, but small enough
+ * that we feel comfortable putting it on the stack
+ */
+#define		TEXTBUFLEN			1024
+
+extern UCollator *pg_ucol_open(const char *loc_str);
+extern UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
+extern int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+extern size_t strnxfrm_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+
+/*
+ * Converter object for converting between ICU's UChar strings and C strings
+ * in database encoding.  Since the database encoding doesn't change, we only
+ * need one of these per session.
+ */
+static UConverter *icu_converter = NULL;
+
+static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
+								 const char *arg2, ssize_t len2,
+								 pg_locale_t locale);
+static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
+										  const char *src, ssize_t srclen,
+										  pg_locale_t locale);
+static void init_icu_converter(void);
+static size_t uchar_length(UConverter *converter,
+						   const char *str, int32_t len);
+static int32_t uchar_convert(UConverter *converter,
+							 UChar *dest, int32_t destlen,
+							 const char *src, int32_t srclen);
+static void icu_set_collation_attributes(UCollator *collator, const char *loc,
+										 UErrorCode *status);
+
+/*
+ * Wrapper around ucol_open() to handle API differences for older ICU
+ * versions.
+ *
+ * Ensure that no path leaks a UCollator.
+ */
+UCollator *
+pg_ucol_open(const char *loc_str)
+{
+	UCollator  *collator;
+	UErrorCode	status;
+	const char *orig_str = loc_str;
+	char	   *fixed_str = NULL;
+
+	/*
+	 * Must never open default collator, because it depends on the environment
+	 * and may change at any time. Should not happen, but check here to catch
+	 * bugs that might be hard to catch otherwise.
+	 *
+	 * NB: the default collator is not the same as the collator for the root
+	 * locale. The root locale may be specified as the empty string, "und", or
+	 * "root". The default collator is opened by passing NULL to ucol_open().
+	 */
+	if (loc_str == NULL)
+		elog(ERROR, "opening default collator is not supported");
+
+	/*
+	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
+	 * the root locale. If the first component of the locale is "und", replace
+	 * with "root" before opening.
+	 */
+	if (U_ICU_VERSION_MAJOR_NUM < 55)
+	{
+		char		lang[ULOC_LANG_CAPACITY];
+
+		status = U_ZERO_ERROR;
+		uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("could not get language from locale \"%s\": %s",
+							loc_str, u_errorName(status))));
+		}
+
+		if (strcmp(lang, "und") == 0)
+		{
+			const char *remainder = loc_str + strlen("und");
+
+			fixed_str = palloc(strlen("root") + strlen(remainder) + 1);
+			strcpy(fixed_str, "root");
+			strcat(fixed_str, remainder);
+
+			loc_str = fixed_str;
+		}
+	}
+
+	status = U_ZERO_ERROR;
+	collator = ucol_open(loc_str, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+		/* use original string for error report */
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not open collator for locale \"%s\": %s",
+						orig_str, u_errorName(status))));
+
+	if (U_ICU_VERSION_MAJOR_NUM < 54)
+	{
+		status = U_ZERO_ERROR;
+		icu_set_collation_attributes(collator, loc_str, &status);
+
+		/*
+		 * Pretend the error came from ucol_open(), for consistent error
+		 * message across ICU versions.
+		 */
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
+		{
+			ucol_close(collator);
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("could not open collator for locale \"%s\": %s",
+							orig_str, u_errorName(status))));
+		}
+	}
+
+	if (fixed_str != NULL)
+		pfree(fixed_str);
+
+	return collator;
+}
+
+/*
+ * Create a UCollator with the given locale string and rules.
+ *
+ * Ensure that no path leaks a UCollator.
+ */
+UCollator *
+make_icu_collator(const char *iculocstr, const char *icurules)
+{
+	if (!icurules)
+	{
+		/* simple case without rules */
+		return pg_ucol_open(iculocstr);
+	}
+	else
+	{
+		UCollator  *collator_std_rules;
+		UCollator  *collator_all_rules;
+		const UChar *std_rules;
+		UChar	   *my_rules;
+		UChar	   *all_rules;
+		int32_t		length;
+		int32_t		total;
+		UErrorCode	status;
+
+		/*
+		 * If rules are specified, we extract the rules of the standard
+		 * collation, add our own rules, and make a new collator with the
+		 * combined rules.
+		 */
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		collator_std_rules = pg_ucol_open(iculocstr);
+
+		std_rules = ucol_getRules(collator_std_rules, &length);
+
+		total = u_strlen(std_rules) + u_strlen(my_rules) + 1;
+
+		/* avoid leaking collator on OOM */
+		all_rules = palloc_extended(sizeof(UChar) * total, MCXT_ALLOC_NO_OOM);
+		if (!all_rules)
+		{
+			ucol_close(collator_std_rules);
+			ereport(ERROR,
+					(errcode(ERRCODE_OUT_OF_MEMORY),
+					 errmsg("out of memory")));
+		}
+
+		u_strcpy(all_rules, std_rules);
+		u_strcat(all_rules, my_rules);
+
+		ucol_close(collator_std_rules);
+
+		status = U_ZERO_ERROR;
+		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
+											UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
+											NULL, &status);
+		if (U_FAILURE(status))
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+		}
+
+		return collator_all_rules;
+	}
+}
+
+/*
+ * strncoll_icu
+ *
+ * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
+ * database encoding. An argument length of -1 means the string is
+ * NUL-terminated.
+ */
+int
+strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+			 pg_locale_t locale)
+{
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		UErrorCode	status;
+
+		status = U_ZERO_ERROR;
+		result = ucol_strcollUTF8(locale->info.icu.ucol,
+								  arg1, len1,
+								  arg2, len2,
+								  &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("collation failed: %s", u_errorName(status))));
+	}
+	else
+#endif
+	{
+		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
+	}
+
+	return result;
+}
+
+/* 'srclen' of -1 means the strings are NUL-terminated */
+size_t
+strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	UChar	   *uchar;
+	int32_t		ulen;
+	size_t		uchar_bsize;
+	Size		result_bsize;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	init_icu_converter();
+
+	ulen = uchar_length(icu_converter, src, srclen);
+
+	uchar_bsize = (ulen + 1) * sizeof(UChar);
+
+	if (uchar_bsize > TEXTBUFLEN)
+		buf = palloc(uchar_bsize);
+
+	uchar = (UChar *) buf;
+
+	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
+
+	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+								   uchar, ulen,
+								   (uint8_t *) dest, destsize);
+
+	/*
+	 * ucol_getSortKey() counts the nul-terminator in the result length, but
+	 * this function should not.
+	 */
+	Assert(result_bsize > 0);
+	result_bsize--;
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	/* if dest is defined, it should be nul-terminated */
+	Assert(result_bsize >= destsize || dest[result_bsize] == '\0');
+
+	return result_bsize;
+}
+
+/* 'srclen' of -1 means the strings are NUL-terminated */
+size_t
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
+{
+	size_t		result;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		UCharIterator iter;
+		uint32_t	state[2];
+		UErrorCode	status;
+
+		uiter_setUTF8(&iter, src, srclen);
+		state[0] = state[1] = 0;	/* won't need that again */
+		status = U_ZERO_ERROR;
+		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+									  &iter,
+									  state,
+									  (uint8_t *) dest,
+									  destsize,
+									  &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("sort key generation failed: %s",
+							u_errorName(status))));
+	}
+	else
+		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
+											 locale);
+
+	return result;
+}
+
+/*
+ * Convert a string in the database encoding into a string of UChars.
+ *
+ * The source string at buff is of length nbytes
+ * (it needn't be nul-terminated)
+ *
+ * *buff_uchar receives a pointer to the palloc'd result string, and
+ * the function's result is the number of UChars generated.
+ *
+ * The result string is nul-terminated, though most callers rely on the
+ * result length instead.
+ */
+int32_t
+icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
+{
+	int32_t		len_uchar;
+
+	init_icu_converter();
+
+	len_uchar = uchar_length(icu_converter, buff, nbytes);
+
+	*buff_uchar = palloc((len_uchar + 1) * sizeof(**buff_uchar));
+	len_uchar = uchar_convert(icu_converter,
+							  *buff_uchar, len_uchar + 1, buff, nbytes);
+
+	return len_uchar;
+}
+
+/*
+ * Convert a string of UChars into the database encoding.
+ *
+ * The source string at buff_uchar is of length len_uchar
+ * (it needn't be nul-terminated)
+ *
+ * *result receives a pointer to the palloc'd result string, and the
+ * function's result is the number of bytes generated (not counting nul).
+ *
+ * The result string is nul-terminated.
+ */
+int32_t
+icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+{
+	UErrorCode	status;
+	int32_t		len_result;
+
+	init_icu_converter();
+
+	status = U_ZERO_ERROR;
+	len_result = ucnv_fromUChars(icu_converter, NULL, 0,
+								 buff_uchar, len_uchar, &status);
+	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_fromUChars",
+						u_errorName(status))));
+
+	*result = palloc(len_result + 1);
+
+	status = U_ZERO_ERROR;
+	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+								 buff_uchar, len_uchar, &status);
+	if (U_FAILURE(status) ||
+		status == U_STRING_NOT_TERMINATED_WARNING)
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_fromUChars",
+						u_errorName(status))));
+
+	return len_result;
+}
+
+/*
+ * strncoll_icu_no_utf8
+ *
+ * Convert the arguments from the database encoding to UChar strings, then
+ * call ucol_strcoll(). An argument length of -1 means that the string is
+ * NUL-terminated.
+ *
+ * When the database encoding is UTF-8, and ICU supports ucol_strcollUTF8(),
+ * caller should call that instead.
+ */
+static int
+strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
+					 const char *arg2, ssize_t len2, pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	int32_t		ulen1;
+	int32_t		ulen2;
+	size_t		bufsize1;
+	size_t		bufsize2;
+	UChar	   *uchar1,
+			   *uchar2;
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+#endif
+
+	init_icu_converter();
+
+	ulen1 = uchar_length(icu_converter, arg1, len1);
+	ulen2 = uchar_length(icu_converter, arg2, len2);
+
+	bufsize1 = (ulen1 + 1) * sizeof(UChar);
+	bufsize2 = (ulen2 + 1) * sizeof(UChar);
+
+	if (bufsize1 + bufsize2 > TEXTBUFLEN)
+		buf = palloc(bufsize1 + bufsize2);
+
+	uchar1 = (UChar *) buf;
+	uchar2 = (UChar *) (buf + bufsize1);
+
+	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
+	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
+
+	result = ucol_strcoll(locale->info.icu.ucol,
+						  uchar1, ulen1,
+						  uchar2, ulen2);
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	return result;
+}
+
+/* 'srclen' of -1 means the strings are NUL-terminated */
+static size_t
+strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
+	int32_t		ulen = -1;
+	UChar	   *uchar = NULL;
+	size_t		uchar_bsize;
+	Size		result_bsize;
+
+	Assert(locale->provider == COLLPROVIDER_ICU);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	init_icu_converter();
+
+	ulen = uchar_length(icu_converter, src, srclen);
+
+	uchar_bsize = (ulen + 1) * sizeof(UChar);
+
+	if (uchar_bsize > TEXTBUFLEN)
+		buf = palloc(uchar_bsize);
+
+	uchar = (UChar *) buf;
+
+	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
+
+	uiter_setString(&iter, uchar, ulen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+										&iter,
+										state,
+										(uint8_t *) dest,
+										destsize,
+										&status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
+
+	return result_bsize;
+}
+
+static void
+init_icu_converter(void)
+{
+	const char *icu_encoding_name;
+	UErrorCode	status;
+	UConverter *conv;
+
+	if (icu_converter)
+		return;					/* already done */
+
+	icu_encoding_name = get_encoding_name_for_icu(GetDatabaseEncoding());
+	if (!icu_encoding_name)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("encoding \"%s\" not supported by ICU",
+						pg_encoding_to_char(GetDatabaseEncoding()))));
+
+	status = U_ZERO_ERROR;
+	conv = ucnv_open(icu_encoding_name, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("could not open ICU converter for encoding \"%s\": %s",
+						icu_encoding_name, u_errorName(status))));
+
+	icu_converter = conv;
+}
+
+/*
+ * Find length, in UChars, of given string if converted to UChar string.
+ *
+ * A length of -1 indicates that the input string is NUL-terminated.
+ */
+static size_t
+uchar_length(UConverter *converter, const char *str, int32_t len)
+{
+	UErrorCode	status = U_ZERO_ERROR;
+	int32_t		ulen;
+
+	ulen = ucnv_toUChars(converter, NULL, 0, str, len, &status);
+	if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
+	return ulen;
+}
+
+/*
+ * Convert the given source string into a UChar string, stored in dest, and
+ * return the length (in UChars).
+ *
+ * A srclen of -1 indicates that the input string is NUL-terminated.
+ */
+static int32_t
+uchar_convert(UConverter *converter, UChar *dest, int32_t destlen,
+			  const char *src, int32_t srclen)
+{
+	UErrorCode	status = U_ZERO_ERROR;
+	int32_t		ulen;
+
+	status = U_ZERO_ERROR;
+	ulen = ucnv_toUChars(converter, dest, destlen, src, srclen, &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("%s failed: %s", "ucnv_toUChars", u_errorName(status))));
+	return ulen;
+}
+
+/*
+ * Parse collation attributes from the given locale string and apply them to
+ * the open collator.
+ *
+ * First, the locale string is canonicalized to an ICU format locale ID such
+ * as "und@colStrength=primary;colCaseLevel=yes". Then, it parses and applies
+ * the key-value arguments.
+ *
+ * Starting with ICU version 54, the attributes are processed automatically by
+ * ucol_open(), so this is only necessary for emulating this behavior on older
+ * versions.
+ */
+pg_attribute_unused()
+static void
+icu_set_collation_attributes(UCollator *collator, const char *loc,
+							 UErrorCode *status)
+{
+	int32_t		len;
+	char	   *icu_locale_id;
+	char	   *lower_str;
+	char	   *str;
+	char	   *token;
+
+	/*
+	 * The input locale may be a BCP 47 language tag, e.g.
+	 * "und-u-kc-ks-level1", which expresses the same attributes in a
+	 * different form. It will be converted to the equivalent ICU format
+	 * locale ID, e.g. "und@colcaselevel=yes;colstrength=primary", by
+	 * uloc_canonicalize().
+	 */
+	*status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, NULL, 0, status);
+	icu_locale_id = palloc(len + 1);
+	*status = U_ZERO_ERROR;
+	len = uloc_canonicalize(loc, icu_locale_id, len + 1, status);
+	if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING)
+		return;
+
+	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
+
+	pfree(icu_locale_id);
+
+	str = strchr(lower_str, '@');
+	if (!str)
+		return;
+	str++;
+
+	while ((token = strsep(&str, ";")))
+	{
+		char	   *e = strchr(token, '=');
+
+		if (e)
+		{
+			char	   *name;
+			char	   *value;
+			UColAttribute uattr;
+			UColAttributeValue uvalue;
+
+			*status = U_ZERO_ERROR;
+
+			*e = '\0';
+			name = token;
+			value = e + 1;
+
+			/*
+			 * See attribute name and value lists in ICU i18n/coll.cpp
+			 */
+			if (strcmp(name, "colstrength") == 0)
+				uattr = UCOL_STRENGTH;
+			else if (strcmp(name, "colbackwards") == 0)
+				uattr = UCOL_FRENCH_COLLATION;
+			else if (strcmp(name, "colcaselevel") == 0)
+				uattr = UCOL_CASE_LEVEL;
+			else if (strcmp(name, "colcasefirst") == 0)
+				uattr = UCOL_CASE_FIRST;
+			else if (strcmp(name, "colalternate") == 0)
+				uattr = UCOL_ALTERNATE_HANDLING;
+			else if (strcmp(name, "colnormalization") == 0)
+				uattr = UCOL_NORMALIZATION_MODE;
+			else if (strcmp(name, "colnumeric") == 0)
+				uattr = UCOL_NUMERIC_COLLATION;
+			else
+				/* ignore if unknown */
+				continue;
+
+			if (strcmp(value, "primary") == 0)
+				uvalue = UCOL_PRIMARY;
+			else if (strcmp(value, "secondary") == 0)
+				uvalue = UCOL_SECONDARY;
+			else if (strcmp(value, "tertiary") == 0)
+				uvalue = UCOL_TERTIARY;
+			else if (strcmp(value, "quaternary") == 0)
+				uvalue = UCOL_QUATERNARY;
+			else if (strcmp(value, "identical") == 0)
+				uvalue = UCOL_IDENTICAL;
+			else if (strcmp(value, "no") == 0)
+				uvalue = UCOL_OFF;
+			else if (strcmp(value, "yes") == 0)
+				uvalue = UCOL_ON;
+			else if (strcmp(value, "shifted") == 0)
+				uvalue = UCOL_SHIFTED;
+			else if (strcmp(value, "non-ignorable") == 0)
+				uvalue = UCOL_NON_IGNORABLE;
+			else if (strcmp(value, "lower") == 0)
+				uvalue = UCOL_LOWER_FIRST;
+			else if (strcmp(value, "upper") == 0)
+				uvalue = UCOL_UPPER_FIRST;
+			else
+			{
+				*status = U_ILLEGAL_ARGUMENT_ERROR;
+				break;
+			}
+
+			ucol_setAttribute(collator, uattr, uvalue, status);
+		}
+	}
+
+	pfree(lower_str);
+}
+
+#endif							/* USE_ICU */
-- 
2.34.1
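
For readers skimming the patch above, the "und" to "root" rewrite that pg_ucol_open() performs for ICU versions before 55 can be sketched in isolation. This is only an illustration under stated assumptions, not code from the patch: the helper name fix_und_locale is hypothetical, it uses malloc() rather than palloc(), and it detects the language subtag with a simple case-sensitive prefix check instead of calling uloc_getLanguage().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): if the locale string starts
 * with the language subtag "und", return a malloc'd copy in which "und" is
 * replaced by "root"; otherwise return a malloc'd copy of the input
 * unchanged.  The caller frees the result.  Returns NULL on out-of-memory.
 *
 * Assumption: the language subtag ends at '\0', '-', '_' or '@', and the
 * comparison is case-sensitive.  The real code asks ICU for the language
 * via uloc_getLanguage() instead.
 */
static char *
fix_und_locale(const char *loc_str)
{
	char	   *fixed;
	size_t		len;

	assert(loc_str != NULL);

	if (strncmp(loc_str, "und", 3) == 0 &&
		(loc_str[3] == '\0' || loc_str[3] == '-' ||
		 loc_str[3] == '_' || loc_str[3] == '@'))
	{
		/* keep everything after "und" and prepend "root" */
		const char *remainder = loc_str + strlen("und");

		fixed = malloc(strlen("root") + strlen(remainder) + 1);
		if (fixed == NULL)
			return NULL;
		strcpy(fixed, "root");
		strcat(fixed, remainder);
		return fixed;
	}

	/* not an "und" locale: hand back a plain copy */
	len = strlen(loc_str) + 1;
	fixed = malloc(len);
	if (fixed != NULL)
		memcpy(fixed, loc_str, len);
	return fixed;
}
```

Under these assumptions, fix_und_locale("und-u-ks-level1") yields "root-u-ks-level1", while a string such as "en-US" comes back unchanged. As in the patch, error messages would still want to report the original string rather than the rewritten one.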

v6-0002-Move-libc-specific-code-from-pg_locale.c-into-pg_.patch (text/x-patch; charset=UTF-8)

From 140e89d940ce0f138629c5fa5cbdcc5369840647 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 14:35:11 -0700
Subject: [PATCH v6 02/11] Move libc-specific code from pg_locale.c into
 pg_locale_libc.c.

---
 src/backend/utils/adt/Makefile         |   1 +
 src/backend/utils/adt/meson.build      |   1 +
 src/backend/utils/adt/pg_locale.c      | 473 +----------------------
 src/backend/utils/adt/pg_locale_libc.c | 500 +++++++++++++++++++++++++
 4 files changed, 511 insertions(+), 464 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_locale_libc.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index bb416c8674..85e5eaf32e 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -80,6 +80,7 @@ OBJS = \
 	partitionfuncs.o \
 	pg_locale.o \
 	pg_locale_icu.o \
+	pg_locale_libc.o \
 	pg_lsn.o \
 	pg_upgrade_support.o \
 	pgstatfuncs.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 19a27465a2..f73f294b8f 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -67,6 +67,7 @@ backend_sources += files(
   'partitionfuncs.c',
   'pg_locale.c',
   'pg_locale_icu.c',
+  'pg_locale_libc.c',
   'pg_lsn.c',
   'pg_upgrade_support.c',
   'pgstatfuncs.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index a13fd5fad6..298c2a23e5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -104,6 +104,15 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  pg_locale_t locale);
 #endif
 
+extern locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
+extern int	strncoll_libc(const char *arg1, ssize_t len1,
+						  const char *arg2, ssize_t len2,
+						  pg_locale_t locale);
+extern size_t strnxfrm_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -172,43 +181,6 @@ static pg_locale_t last_collation_cache_locale = NULL;
 static char *IsoLocaleName(const char *);
 #endif
 
-/*
- * POSIX doesn't define _l-variants of these functions, but several systems
- * have them.  We provide our own replacements here.
- */
-#ifndef HAVE_MBSTOWCS_L
-static size_t
-mbstowcs_l(wchar_t *dest, const char *src, size_t n, locale_t loc)
-{
-#ifdef WIN32
-	return _mbstowcs_l(dest, src, n, loc);
-#else
-	size_t		result;
-	locale_t	save_locale = uselocale(loc);
-
-	result = mbstowcs(dest, src, n);
-	uselocale(save_locale);
-	return result;
-#endif
-}
-#endif
-#ifndef HAVE_WCSTOMBS_L
-static size_t
-wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
-{
-#ifdef WIN32
-	return _wcstombs_l(dest, src, n, loc);
-#else
-	size_t		result;
-	locale_t	save_locale = uselocale(loc);
-
-	result = wcstombs(dest, src, n);
-	uselocale(save_locale);
-	return result;
-#endif
-}
-#endif
-
 /*
  * pg_perm_setlocale
  *
@@ -1279,108 +1251,6 @@ lookup_collation_cache(Oid collation)
 	return cache_entry;
 }
 
-/* simple subroutine for reporting errors from newlocale() */
-static void
-report_newlocale_failure(const char *localename)
-{
-	int			save_errno;
-
-	/*
-	 * Windows doesn't provide any useful error indication from
-	 * _create_locale(), and BSD-derived platforms don't seem to feel they
-	 * need to set errno either (even though POSIX is pretty clear that
-	 * newlocale should do so).  So, if errno hasn't been set, assume ENOENT
-	 * is what to report.
-	 */
-	if (errno == 0)
-		errno = ENOENT;
-
-	/*
-	 * ENOENT means "no such locale", not "no such file", so clarify that
-	 * errno with an errdetail message.
-	 */
-	save_errno = errno;			/* auxiliary funcs might change errno */
-	ereport(ERROR,
-			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-			 errmsg("could not create locale \"%s\": %m",
-					localename),
-			 (save_errno == ENOENT ?
-			  errdetail("The operating system could not find any locale data for the locale name \"%s\".",
-						localename) : 0)));
-}
-
-/*
- * Create a locale_t with the given collation and ctype.
- *
- * The "C" and "POSIX" locales are not actually handled by libc, so return
- * NULL.
- *
- * Ensure that no path leaks a locale_t.
- */
-static locale_t
-make_libc_collator(const char *collate, const char *ctype)
-{
-	locale_t	loc = 0;
-
-	if (strcmp(collate, ctype) == 0)
-	{
-		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
-		{
-			/* Normal case where they're the same */
-			errno = 0;
-#ifndef WIN32
-			loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate,
-							NULL);
-#else
-			loc = _create_locale(LC_ALL, collate);
-#endif
-			if (!loc)
-				report_newlocale_failure(collate);
-		}
-	}
-	else
-	{
-#ifndef WIN32
-		/* We need two newlocale() steps */
-		locale_t	loc1 = 0;
-
-		if (strcmp(collate, "C") != 0 && strcmp(collate, "POSIX") != 0)
-		{
-			errno = 0;
-			loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
-			if (!loc1)
-				report_newlocale_failure(collate);
-		}
-
-		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
-		{
-			errno = 0;
-			loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
-			if (!loc)
-			{
-				if (loc1)
-					freelocale(loc1);
-				report_newlocale_failure(ctype);
-			}
-		}
-		else
-			loc = loc1;
-#else
-
-		/*
-		 * XXX The _create_locale() API doesn't appear to support this. Could
-		 * perhaps be worked around by changing pg_locale_t to contain two
-		 * separate fields.
-		 */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("collations with different collate and ctype values are not supported on this platform")));
-#endif
-	}
-
-	return loc;
-}
-
 /*
  * Initialize default_locale with database locale settings.
  */
@@ -1745,150 +1615,6 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
-/*
- * strncoll_libc_win32_utf8
- *
- * Win32 does not have UTF-8. Convert UTF8 arguments to wide characters and
- * invoke wcscoll_l().
- *
- * An input string length of -1 means that it's NUL-terminated.
- */
-#ifdef WIN32
-static int
-strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
-						 ssize_t len2, pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	char	   *a1p,
-			   *a2p;
-	int			a1len;
-	int			a2len;
-	int			r;
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-	Assert(GetDatabaseEncoding() == PG_UTF8);
-
-	if (len1 == -1)
-		len1 = strlen(arg1);
-	if (len2 == -1)
-		len2 = strlen(arg2);
-
-	a1len = len1 * 2 + 2;
-	a2len = len2 * 2 + 2;
-
-	if (a1len + a2len > TEXTBUFLEN)
-		buf = palloc(a1len + a2len);
-
-	a1p = buf;
-	a2p = buf + a1len;
-
-	/* API does not work for zero-length input */
-	if (len1 == 0)
-		r = 0;
-	else
-	{
-		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
-								(LPWSTR) a1p, a1len / 2);
-		if (!r)
-			ereport(ERROR,
-					(errmsg("could not convert string to UTF-16: error code %lu",
-							GetLastError())));
-	}
-	((LPWSTR) a1p)[r] = 0;
-
-	if (len2 == 0)
-		r = 0;
-	else
-	{
-		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
-								(LPWSTR) a2p, a2len / 2);
-		if (!r)
-			ereport(ERROR,
-					(errmsg("could not convert string to UTF-16: error code %lu",
-							GetLastError())));
-	}
-	((LPWSTR) a2p)[r] = 0;
-
-	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
-	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
-		ereport(ERROR,
-				(errmsg("could not compare Unicode strings: %m")));
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	return result;
-}
-#endif							/* WIN32 */
-
-/*
- * strncoll_libc
- *
- * NUL-terminate arguments, if necessary, and pass to strcoll_l().
- *
- * An input string length of -1 means that it's already NUL-terminated.
- */
-static int
-strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			  pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	size_t		bufsize1 = (len1 == -1) ? 0 : len1 + 1;
-	size_t		bufsize2 = (len2 == -1) ? 0 : len2 + 1;
-	const char *arg1n;
-	const char *arg2n;
-	int			result;
-
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
-	if (bufsize1 + bufsize2 > TEXTBUFLEN)
-		buf = palloc(bufsize1 + bufsize2);
-
-	/* nul-terminate arguments if necessary */
-	if (len1 == -1)
-	{
-		arg1n = arg1;
-	}
-	else
-	{
-		char	   *buf1 = buf;
-
-		memcpy(buf1, arg1, len1);
-		buf1[len1] = '\0';
-		arg1n = buf1;
-	}
-
-	if (len2 == -1)
-	{
-		arg2n = arg2;
-	}
-	else
-	{
-		char	   *buf2 = buf + bufsize1;
-
-		memcpy(buf2, arg2, len2);
-		buf2[len2] = '\0';
-		arg2n = buf2;
-	}
-
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	return result;
-}
-
 /*
  * pg_strcoll
  *
@@ -1945,45 +1671,6 @@ pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	return result;
 }
 
-/*
- * strnxfrm_libc
- *
- * NUL-terminate src, if necessary, and pass to strxfrm_l().
- *
- * A source length of -1 means that it's already NUL-terminated.
- */
-static size_t
-strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
-			  pg_locale_t locale)
-{
-	char		sbuf[TEXTBUFLEN];
-	char	   *buf = sbuf;
-	size_t		bufsize = srclen + 1;
-	size_t		result;
-
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
-	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
-
-	if (bufsize > TEXTBUFLEN)
-		buf = palloc(bufsize);
-
-	/* nul-terminate argument */
-	memcpy(buf, src, srclen);
-	buf[srclen] = '\0';
-
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
-
-	if (buf != sbuf)
-		pfree(buf);
-
-	/* if dest is defined, it should be nul-terminated */
-	Assert(result >= destsize || dest[result] == '\0');
-
-	return result;
-}
-
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
@@ -2332,145 +2019,3 @@ icu_validate_locale(const char *loc_str)
 			 errmsg("ICU is not supported in this build")));
 #endif							/* not USE_ICU */
 }
-
-/*
- * These functions convert from/to libc's wchar_t, *not* pg_wchar_t.
- * Therefore we keep them here rather than with the mbutils code.
- */
-
-/*
- * wchar2char --- convert wide characters to multibyte format
- *
- * This has the same API as the standard wcstombs_l() function; in particular,
- * tolen is the maximum number of bytes to store at *to, and *from must be
- * zero-terminated.  The output will be zero-terminated iff there is room.
- */
-size_t
-wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
-{
-	size_t		result;
-
-	Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
-
-	if (tolen == 0)
-		return 0;
-
-#ifdef WIN32
-
-	/*
-	 * On Windows, the "Unicode" locales assume UTF16 not UTF8 encoding, and
-	 * for some reason mbstowcs and wcstombs won't do this for us, so we use
-	 * MultiByteToWideChar().
-	 */
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		result = WideCharToMultiByte(CP_UTF8, 0, from, -1, to, tolen,
-									 NULL, NULL);
-		/* A zero return is failure */
-		if (result <= 0)
-			result = -1;
-		else
-		{
-			Assert(result <= tolen);
-			/* Microsoft counts the zero terminator in the result */
-			result--;
-		}
-	}
-	else
-#endif							/* WIN32 */
-	if (locale == (pg_locale_t) 0)
-	{
-		/* Use wcstombs directly for the default locale */
-		result = wcstombs(to, from, tolen);
-	}
-	else
-	{
-		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
-	}
-
-	return result;
-}
-
-/*
- * char2wchar --- convert multibyte characters to wide characters
- *
- * This has almost the API of mbstowcs_l(), except that *from need not be
- * null-terminated; instead, the number of input bytes is specified as
- * fromlen.  Also, we ereport() rather than returning -1 for invalid
- * input encoding.  tolen is the maximum number of wchar_t's to store at *to.
- * The output will be zero-terminated iff there is room.
- */
-size_t
-char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
-		   pg_locale_t locale)
-{
-	size_t		result;
-
-	Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
-
-	if (tolen == 0)
-		return 0;
-
-#ifdef WIN32
-	/* See WIN32 "Unicode" comment above */
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		/* Win32 API does not work for zero-length input */
-		if (fromlen == 0)
-			result = 0;
-		else
-		{
-			result = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen - 1);
-			/* A zero return is failure */
-			if (result == 0)
-				result = -1;
-		}
-
-		if (result != -1)
-		{
-			Assert(result < tolen);
-			/* Append trailing null wchar (MultiByteToWideChar() does not) */
-			to[result] = 0;
-		}
-	}
-	else
-#endif							/* WIN32 */
-	{
-		/* mbstowcs requires ending '\0' */
-		char	   *str = pnstrdup(from, fromlen);
-
-		if (locale == (pg_locale_t) 0)
-		{
-			/* Use mbstowcs directly for the default locale */
-			result = mbstowcs(to, str, tolen);
-		}
-		else
-		{
-			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
-		}
-
-		pfree(str);
-	}
-
-	if (result == -1)
-	{
-		/*
-		 * Invalid multibyte character encountered.  We try to give a useful
-		 * error message by letting pg_verifymbstr check the string.  But it's
-		 * possible that the string is OK to us, and not OK to mbstowcs ---
-		 * this suggests that the LC_CTYPE locale is different from the
-		 * database encoding.  Give a generic error message if pg_verifymbstr
-		 * can't find anything wrong.
-		 */
-		pg_verifymbstr(from, fromlen, false);	/* might not return */
-		/* but if it does ... */
-		ereport(ERROR,
-				(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
-				 errmsg("invalid multibyte character for locale"),
-				 errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
-	}
-
-	return result;
-}
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
new file mode 100644
index 0000000000..61066ee21a
--- /dev/null
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -0,0 +1,500 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL locale utilities for libc
+ *
+ * Portions Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_locale_libc.c
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_collation.h"
+#include "mb/pg_wchar.h"
+#include "utils/formatting.h"
+#include "utils/pg_locale.h"
+
+/*
+ * This should be large enough that most strings will fit, but small enough
+ * that we feel comfortable putting it on the stack
+ */
+#define		TEXTBUFLEN			1024
+
+extern locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
+extern int	strncoll_libc(const char *arg1, ssize_t len1,
+						  const char *arg2, ssize_t len2,
+						  pg_locale_t locale);
+extern size_t strnxfrm_libc(char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
+static void report_newlocale_failure(const char *localename);
+
+#ifdef WIN32
+static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
+									 const char *arg2, ssize_t len2,
+									 pg_locale_t locale);
+#endif
+
+/*
+ * Create a locale_t with the given collation and ctype.
+ *
+ * The "C" and "POSIX" locales are not actually handled by libc, so return
+ * NULL.
+ *
+ * Ensure that no path leaks a locale_t.
+ */
+locale_t
+make_libc_collator(const char *collate, const char *ctype)
+{
+	locale_t	loc = 0;
+
+	if (strcmp(collate, ctype) == 0)
+	{
+		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
+		{
+			/* Normal case where they're the same */
+			errno = 0;
+#ifndef WIN32
+			loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate,
+							NULL);
+#else
+			loc = _create_locale(LC_ALL, collate);
+#endif
+			if (!loc)
+				report_newlocale_failure(collate);
+		}
+	}
+	else
+	{
+#ifndef WIN32
+		/* We need two newlocale() steps */
+		locale_t	loc1 = 0;
+
+		if (strcmp(collate, "C") != 0 && strcmp(collate, "POSIX") != 0)
+		{
+			errno = 0;
+			loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
+			if (!loc1)
+				report_newlocale_failure(collate);
+		}
+
+		if (strcmp(ctype, "C") != 0 && strcmp(ctype, "POSIX") != 0)
+		{
+			errno = 0;
+			loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
+			if (!loc)
+			{
+				if (loc1)
+					freelocale(loc1);
+				report_newlocale_failure(ctype);
+			}
+		}
+		else
+			loc = loc1;
+#else
+
+		/*
+		 * XXX The _create_locale() API doesn't appear to support this. Could
+		 * perhaps be worked around by changing pg_locale_t to contain two
+		 * separate fields.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("collations with different collate and ctype values are not supported on this platform")));
+#endif
+	}
+
+	return loc;
+}
+
+/*
+ * strncoll_libc
+ *
+ * NUL-terminate arguments, if necessary, and pass to strcoll_l().
+ *
+ * An input string length of -1 means that it's already NUL-terminated.
+ */
+int
+strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+			  pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	size_t		bufsize1 = (len1 == -1) ? 0 : len1 + 1;
+	size_t		bufsize2 = (len2 == -1) ? 0 : len2 + 1;
+	const char *arg1n;
+	const char *arg2n;
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_LIBC);
+
+#ifdef WIN32
+	/* check for this case before doing the work for nul-termination */
+	if (GetDatabaseEncoding() == PG_UTF8)
+		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
+#endif							/* WIN32 */
+
+	if (bufsize1 + bufsize2 > TEXTBUFLEN)
+		buf = palloc(bufsize1 + bufsize2);
+
+	/* nul-terminate arguments if necessary */
+	if (len1 == -1)
+	{
+		arg1n = arg1;
+	}
+	else
+	{
+		char	   *buf1 = buf;
+
+		memcpy(buf1, arg1, len1);
+		buf1[len1] = '\0';
+		arg1n = buf1;
+	}
+
+	if (len2 == -1)
+	{
+		arg2n = arg2;
+	}
+	else
+	{
+		char	   *buf2 = buf + bufsize1;
+
+		memcpy(buf2, arg2, len2);
+		buf2[len2] = '\0';
+		arg2n = buf2;
+	}
+
+	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	return result;
+}
+
+/*
+ * strnxfrm_libc
+ *
+ * NUL-terminate src, if necessary, and pass to strxfrm_l().
+ *
+ * A source length of -1 means that it's already NUL-terminated.
+ */
+size_t
+strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			  pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	size_t		bufsize = srclen + 1;
+	size_t		result;
+
+	Assert(locale->provider == COLLPROVIDER_LIBC);
+
+	if (srclen == -1)
+		return strxfrm_l(dest, src, destsize, locale->info.lt);
+
+	if (bufsize > TEXTBUFLEN)
+		buf = palloc(bufsize);
+
+	/* nul-terminate argument */
+	memcpy(buf, src, srclen);
+	buf[srclen] = '\0';
+
+	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	/* if dest is defined, it should be nul-terminated */
+	Assert(result >= destsize || dest[result] == '\0');
+
+	return result;
+}
+
+/*
+ * strncoll_libc_win32_utf8
+ *
+ * Win32 does not have UTF-8. Convert UTF8 arguments to wide characters and
+ * invoke wcscoll_l().
+ *
+ * An input string length of -1 means that it's NUL-terminated.
+ */
+#ifdef WIN32
+static int
+strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
+						 ssize_t len2, pg_locale_t locale)
+{
+	char		sbuf[TEXTBUFLEN];
+	char	   *buf = sbuf;
+	char	   *a1p,
+			   *a2p;
+	int			a1len;
+	int			a2len;
+	int			r;
+	int			result;
+
+	Assert(locale->provider == COLLPROVIDER_LIBC);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (len1 == -1)
+		len1 = strlen(arg1);
+	if (len2 == -1)
+		len2 = strlen(arg2);
+
+	a1len = len1 * 2 + 2;
+	a2len = len2 * 2 + 2;
+
+	if (a1len + a2len > TEXTBUFLEN)
+		buf = palloc(a1len + a2len);
+
+	a1p = buf;
+	a2p = buf + a1len;
+
+	/* API does not work for zero-length input */
+	if (len1 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg1, len1,
+								(LPWSTR) a1p, a1len / 2);
+		if (!r)
+			ereport(ERROR,
+					(errmsg("could not convert string to UTF-16: error code %lu",
+							GetLastError())));
+	}
+	((LPWSTR) a1p)[r] = 0;
+
+	if (len2 == 0)
+		r = 0;
+	else
+	{
+		r = MultiByteToWideChar(CP_UTF8, 0, arg2, len2,
+								(LPWSTR) a2p, a2len / 2);
+		if (!r)
+			ereport(ERROR,
+					(errmsg("could not convert string to UTF-16: error code %lu",
+							GetLastError())));
+	}
+	((LPWSTR) a2p)[r] = 0;
+
+	errno = 0;
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
+		ereport(ERROR,
+				(errmsg("could not compare Unicode strings: %m")));
+
+	if (buf != sbuf)
+		pfree(buf);
+
+	return result;
+}
+#endif							/* WIN32 */
+
+/* simple subroutine for reporting errors from newlocale() */
+static void
+report_newlocale_failure(const char *localename)
+{
+	int			save_errno;
+
+	/*
+	 * Windows doesn't provide any useful error indication from
+	 * _create_locale(), and BSD-derived platforms don't seem to feel they
+	 * need to set errno either (even though POSIX is pretty clear that
+	 * newlocale should do so).  So, if errno hasn't been set, assume ENOENT
+	 * is what to report.
+	 */
+	if (errno == 0)
+		errno = ENOENT;
+
+	/*
+	 * ENOENT means "no such locale", not "no such file", so clarify that
+	 * errno with an errdetail message.
+	 */
+	save_errno = errno;			/* auxiliary funcs might change errno */
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			 errmsg("could not create locale \"%s\": %m",
+					localename),
+			 (save_errno == ENOENT ?
+			  errdetail("The operating system could not find any locale data for the locale name \"%s\".",
+						localename) : 0)));
+}
+
+/*
+ * POSIX doesn't define _l-variants of these functions, but several systems
+ * have them.  We provide our own replacements here.
+ */
+#ifndef HAVE_MBSTOWCS_L
+static size_t
+mbstowcs_l(wchar_t *dest, const char *src, size_t n, locale_t loc)
+{
+#ifdef WIN32
+	return _mbstowcs_l(dest, src, n, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = mbstowcs(dest, src, n);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+#ifndef HAVE_WCSTOMBS_L
+static size_t
+wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
+{
+#ifdef WIN32
+	return _wcstombs_l(dest, src, n, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = wcstombs(dest, src, n);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
+/*
+ * These functions convert from/to libc's wchar_t, *not* pg_wchar_t.
+ * Therefore we keep them here rather than with the mbutils code.
+ */
+
+/*
+ * wchar2char --- convert wide characters to multibyte format
+ *
+ * This has the same API as the standard wcstombs_l() function; in particular,
+ * tolen is the maximum number of bytes to store at *to, and *from must be
+ * zero-terminated.  The output will be zero-terminated iff there is room.
+ */
+size_t
+wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
+{
+	size_t		result;
+
+	if (tolen == 0)
+		return 0;
+
+#ifdef WIN32
+
+	/*
+	 * On Windows, the "Unicode" locales assume UTF16 not UTF8 encoding, and
+	 * for some reason mbstowcs and wcstombs won't do this for us, so we use
+	 * MultiByteToWideChar().
+	 */
+	if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		result = WideCharToMultiByte(CP_UTF8, 0, from, -1, to, tolen,
+									 NULL, NULL);
+		/* A zero return is failure */
+		if (result <= 0)
+			result = -1;
+		else
+		{
+			Assert(result <= tolen);
+			/* Microsoft counts the zero terminator in the result */
+			result--;
+		}
+	}
+	else
+#endif							/* WIN32 */
+	if (locale == (pg_locale_t) 0)
+	{
+		/* Use wcstombs directly for the default locale */
+		result = wcstombs(to, from, tolen);
+	}
+	else
+	{
+		/* Use wcstombs_l for nondefault locales */
+		result = wcstombs_l(to, from, tolen, locale->info.lt);
+	}
+
+	return result;
+}
+
+/*
+ * char2wchar --- convert multibyte characters to wide characters
+ *
+ * This has almost the API of mbstowcs_l(), except that *from need not be
+ * null-terminated; instead, the number of input bytes is specified as
+ * fromlen.  Also, we ereport() rather than returning -1 for invalid
+ * input encoding.  tolen is the maximum number of wchar_t's to store at *to.
+ * The output will be zero-terminated iff there is room.
+ */
+size_t
+char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
+		   pg_locale_t locale)
+{
+	size_t		result;
+
+	if (tolen == 0)
+		return 0;
+
+#ifdef WIN32
+	/* See WIN32 "Unicode" comment above */
+	if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		/* Win32 API does not work for zero-length input */
+		if (fromlen == 0)
+			result = 0;
+		else
+		{
+			result = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen - 1);
+			/* A zero return is failure */
+			if (result == 0)
+				result = -1;
+		}
+
+		if (result != -1)
+		{
+			Assert(result < tolen);
+			/* Append trailing null wchar (MultiByteToWideChar() does not) */
+			to[result] = 0;
+		}
+	}
+	else
+#endif							/* WIN32 */
+	{
+		/* mbstowcs requires ending '\0' */
+		char	   *str = pnstrdup(from, fromlen);
+
+		if (locale == (pg_locale_t) 0)
+		{
+			/* Use mbstowcs directly for the default locale */
+			result = mbstowcs(to, str, tolen);
+		}
+		else
+		{
+			/* Use mbstowcs_l for nondefault locales */
+			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+		}
+
+		pfree(str);
+	}
+
+	if (result == -1)
+	{
+		/*
+		 * Invalid multibyte character encountered.  We try to give a useful
+		 * error message by letting pg_verifymbstr check the string.  But it's
+		 * possible that the string is OK to us, and not OK to mbstowcs ---
+		 * this suggests that the LC_CTYPE locale is different from the
+		 * database encoding.  Give a generic error message if pg_verifymbstr
+		 * can't find anything wrong.
+		 */
+		pg_verifymbstr(from, fromlen, false);	/* might not return */
+		/* but if it does ... */
+		ereport(ERROR,
+				(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
+				 errmsg("invalid multibyte character for locale"),
+				 errhint("The server's LC_CTYPE locale is probably incompatible with the database encoding.")));
+	}
+
+	return result;
+}
-- 
2.34.1
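
For readers following the move of strncoll_libc() and strnxfrm_libc() into pg_locale_libc.c, the buffer handling those functions share can be sketched outside PostgreSQL roughly as follows. This is a minimal illustration, not the patch's code: sketch_strncoll and SKETCH_BUFLEN are hypothetical names, plain strcoll() (process locale) stands in for strcoll_l(), and malloc()/free() stand in for palloc()/pfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>			/* ssize_t */

#define SKETCH_BUFLEN 1024		/* hypothetical stand-in for TEXTBUFLEN */

/*
 * Compare two strings that may not be NUL-terminated.  A length of -1 means
 * the argument is already NUL-terminated and is passed through unchanged.
 * Assumption: strcoll() replaces strcoll_l() and malloc()/free() replace
 * palloc()/pfree() so that this builds without PostgreSQL headers.
 */
static int
sketch_strncoll(const char *arg1, ssize_t len1,
				const char *arg2, ssize_t len2)
{
	char		sbuf[SKETCH_BUFLEN];
	char	   *buf = sbuf;
	size_t		bufsize1 = (len1 == -1) ? 0 : (size_t) len1 + 1;
	size_t		bufsize2 = (len2 == -1) ? 0 : (size_t) len2 + 1;
	const char *arg1n = arg1;
	const char *arg2n = arg2;
	int			result;

	assert(len1 >= -1 && len2 >= -1);

	/* one buffer holds both copies; leave the stack only if it is too small */
	if (bufsize1 + bufsize2 > SKETCH_BUFLEN)
	{
		buf = malloc(bufsize1 + bufsize2);
		if (buf == NULL)
			abort();
	}

	/* NUL-terminate each argument only when a length was given */
	if (len1 != -1)
	{
		memcpy(buf, arg1, (size_t) len1);
		buf[len1] = '\0';
		arg1n = buf;
	}
	if (len2 != -1)
	{
		char	   *buf2 = buf + bufsize1;

		memcpy(buf2, arg2, (size_t) len2);
		buf2[len2] = '\0';
		arg2n = buf2;
	}

	result = strcoll(arg1n, arg2n);

	if (buf != sbuf)
		free(buf);

	return result;
}
```

Calling sketch_strncoll("abcXYZ", 3, "abc", -1) compares only the first three bytes of the first argument against the whole second argument. Sizing a single buffer for both copies keeps the common short-string case entirely on the stack.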

v6-0003-Refactor-the-code-to-create-a-pg_locale_t-into-ne.patch (text/x-patch; charset=UTF-8)

From 906849b84576c3af1f29fd081d98403430466220 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 14:58:52 -0700
Subject: [PATCH v6 03/11] Refactor the code to create a pg_locale_t into new
 function.

---
 src/backend/utils/adt/pg_locale.c | 297 ++++++++++++++----------------
 1 file changed, 140 insertions(+), 157 deletions(-)
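
The shape of this refactoring, where pg_newlocale_from_collation() keeps only the cache handling and a separate create_pg_locale() builds the object, can be sketched in isolation as below. Everything here is hypothetical and simplified for illustration: the sketch_* names, the linked list standing in for the collation cache hash table, and the hard-coded provider value are assumptions rather than the patch's implementation.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int sketch_oid;	/* hypothetical stand-in for Oid */

typedef struct sketch_locale
{
	sketch_oid	collid;
	char		provider;			/* e.g. 'b', 'c' or 'i' */
	struct sketch_locale *next;		/* cache chain, sketch only */
} sketch_locale;

static sketch_locale *sketch_cache_head = NULL;
static int	sketch_create_calls = 0;	/* instrumentation for the sketch */

/*
 * Construction only: knows nothing about the cache.  The real function would
 * read the catalog and branch on the provider; here the result is hard-coded.
 */
static sketch_locale *
sketch_create_locale(sketch_oid collid)
{
	sketch_locale *result = calloc(1, sizeof(sketch_locale));

	if (result == NULL)
		abort();
	result->collid = collid;
	result->provider = 'c';
	sketch_create_calls++;
	return result;
}

/*
 * Lookup only: returns the cached object, calling the constructor on a miss.
 */
static sketch_locale *
sketch_locale_from_collation(sketch_oid collid)
{
	sketch_locale *entry;

	assert(collid != 0);		/* mirrors the OidIsValid() precondition */

	for (entry = sketch_cache_head; entry != NULL; entry = entry->next)
	{
		if (entry->collid == collid)
			return entry;
	}

	/* miss: build a new object and remember it for later calls */
	entry = sketch_create_locale(collid);
	entry->next = sketch_cache_head;
	sketch_cache_head = entry;
	return entry;
}
```

With this split, a second call for the same collation returns the same pointer without running the constructor again, and the constructor could be swapped out without touching the cache code.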

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 298c2a23e5..9bbb3420be 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1213,42 +1213,136 @@ IsoLocaleName(const char *winlocname)
 
 
 /*
- * Cache mechanism for collation information.
- *
- * Note that we currently lack any way to flush the cache.  Since we don't
- * support ALTER COLLATION, this is OK.  The worst case is that someone
- * drops a collation, and a useless cache entry hangs around in existing
- * backends.
+ * Create a new pg_locale_t struct for the given collation oid.
  */
-static collation_cache_entry *
-lookup_collation_cache(Oid collation)
+static pg_locale_t
+create_pg_locale(Oid collid, MemoryContext context)
 {
-	collation_cache_entry *cache_entry;
-	bool		found;
+	/* We haven't computed this yet in this session, so do it */
+	HeapTuple	tp;
+	Form_pg_collation collform;
+	pg_locale_t result;
+	Datum		datum;
+	bool		isnull;
 
-	Assert(OidIsValid(collation));
-	Assert(collation != DEFAULT_COLLATION_OID);
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	if (CollationCache == NULL)
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for collation %u", collid);
+	collform = (Form_pg_collation) GETSTRUCT(tp);
+
+	result->provider = collform->collprovider;
+	result->deterministic = collform->collisdeterministic;
+
+	if (collform->collprovider == COLLPROVIDER_BUILTIN)
 	{
-		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
-													  "collation cache",
-													  ALLOCSET_DEFAULT_SIZES);
-		CollationCache = collation_cache_create(CollationCacheContext,
-												16, NULL);
+		const char *locstr;
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+
+		result->collate_is_c = true;
+		result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+		builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+		result->info.builtin.locale = MemoryContextStrdup(context,
+														  locstr);
 	}
+	else if (collform->collprovider == COLLPROVIDER_ICU)
+	{
+#ifdef USE_ICU
+		const char *iculocstr;
+		const char *icurules;
 
-	cache_entry = collation_cache_insert(CollationCache, collation, &found);
-	if (!found)
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+
+		result->collate_is_c = false;
+		result->ctype_is_c = false;
+
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
+#else
+		/* could get here if a collation was created by a build with ICU */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#endif
+	}
+	else if (collform->collprovider == COLLPROVIDER_LIBC)
 	{
-		/*
-		 * Make sure cache entry is marked invalid, in case we fail before
-		 * setting things.
-		 */
-		cache_entry->locale = 0;
+		const char *collcollate;
+		const char *collctype;
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
+		collcollate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
+		collctype = TextDatumGetCString(datum);
+
+		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
+			(strcmp(collcollate, "POSIX") == 0);
+		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
+			(strcmp(collctype, "POSIX") == 0);
+
+		result->info.lt = make_libc_collator(collcollate, collctype);
+	}
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+
+	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
+							&isnull);
+	if (!isnull)
+	{
+		char	   *actual_versionstr;
+		char	   *collversionstr;
+
+		collversionstr = TextDatumGetCString(datum);
+
+		if (collform->collprovider == COLLPROVIDER_LIBC)
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
+		else
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+
+		actual_versionstr = get_collation_actual_version(collform->collprovider,
+														 TextDatumGetCString(datum));
+		if (!actual_versionstr)
+		{
+			/*
+			 * This could happen when specifying a version in CREATE COLLATION
+			 * but the provider does not support versioning, or manually
+			 * creating a mess in the catalogs.
+			 */
+			ereport(ERROR,
+					(errmsg("collation \"%s\" has no actual version, but a version was recorded",
+							NameStr(collform->collname))));
+		}
+
+		if (strcmp(actual_versionstr, collversionstr) != 0)
+			ereport(WARNING,
+					(errmsg("collation \"%s\" has version mismatch",
+							NameStr(collform->collname)),
+					 errdetail("The collation in the database was created using version %s, "
+							   "but the operating system provides version %s.",
+							   collversionstr, actual_versionstr),
+					 errhint("Rebuild all objects affected by this collation and run "
+							 "ALTER COLLATION %s REFRESH VERSION, "
+							 "or build PostgreSQL with the right library version.",
+							 quote_qualified_identifier(get_namespace_name(collform->collnamespace),
+														NameStr(collform->collname)))));
 	}
 
-	return cache_entry;
+	ReleaseSysCache(tp);
+
+	return result;
 }
 
 /*
@@ -1356,6 +1450,7 @@ pg_locale_t
 pg_newlocale_from_collation(Oid collid)
 {
 	collation_cache_entry *cache_entry;
+	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
 		return &default_locale;
@@ -1366,140 +1461,28 @@ pg_newlocale_from_collation(Oid collid)
 	if (last_collation_cache_oid == collid)
 		return last_collation_cache_locale;
 
-	cache_entry = lookup_collation_cache(collid);
-
-	if (cache_entry->locale == 0)
+	if (CollationCache == NULL)
 	{
-		/* We haven't computed this yet in this session, so do it */
-		HeapTuple	tp;
-		Form_pg_collation collform;
-		struct pg_locale_struct result;
-		pg_locale_t resultp;
-		Datum		datum;
-		bool		isnull;
-
-		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
-		if (!HeapTupleIsValid(tp))
-			elog(ERROR, "cache lookup failed for collation %u", collid);
-		collform = (Form_pg_collation) GETSTRUCT(tp);
-
-		/* We'll fill in the result struct locally before allocating memory */
-		memset(&result, 0, sizeof(result));
-		result.provider = collform->collprovider;
-		result.deterministic = collform->collisdeterministic;
-
-		if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		{
-			const char *locstr;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-			locstr = TextDatumGetCString(datum);
-
-			result.collate_is_c = true;
-			result.ctype_is_c = (strcmp(locstr, "C") == 0);
-
-			builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-			result.info.builtin.locale = MemoryContextStrdup(TopMemoryContext,
-															 locstr);
-		}
-		else if (collform->collprovider == COLLPROVIDER_ICU)
-		{
-#ifdef USE_ICU
-			const char *iculocstr;
-			const char *icurules;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-			iculocstr = TextDatumGetCString(datum);
-
-			result.collate_is_c = false;
-			result.ctype_is_c = false;
-
-			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-			if (!isnull)
-				icurules = TextDatumGetCString(datum);
-			else
-				icurules = NULL;
-
-			result.info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
-			result.info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-			/* could get here if a collation was created by a build with ICU */
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("ICU is not supported in this build")));
-#endif
-		}
-		else if (collform->collprovider == COLLPROVIDER_LIBC)
-		{
-			const char *collcollate;
-			const char *collctype;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-			collcollate = TextDatumGetCString(datum);
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-			collctype = TextDatumGetCString(datum);
-
-			result.collate_is_c = (strcmp(collcollate, "C") == 0) ||
-				(strcmp(collcollate, "POSIX") == 0);
-			result.ctype_is_c = (strcmp(collctype, "C") == 0) ||
-				(strcmp(collctype, "POSIX") == 0);
-
-			result.info.lt = make_libc_collator(collcollate, collctype);
-		}
-		else
-			/* shouldn't happen */
-			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
-								&isnull);
-		if (!isnull)
-		{
-			char	   *actual_versionstr;
-			char	   *collversionstr;
-
-			collversionstr = TextDatumGetCString(datum);
-
-			if (collform->collprovider == COLLPROVIDER_LIBC)
-				datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-			else
-				datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-
-			actual_versionstr = get_collation_actual_version(collform->collprovider,
-															 TextDatumGetCString(datum));
-			if (!actual_versionstr)
-			{
-				/*
-				 * This could happen when specifying a version in CREATE
-				 * COLLATION but the provider does not support versioning, or
-				 * manually creating a mess in the catalogs.
-				 */
-				ereport(ERROR,
-						(errmsg("collation \"%s\" has no actual version, but a version was recorded",
-								NameStr(collform->collname))));
-			}
-
-			if (strcmp(actual_versionstr, collversionstr) != 0)
-				ereport(WARNING,
-						(errmsg("collation \"%s\" has version mismatch",
-								NameStr(collform->collname)),
-						 errdetail("The collation in the database was created using version %s, "
-								   "but the operating system provides version %s.",
-								   collversionstr, actual_versionstr),
-						 errhint("Rebuild all objects affected by this collation and run "
-								 "ALTER COLLATION %s REFRESH VERSION, "
-								 "or build PostgreSQL with the right library version.",
-								 quote_qualified_identifier(get_namespace_name(collform->collnamespace),
-															NameStr(collform->collname)))));
-		}
-
-		ReleaseSysCache(tp);
+		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
+													  "collation cache",
+													  ALLOCSET_DEFAULT_SIZES);
+		CollationCache = collation_cache_create(CollationCacheContext,
+												16, NULL);
+	}
 
-		/* We'll keep the pg_locale_t structures in TopMemoryContext */
-		resultp = MemoryContextAlloc(TopMemoryContext, sizeof(*resultp));
-		*resultp = result;
+	cache_entry = collation_cache_insert(CollationCache, collid, &found);
+	if (!found)
+	{
+		/*
+		 * Make sure cache entry is marked invalid, in case we fail before
+		 * setting things.
+		 */
+		cache_entry->locale = 0;
+	}
 
-		cache_entry->locale = resultp;
+	if (cache_entry->locale == 0)
+	{
+		cache_entry->locale = create_pg_locale(collid, CollationCacheContext);
 	}
 
 	last_collation_cache_oid = collid;
-- 
2.34.1
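
The final hunk of the patch above reduces pg_newlocale_from_collation() to a lazy-fill cache lookup: insert or find the entry, mark a fresh entry invalid, and build the locale only if the entry is still invalid. A minimal standalone sketch of that pattern follows. Everything in it is an assumption made for illustration: the demo_* names, the fixed-size open-addressing table, and the use of a NULL return to simulate a failed build (the real code uses a generated hash table and reports failure with ereport(ERROR)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Oid;

/* Stand-in for pg_locale_t; the real struct carries far more state. */
typedef struct demo_locale
{
	Oid			collid;
} demo_locale;

typedef struct demo_cache_entry
{
	Oid			collid;
	int			used;
	demo_locale *locale;		/* NULL means "not built yet" */
} demo_cache_entry;

#define DEMO_CACHE_SIZE 16

static demo_cache_entry demo_cache[DEMO_CACHE_SIZE];
static int	demo_create_calls = 0;	/* counts constructor invocations */
static int	demo_fail_next = 0;		/* set to 1 to simulate one failure */

/* Hypothetical constructor; returns NULL where the real code would error. */
static demo_locale *
demo_create_locale(Oid collid)
{
	demo_locale *loc;

	demo_create_calls++;
	if (demo_fail_next)
	{
		demo_fail_next = 0;
		return NULL;
	}
	loc = malloc(sizeof(*loc));
	if (loc != NULL)
		loc->collid = collid;
	return loc;
}

/* Find the entry for collid, or claim a free slot for it (linear probing). */
static demo_cache_entry *
demo_cache_insert(Oid collid, int *found)
{
	size_t		i;

	for (i = 0; i < DEMO_CACHE_SIZE; i++)
	{
		demo_cache_entry *e = &demo_cache[(collid + i) % DEMO_CACHE_SIZE];

		if (e->used && e->collid == collid)
		{
			*found = 1;
			return e;
		}
		if (!e->used)
		{
			e->used = 1;
			e->collid = collid;
			*found = 0;
			return e;
		}
	}
	return NULL;				/* table full; the sketch does not grow it */
}

/* Same shape as the patched lookup: insert, invalidate, lazily build. */
static demo_locale *
demo_lookup(Oid collid)
{
	int			found;
	demo_cache_entry *entry = demo_cache_insert(collid, &found);

	if (entry == NULL)
		return NULL;
	if (!found)
		entry->locale = NULL;	/* mark invalid in case the build fails */
	if (entry->locale == NULL)
		entry->locale = demo_create_locale(collid);
	return entry->locale;
}
```

Because an entry whose build failed stays in the table with a NULL locale, a later lookup of the same OID retries the build instead of returning a half-initialized result.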

v6-0004-Perform-provider-specific-initialization-code-in-.patch (text/x-patch; charset=UTF-8)

From 18b5b0055c6e59ed4739a626d59a88891f0ca382 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 15:49:32 -0700
Subject: [PATCH v6 04/11] Perform provider-specific initialization code in new
 functions.

---
 src/backend/utils/adt/pg_locale.c      | 199 ++++++++-----------------
 src/backend/utils/adt/pg_locale_icu.c  |  97 +++++++++++-
 src/backend/utils/adt/pg_locale_libc.c |  74 ++++++++-
 3 files changed, 227 insertions(+), 143 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9bbb3420be..0534a232a5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -89,10 +89,11 @@
 
 #define		MAX_L10N_DATA		80
 
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
+
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -104,8 +105,6 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  pg_locale_t locale);
 #endif
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -136,7 +135,7 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
-static struct pg_locale_struct default_locale;
+static pg_locale_t default_locale = NULL;
 
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
@@ -1211,6 +1210,51 @@ IsoLocaleName(const char *winlocname)
 
 #endif							/* WIN32 && LC_MESSAGES */
 
+static pg_locale_t
+create_pg_locale_builtin(Oid collid, MemoryContext context)
+{
+	const char *locstr;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+
+	builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	result->provider = COLLPROVIDER_BUILTIN;
+	result->deterministic = true;
+	result->collate_is_c = true;
+	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+	return result;
+}
 
 /*
  * Create a new pg_locale_t struct for the given collation oid.
@@ -1225,75 +1269,17 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Datum		datum;
 	bool		isnull;
 
-	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	result->provider = collform->collprovider;
-	result->deterministic = collform->collisdeterministic;
-
 	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-	{
-		const char *locstr;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		locstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = true;
-		result->ctype_is_c = (strcmp(locstr, "C") == 0);
-
-		builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-		result->info.builtin.locale = MemoryContextStrdup(context,
-														  locstr);
-	}
+		result = create_pg_locale_builtin(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		const char *iculocstr;
-		const char *icurules;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		iculocstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = false;
-		result->ctype_is_c = false;
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_LIBC)
-	{
-		const char *collcollate;
-		const char *collctype;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-		collcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-		collctype = TextDatumGetCString(datum);
-
-		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
-			(strcmp(collcollate, "POSIX") == 0);
-		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
-			(strcmp(collctype, "POSIX") == 0);
-
-		result->info.lt = make_libc_collator(collcollate, collctype);
-	}
+		result = create_pg_locale_libc(collid, context);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
@@ -1353,7 +1339,9 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	Datum		datum;
+	pg_locale_t result;
+
+	Assert(default_locale == NULL);
 
 	/* Fetch our pg_database row normally, via syscache */
 	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
@@ -1362,80 +1350,21 @@ init_database_collation(void)
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
 	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-	{
-		char	   *datlocale;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		builtin_validate_locale(dbform->encoding, datlocale);
-
-		default_locale.collate_is_c = true;
-		default_locale.ctype_is_c = (strcmp(datlocale, "C") == 0);
-
-		default_locale.info.builtin.locale = MemoryContextStrdup(
-																 TopMemoryContext, datlocale);
-	}
+		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		char	   *datlocale;
-		char	   *icurules;
-		bool		isnull;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = false;
-		default_locale.ctype_is_c = false;
-
-		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		default_locale.info.icu.locale = MemoryContextStrdup(TopMemoryContext, datlocale);
-		default_locale.info.icu.ucol = make_icu_collator(datlocale, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+									  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-	{
-		const char *datcollate;
-		const char *datctype;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datcollate);
-		datcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
-		datctype = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = (strcmp(datcollate, "C") == 0) ||
-			(strcmp(datcollate, "POSIX") == 0);
-		default_locale.ctype_is_c = (strcmp(datctype, "C") == 0) ||
-			(strcmp(datctype, "POSIX") == 0);
-
-		default_locale.info.lt = make_libc_collator(datcollate, datctype);
-	}
+		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+									   TopMemoryContext);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
 
-
-	default_locale.provider = dbform->datlocprovider;
-
-	/*
-	 * Default locale is currently always deterministic.  Nondeterministic
-	 * locales currently don't support pattern matching, which would break a
-	 * lot of things if applied globally.
-	 */
-	default_locale.deterministic = true;
-
 	ReleaseSysCache(tup);
+
+	default_locale = result;
 }
 
 /*
@@ -1453,7 +1382,7 @@ pg_newlocale_from_collation(Oid collid)
 	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
-		return &default_locale;
+		return default_locale;
 
 	if (!OidIsValid(collid))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index c91954787d..e3268f9f69 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -12,14 +12,20 @@
 #include "postgres.h"
 
 #ifdef USE_ICU
-
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+#endif
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * This should be large enough that most strings will fit, but small enough
@@ -27,9 +33,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+
+#ifdef USE_ICU
+
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -47,6 +55,8 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
  */
 static UConverter *icu_converter = NULL;
 
+static UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
 static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 								 const char *arg2, ssize_t len2,
 								 pg_locale_t locale);
@@ -61,6 +71,85 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+#endif
+
+pg_locale_t
+create_pg_locale_icu(Oid collid, MemoryContext context)
+{
+#ifdef USE_ICU
+	bool		deterministic;
+	const char *iculocstr;
+	const char *icurules = NULL;
+	UCollator  *collator;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+
+		/* default database collation is always deterministic */
+		deterministic = true;
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(DATABASEOID, tp,
+								Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		Form_pg_collation collform;
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		collform = (Form_pg_collation) GETSTRUCT(tp);
+		deterministic = collform->collisdeterministic;
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(COLLOID, tp,
+								Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+	collator = make_icu_collator(iculocstr, icurules);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+	result->info.icu.ucol = collator;
+	result->provider = COLLPROVIDER_ICU;
+	result->deterministic = deterministic;
+	result->collate_is_c = false;
+	result->ctype_is_c = false;
+
+	return result;
+#else
+	/* could get here if a collation was created by a build with ICU */
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("ICU is not supported in this build")));
+
+	return NULL;
+#endif
+}
+
+#ifdef USE_ICU
 
 /*
  * Wrapper around ucol_open() to handle API differences for older ICU
@@ -158,7 +247,7 @@ pg_ucol_open(const char *loc_str)
  *
  * Ensure that no path leaks a UCollator.
  */
-UCollator *
+static UCollator *
 make_icu_collator(const char *iculocstr, const char *icurules)
 {
 	if (!icurules)
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 61066ee21a..8736661111 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,10 +11,16 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * This should be large enough that most strings will fit, but small enough
@@ -22,15 +28,16 @@
  */
 #define		TEXTBUFLEN			1024
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
+
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
-
+static locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
 
 #ifdef WIN32
@@ -39,6 +46,65 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+pg_locale_t
+create_pg_locale_libc(Oid collid, MemoryContext context)
+{
+	const char *collate;
+	const char *ctype;
+	locale_t	loc;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+
+	loc = make_libc_collator(collate, ctype);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->provider = COLLPROVIDER_LIBC;
+	result->deterministic = true;
+	result->collate_is_c = (strcmp(collate, "C") == 0) ||
+		(strcmp(collate, "POSIX") == 0);
+	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
+		(strcmp(ctype, "POSIX") == 0);
+	result->info.lt = loc;
+
+	return result;
+}
+
 /*
  * Create a locale_t with the given collation and ctype.
  *
@@ -47,7 +113,7 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
  *
  * Ensure that no path leaks a locale_t.
  */
-locale_t
+static locale_t
 make_libc_collator(const char *collate, const char *ctype)
 {
 	locale_t	loc = 0;
-- 
2.34.1
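
Taken together, v6-0004 and v6-0005 move every provider decision to locale-creation time: the creation function picks a method table once, and callers such as pg_strcoll() simply call through it. The sketch below illustrates that shape in standalone C. It is not the patch's code: the demo_* names, the two toy "providers" (bytewise and ASCII case-insensitive), and the use of long in place of ssize_t are assumptions chosen to keep the example portable and self-contained.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

/* Loosely modeled on the patch's struct collate_methods. */
struct demo_collate_methods
{
	int			(*strncoll) (const char *a, long alen,
							 const char *b, long blen,
							 const struct demo_locale *loc);
	/* optional method: NULL means "prefix sort keys not supported" */
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src, long srclen,
									const struct demo_locale *loc);
	bool		strxfrm_is_safe;
};

struct demo_locale
{
	const struct demo_collate_methods *collate;
};

/* A length of -1 means the string is NUL-terminated, as in the patch. */
static size_t
demo_len(const char *s, long len)
{
	return len < 0 ? strlen(s) : (size_t) len;
}

static int
demo_strncoll_bytes(const char *a, long alen, const char *b, long blen,
					const struct demo_locale *loc)
{
	size_t		la = demo_len(a, alen);
	size_t		lb = demo_len(b, blen);
	int			cmp = memcmp(a, b, la < lb ? la : lb);

	(void) loc;
	if (cmp != 0)
		return cmp < 0 ? -1 : 1;
	return (la > lb) - (la < lb);
}

static int
demo_strncoll_nocase(const char *a, long alen, const char *b, long blen,
					 const struct demo_locale *loc)
{
	size_t		la = demo_len(a, alen);
	size_t		lb = demo_len(b, blen);
	size_t		n = la < lb ? la : lb;
	size_t		i;

	(void) loc;
	for (i = 0; i < n; i++)
	{
		int			ca = tolower((unsigned char) a[i]);
		int			cb = tolower((unsigned char) b[i]);

		if (ca != cb)
			return ca < cb ? -1 : 1;
	}
	return (la > lb) - (la < lb);
}

/* Toy prefix key: the leading bytes of the source string. */
static size_t
demo_prefix_bytes(char *dest, size_t destsize, const char *src, long srclen,
				  const struct demo_locale *loc)
{
	size_t		n = demo_len(src, srclen);

	(void) loc;
	if (n > destsize)
		n = destsize;
	memcpy(dest, src, n);
	return n;
}

static const struct demo_collate_methods demo_methods_bytes = {
	.strncoll = demo_strncoll_bytes,
	.strnxfrm_prefix = demo_prefix_bytes,
	.strxfrm_is_safe = true,
};

static const struct demo_collate_methods demo_methods_nocase = {
	.strncoll = demo_strncoll_nocase,
	.strnxfrm_prefix = NULL,
	.strxfrm_is_safe = false,
};

/* Callers no longer branch on a provider; they call through the table. */
static int
demo_strcoll(const char *a, const char *b, const struct demo_locale *loc)
{
	return loc->collate->strncoll(a, -1, b, -1, loc);
}

static bool
demo_prefix_enabled(const struct demo_locale *loc)
{
	return loc->collate->strnxfrm_prefix != NULL;
}
```

With this layout, adding a provider means supplying another table rather than extending an if/else chain in every entry point, which is the property the commit message of the next patch describes as "easier to hook".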

v6-0005-Control-collation-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From cdf049433fed4b644c101e2dafebb29edb4003d0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 11:27:29 -0700
Subject: [PATCH v6 05/11] Control collation behavior with a method table.

Previously, behavior branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/pg_locale.c      | 121 +++-----------------
 src/backend/utils/adt/pg_locale_icu.c  | 147 +++++++++++++++----------
 src/backend/utils/adt/pg_locale_libc.c |  40 +++++--
 src/include/utils/pg_locale.h          |  33 ++++++
 4 files changed, 164 insertions(+), 177 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 0534a232a5..662113edc5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -94,24 +94,8 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
-						   const char *src, ssize_t srclen,
-						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
-								  const char *src, ssize_t srclen,
-								  pg_locale_t locale);
 #endif
 
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
-						  const char *arg2, ssize_t len2,
-						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1535,19 +1519,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 int
 pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, -1, arg2, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, -1, arg2, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, -1, arg2, -1, locale);
 }
 
 /*
@@ -1568,51 +1540,25 @@ int
 pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 			pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, len1, arg2, len2, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, len1, arg2, len2, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
 }
 
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
  *
- * Unfortunately, it seems that strxfrm() for non-C collations is broken on
- * many common platforms; testing of multiple versions of glibc reveals that,
- * for many locales, strcoll() and strxfrm() do not return consistent
- * results. While no other libc other than Cygwin has so far been shown to
- * have a problem, we take the conservative course of action for right now and
- * disable this categorically.  (Users who are certain this isn't a problem on
- * their system can define TRUST_STRXFRM.)
  *
  * No similar problem is known for the ICU provider.
  */
 bool
 pg_strxfrm_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-#ifdef TRUST_STRXFRM
-		return true;
-#else
-		return false;
-#endif
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	/*
+	 * locale->collate->strnxfrm is still a required method, even if it may
+	 * have the wrong behavior, because the planner uses it for estimates in
+	 * some cases.
+	 */
+	return locale->collate->strxfrm_is_safe;
 }
 
 /*
@@ -1623,19 +1569,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 size_t
 pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1661,19 +1595,7 @@ size_t
 pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, srclen, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, srclen, locale);
 }
 
 /*
@@ -1683,15 +1605,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 bool
 pg_strxfrm_prefix_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-		return false;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	return (locale->collate->strnxfrm_prefix != NULL);
 }
 
 /*
@@ -1703,7 +1617,7 @@ size_t
 pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 				  pg_locale_t locale)
 {
-	return pg_strnxfrm_prefix(dest, destsize, src, -1, locale);
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1728,16 +1642,7 @@ size_t
 pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-#ifdef USE_ICU
-	if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_prefix_icu(dest, destsize, src, -1, locale);
-	else
-#endif
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e3268f9f69..d9bc409c27 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -38,13 +38,14 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
+
+static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
+static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
@@ -57,12 +58,20 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
-static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-								 const char *arg2, ssize_t len2,
-								 pg_locale_t locale);
-static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-										  const char *src, ssize_t srclen,
-										  pg_locale_t locale);
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+static int	strncoll_icu_utf8(const char *arg1, ssize_t len1,
+							  const char *arg2, ssize_t len2,
+							  pg_locale_t locale);
+#endif
+static size_t strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+									   const char *src, ssize_t srclen,
+									   pg_locale_t locale);
 static void init_icu_converter(void);
 static size_t uchar_length(UConverter *converter,
 						   const char *str, int32_t len);
@@ -71,6 +80,25 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+
+static const struct collate_methods collate_methods_icu = {
+	.strncoll = strncoll_icu,
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu,
+	.strxfrm_is_safe = true,
+};
+
+static const struct collate_methods collate_methods_icu_utf8 = {
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	.strncoll = strncoll_icu_utf8,
+#else
+	.strncoll = strncoll_icu,
+#endif
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu_utf8,
+	.strxfrm_is_safe = true,
+};
+
 #endif
 
 pg_locale_t
@@ -137,6 +165,10 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	if (GetDatabaseEncoding() == PG_UTF8)
+		result->collate = &collate_methods_icu_utf8;
+	else
+		result->collate = &collate_methods_icu;
 
 	return result;
 #else
@@ -311,42 +343,36 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 }
 
 /*
- * strncoll_icu
+ * strncoll_icu_utf8
  *
  * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
  * database encoding. An argument length of -1 means the string is
  * NUL-terminated.
  */
+#ifdef HAVE_UCOL_STRCOLLUTF8
 int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
+strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+				  pg_locale_t locale)
 {
 	int			result;
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
+	status = U_ZERO_ERROR;
+	result = ucol_strcollUTF8(locale->info.icu.ucol,
+							  arg1, len1,
+							  arg2, len2,
+							  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("collation failed: %s", u_errorName(status))));
 
 	return result;
 }
+#endif
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
@@ -397,37 +423,32 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
+strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+						 const char *src, ssize_t srclen,
+						 pg_locale_t locale)
 {
 	size_t		result;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
+	uiter_setUTF8(&iter, src, srclen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+								  &iter,
+								  state,
+								  (uint8_t *) dest,
+								  destsize,
+								  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
 
 	return result;
 }
@@ -502,7 +523,7 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 }
 
 /*
- * strncoll_icu_no_utf8
+ * strncoll_icu
  *
  * Convert the arguments from the database encoding to UChar strings, then
  * call ucol_strcoll(). An argument length of -1 means that the string is
@@ -512,8 +533,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
  * caller should call that instead.
  */
 static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
+strncoll_icu(const char *arg1, ssize_t len1,
+			 const char *arg2, ssize_t len2, pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -526,6 +547,8 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 	int			result;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 #endif
@@ -559,9 +582,9 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -574,6 +597,8 @@ strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 	Size		result_bsize;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	init_icu_converter();
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 8736661111..5fee68edca 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -30,10 +30,10 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
+static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
+static size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 static locale_t make_libc_collator(const char *collate,
@@ -46,6 +46,27 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static const struct collate_methods collate_methods_libc = {
+	.strncoll = strncoll_libc,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+
+	/*
+	 * Unfortunately, it seems that strxfrm() for non-C collations is broken
+	 * on many common platforms; testing of multiple versions of glibc reveals
+	 * that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results. While no libc other than Cygwin has so far
+	 * been shown to have a problem, we take the conservative course of action
+	 * for right now and disable this categorically.  (Users who are certain
+	 * this isn't a problem on their system can define TRUST_STRXFRM.)
+	 */
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -101,6 +122,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+	{
+#ifdef WIN32
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->collate = &collate_methods_libc_win32_utf8;
+		else
+#endif
+			result->collate = &collate_methods_libc;
+	}
 
 	return result;
 }
@@ -198,12 +228,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 
 	Assert(locale->provider == COLLPROVIDER_LIBC);
 
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 37ecf95193..2f05dffcdd 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -60,6 +60,36 @@ extern struct lconv *PGLC_localeconv(void);
 extern void cache_locale_time(void);
 
 
+struct pg_locale_struct;
+typedef struct pg_locale_struct *pg_locale_t;
+
+/* methods that define collation behavior */
+struct collate_methods
+{
+	/* required */
+	int			(*strncoll) (const char *arg1, ssize_t len1,
+							 const char *arg2, ssize_t len2,
+							 pg_locale_t locale);
+
+	/* required */
+	size_t		(*strnxfrm) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* optional */
+	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
+									const char *src, ssize_t srclen,
+									pg_locale_t locale);
+
+	/*
+	 * If the strnxfrm method is not trusted to return the correct results,
+	 * set strxfrm_is_safe to false. If set to false, the method will not be
+	 * used in most cases, but the planner still expects it to be there for
+	 * estimation purposes (where incorrect results are acceptable).
+	 */
+	bool		strxfrm_is_safe;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -82,6 +112,9 @@ struct pg_locale_struct
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
+
+	const struct collate_methods *collate;	/* NULL if collate_is_c */
+
 	union
 	{
 		struct
-- 
2.34.1
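
For readers skimming the diff, the dispatch pattern used by the collation method table can be sketched in isolation. This is a minimal standalone illustration, not code from the patch: every `toy_` name and both comparison functions are hypothetical, and the real `struct collate_methods` also carries `strnxfrm`, `strnxfrm_prefix` and `strxfrm_is_safe`. The idea it shows is that the table is chosen once when the locale object is created, so callers dispatch through a pointer instead of branching on the provider.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct toy_locale;

/* Hypothetical, cut-down analogue of struct collate_methods. */
struct toy_collate_methods
{
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct toy_locale *locale);
};

/* Hypothetical analogue of pg_locale_struct: it only holds the table. */
struct toy_locale
{
	const struct toy_collate_methods *collate;	/* chosen at creation */
};

/* Compare lengths once a common prefix is known to be equal. */
static int
toy_cmp_len(size_t alen, size_t blen)
{
	return (alen > blen) - (alen < blen);
}

/* Method 1: plain bytewise ordering. */
static int
toy_strncoll_bytes(const char *a, size_t alen, const char *b, size_t blen,
				   const struct toy_locale *locale)
{
	size_t		n = alen < blen ? alen : blen;
	int			r = memcmp(a, b, n);

	(void) locale;
	if (r != 0)
		return (r > 0) - (r < 0);
	return toy_cmp_len(alen, blen);
}

/* Method 2: ASCII case-insensitive ordering. */
static int
toy_strncoll_nocase(const char *a, size_t alen, const char *b, size_t blen,
					const struct toy_locale *locale)
{
	size_t		n = alen < blen ? alen : blen;
	size_t		i;

	(void) locale;
	for (i = 0; i < n; i++)
	{
		int			ca = tolower((unsigned char) a[i]);
		int			cb = tolower((unsigned char) b[i]);

		if (ca != cb)
			return (ca > cb) - (ca < cb);
	}
	return toy_cmp_len(alen, blen);
}

static const struct toy_collate_methods toy_methods_bytes = {
	.strncoll = toy_strncoll_bytes,
};

static const struct toy_collate_methods toy_methods_nocase = {
	.strncoll = toy_strncoll_nocase,
};

/* The only place that branches: pick the table when the locale is built. */
static struct toy_locale
toy_create_locale(int case_insensitive)
{
	struct toy_locale loc;

	loc.collate = case_insensitive ? &toy_methods_nocase : &toy_methods_bytes;
	return loc;
}

/* Callers never look at the provider; they just go through the table. */
static int
toy_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			 const struct toy_locale *locale)
{
	return locale->collate->strncoll(a, alen, b, blen, locale);
}
```

With this shape, an extension that wants different behavior only has to supply another table at creation time; the comparison call sites stay unchanged.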

v6-0006-Control-case-mapping-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 71764f264bcffd0be6b101946913719f8277f36d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 12:12:51 -0700
Subject: [PATCH v6 06/11] Control case mapping behavior with a method table.

Previously, case mapping (LOWER(), INITCAP(), UPPER()) behavior
branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/formatting.c     | 445 ++++---------------------
 src/backend/utils/adt/pg_locale.c      | 101 ++++++
 src/backend/utils/adt/pg_locale_icu.c  | 140 +++++++-
 src/backend/utils/adt/pg_locale_libc.c | 302 +++++++++++++++++
 src/include/utils/pg_locale.h          |  29 +-
 5 files changed, 619 insertions(+), 398 deletions(-)
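
A note on the calling convention that the formatting.c changes in this patch rely on: each case-mapping method returns the number of bytes the full result needs (excluding the terminating NUL), so the caller can try a buffer of the input size first and retry once if that was too small. The sketch below is a standalone illustration under stated assumptions, not code from the patch. `toy_map` and `toy_map_alloc` are hypothetical names, `malloc`/`realloc` stand in for `palloc`/`repalloc`, and the rule that expands `&` into `and` is contrived purely so that the result can outgrow the first buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a case-mapping method.  Lowercases ASCII and
 * expands '&' to "and".  Writes only what fits in destsize, NUL-terminates
 * only if the whole result fits, and returns the number of bytes the full
 * result needs, not counting the NUL.
 */
static size_t
toy_map(char *dest, size_t destsize, const char *src, size_t srclen)
{
	size_t		needed = 0;
	size_t		i;

	for (i = 0; i < srclen; i++)
	{
		char		lower;
		const char *out;
		size_t		outlen;
		size_t		j;

		if (src[i] == '&')
		{
			out = "and";
			outlen = 3;
		}
		else
		{
			lower = (char) tolower((unsigned char) src[i]);
			out = &lower;
			outlen = 1;
		}

		for (j = 0; j < outlen; j++)
		{
			if (needed + j < destsize)
				dest[needed + j] = out[j];
		}
		needed += outlen;
	}

	if (needed < destsize)
		dest[needed] = '\0';

	return needed;
}

/*
 * Caller with the same shape as the rewritten str_tolower(): first try a
 * buffer of input size plus NUL, then grow and retry once if the method
 * reported that more room is needed.  Returns NULL on allocation failure;
 * the caller frees the result.
 */
static char *
toy_map_alloc(const char *src, size_t srclen)
{
	size_t		dstsize = srclen + 1;
	char	   *dst = malloc(dstsize);
	size_t		needed;

	if (dst == NULL)
		return NULL;

	needed = toy_map(dst, dstsize, src, srclen);
	if (needed + 1 > dstsize)
	{
		char	   *tmp;

		/* grow buffer to the reported size and retry */
		dstsize = needed + 1;
		tmp = realloc(dst, dstsize);
		if (tmp == NULL)
		{
			free(dst);
			return NULL;
		}
		dst = tmp;
		needed = toy_map(dst, dstsize, src, srclen);
	}

	return dst;
}
```

Because the method reports the size rather than allocating, the allocation policy stays with the caller, which is what lets the same loop in formatting.c serve every provider.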

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 85a7dd4561..6a0571f93e 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1570,52 +1570,6 @@ str_numth(char *dest, char *num, int type)
  *			upper/lower/initcap functions
  *****************************************************************************/
 
-#ifdef USE_ICU
-
-typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
-									 const UChar *src, int32_t srcLength,
-									 const char *locale,
-									 UErrorCode *pErrorCode);
-
-static int32_t
-icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
-				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
-{
-	UErrorCode	status;
-	int32_t		len_dest;
-
-	len_dest = len_source;		/* try first with same length */
-	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-	status = U_ZERO_ERROR;
-	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
-	if (status == U_BUFFER_OVERFLOW_ERROR)
-	{
-		/* try again with adjusted length */
-		pfree(*buff_dest);
-		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-		status = U_ZERO_ERROR;
-		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
-	}
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
-	return len_dest;
-}
-
-static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
-{
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
-}
-
-#endif							/* USE_ICU */
-
 /*
  * If the system provides the needed functions for wide-character manipulation
  * (which are all standardized by C99), then we implement upper/lower/initcap
@@ -1663,101 +1617,28 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar;
-			int32_t		len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strlower(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strlower(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that tolower_l() will not be so broken as
-				 * to need an isupper_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
@@ -1800,147 +1681,33 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
-		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strupper(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strupper(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l() will not be so broken as
-				 * to need an islower_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
 }
 
-struct WordBoundaryState
-{
-	const char *str;
-	size_t		len;
-	size_t		offset;
-	bool		init;
-	bool		prev_alnum;
-};
-
-/*
- * Simple word boundary iterator that draws boundaries each time the result of
- * pg_u_isalnum() changes.
- */
-static size_t
-initcap_wbnext(void *state)
-{
-	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
-
-	while (wbstate->offset < wbstate->len &&
-		   wbstate->str[wbstate->offset] != '\0')
-	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
-										wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
-
-		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
-		{
-			size_t		prev_offset = wbstate->offset;
-
-			wbstate->init = true;
-			wbstate->offset += unicode_utf8len(u);
-			wbstate->prev_alnum = curr_alnum;
-			return prev_offset;
-		}
-
-		wbstate->offset += unicode_utf8len(u);
-	}
-
-	return wbstate->len;
-}
-
 /*
  * collation-aware, wide-character-aware initcap function
  *
@@ -1951,7 +1718,6 @@ char *
 str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
-	int			wasalnum = false;
 	pg_locale_t mylocale;
 
 	if (!buff)
@@ -1979,125 +1745,28 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-			struct WordBoundaryState wbstate = {
-				.str = src,
-				.len = srclen,
-				.offset = 0,
-				.init = false,
-				.prev_alnum = false,
-			};
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strtitle(dst, dstsize, src, srclen,
-									  initcap_wbnext, &wbstate);
-			if (needed + 1 > dstsize)
-			{
-				/* reset iterator */
-				wbstate.offset = 0;
-				wbstate.init = false;
-
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strtitle(dst, dstsize, src, srclen,
-										  initcap_wbnext, &wbstate);
-				Assert(needed + 1 == dstsize);
-			}
-
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-				{
-					if (wasalnum)
-						workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-					else
-						workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-					wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
-				}
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
 
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l()/tolower_l() will not be so
-				 * broken as to need guard tests.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (wasalnum)
-						*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-					else
-						*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-					wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 662113edc5..05a7a09887 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -59,6 +59,8 @@
 #include "catalog/pg_database.h"
 #include "common/hashfn.h"
 #include "common/string.h"
+#include "common/unicode_case.h"
+#include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -164,6 +166,83 @@ static pg_locale_t last_collation_cache_locale = NULL;
 static char *IsoLocaleName(const char *);
 #endif
 
+struct WordBoundaryState
+{
+	const char *str;
+	size_t		len;
+	size_t		offset;
+	bool		init;
+	bool		prev_alnum;
+};
+
+/*
+ * Simple word boundary iterator that draws boundaries each time the result of
+ * pg_u_isalnum() changes.
+ */
+static size_t
+initcap_wbnext(void *state)
+{
+	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
+
+	while (wbstate->offset < wbstate->len &&
+		   wbstate->str[wbstate->offset] != '\0')
+	{
+		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+										wbstate->offset);
+		bool		curr_alnum = pg_u_isalnum(u, true);
+
+		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
+		{
+			size_t		prev_offset = wbstate->offset;
+
+			wbstate->init = true;
+			wbstate->offset += unicode_utf8len(u);
+			wbstate->prev_alnum = curr_alnum;
+			return prev_offset;
+		}
+
+		wbstate->offset += unicode_utf8len(u);
+	}
+
+	return wbstate->len;
+}
+
+static size_t
+strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strlower(dest, destsize, src, srclen);
+}
+
+static size_t
+strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	struct WordBoundaryState wbstate = {
+		.str = src,
+		.len = srclen,
+		.offset = 0,
+		.init = false,
+		.prev_alnum = false,
+	};
+
+	return unicode_strtitle(dest, destsize, src, srclen,
+							initcap_wbnext, &wbstate);
+}
+
+static size_t
+strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strupper(dest, destsize, src, srclen);
+}
+
+static const struct casemap_methods casemap_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+};
+
 /*
  * pg_perm_setlocale
  *
@@ -1236,6 +1315,7 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	result->casemap = &casemap_methods_builtin;
 
 	return result;
 }
@@ -1511,6 +1591,27 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+size_t
+pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strlower(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strtitle(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strupper(dst, dstsize, src, srclen, locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index d9bc409c27..1d13f2daa3 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -49,6 +49,11 @@ static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
+typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
+									 const UChar *src, int32_t srcLength,
+									 const char *locale,
+									 UErrorCode *pErrorCode);
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -58,6 +63,16 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
+
+static size_t strlower_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strtitle_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strupper_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -78,8 +93,19 @@ static size_t uchar_length(UConverter *converter,
 static int32_t uchar_convert(UConverter *converter,
 							 UChar *dest, int32_t destlen,
 							 const char *src, int32_t srclen);
+static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff,
+							size_t nbytes);
+static size_t icu_from_uchar(char *dest, size_t destsize,
+							 const UChar *buff_uchar, int32_t len_uchar);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+static int32_t icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+								UChar **buff_dest, UChar *buff_source,
+								int32_t len_source);
+static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+									   const UChar *src, int32_t srcLength,
+									   const char *locale,
+									   UErrorCode *pErrorCode);
 
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
@@ -99,6 +125,11 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static const struct casemap_methods casemap_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+};
 #endif
 
 pg_locale_t
@@ -169,6 +200,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->casemap = &casemap_methods_icu;
 
 	return result;
 #else
@@ -342,6 +374,66 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
+static size_t
+strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToLower, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToTitle_default_BI, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToUpper, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
 /*
  * strncoll_icu_utf8
  *
@@ -465,7 +557,7 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
  * The result string is nul-terminated, though most callers rely on the
  * result length instead.
  */
-int32_t
+static int32_t
 icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
 {
 	int32_t		len_uchar;
@@ -492,8 +584,8 @@ icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
  *
  * The result string is nul-terminated.
  */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+static size_t
+icu_from_uchar(char *dest, size_t destsize, const UChar *buff_uchar, int32_t len_uchar)
 {
 	UErrorCode	status;
 	int32_t		len_result;
@@ -508,10 +600,11 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
 
-	*result = palloc(len_result + 1);
+	if (len_result + 1 > destsize)
+		return len_result;
 
 	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+	len_result = ucnv_fromUChars(icu_converter, dest, len_result + 1,
 								 buff_uchar, len_uchar, &status);
 	if (U_FAILURE(status) ||
 		status == U_STRING_NOT_TERMINATED_WARNING)
@@ -522,6 +615,43 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	return len_result;
 }
 
+static int32_t
+icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
+{
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s", u_errorName(status))));
+	return len_dest;
+}
+
+static int32_t
+u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+						const UChar *src, int32_t srcLength,
+						const char *locale,
+						UErrorCode *pErrorCode)
+{
+	return u_strToTitle(dest, destCapacity, src, srcLength,
+						NULL, locale, pErrorCode);
+}
+
 /*
  * strncoll_icu
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 5fee68edca..bdf8b71274 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,6 +11,9 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+#include <wctype.h>
+
 #include "access/htup_details.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
@@ -46,6 +49,25 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static size_t strlower_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strlower_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -67,6 +89,279 @@ static const struct collate_methods collate_methods_libc = {
 #endif
 };
 
+#ifdef WIN32
+static const struct collate_methods collate_methods_libc_win32_utf8 = {
+	.strncoll = strncoll_libc_win32_utf8,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+#endif
+
+static const struct casemap_methods casemap_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+};
+
+static const struct casemap_methods casemap_methods_libc_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+};
+
+static size_t
+strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		/*
+		 * dest has room for the source and its NUL; copy, then map in place.
+		 */
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that tolower_l() will not be so broken as to need
+		 * an isupper_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = tolower_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		int			wasalnum = false;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l()/tolower_l() will not be so broken
+		 * as to need guard tests.  When using the default collation, we apply
+		 * the traditional Postgres behavior that forces ASCII-style treatment
+		 * of I/i, but in non-default collations you get exactly what the
+		 * collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (wasalnum)
+				*p = tolower_l((unsigned char) *p, loc);
+			else
+				*p = toupper_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	int			wasalnum = false;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+	{
+		if (wasalnum)
+			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		else
+			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		wasalnum = iswalnum_l(workspace[curr_char], loc);
+	}
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l() will not be so broken as to need
+		 * an islower_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = toupper_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -131,6 +426,13 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (pg_database_encoding_max_length() > 1)
+			result->casemap = &casemap_methods_libc_mb;
+		else
+			result->casemap = &casemap_methods_libc_sb;
+	}
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 2f05dffcdd..bbc10e0c3d 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -90,6 +90,20 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+/* methods that define string case mapping behavior */
+struct casemap_methods
+{
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -114,6 +128,7 @@ struct pg_locale_struct
 	bool		ctype_is_c;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct casemap_methods *casemap;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -138,6 +153,15 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern size_t pg_strlower(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strtitle(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strupper(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
@@ -157,11 +181,6 @@ extern const char *builtin_validate_locale(int encoding, const char *locale);
 extern void icu_validate_locale(const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
-#ifdef USE_ICU
-extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
-extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
-
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
 						 pg_locale_t locale);
-- 
2.34.1
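
Editor's note on the buffer convention used by the casemap methods in the patch above: each method takes dest/destsize, returns the number of bytes the result needs (not counting the terminating NUL), and writes to dest only when the result fits. The sketch below illustrates that calling pattern in isolation. Everything in it is an assumption made for illustration: demo_strupper and demo_upper_alloc are hypothetical names, the mapping is ASCII-only, srclen is a plain long rather than ssize_t, and malloc stands in for palloc.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a casemap method: ASCII-only uppercase.
 * Returns the number of bytes the result needs, not counting the NUL.
 * dest is written only when destsize can hold the result plus the NUL.
 */
static size_t
demo_strupper(char *dest, size_t destsize, const char *src, long srclen)
{
	size_t		len = (srclen < 0) ? strlen(src) : (size_t) srclen;

	if (len + 1 <= destsize)
	{
		for (size_t i = 0; i < len; i++)
			dest[i] = (char) toupper((unsigned char) src[i]);
		dest[len] = '\0';
	}
	return len;
}

/*
 * Caller side: try a first buffer, and retry once if the reported size
 * shows the first attempt was too small.  The caller frees the result.
 */
static char *
demo_upper_alloc(const char *src, size_t first_guess)
{
	size_t		bufsize = (first_guess > 0) ? first_guess : 1;
	char	   *buf = malloc(bufsize);
	size_t		needed;

	assert(buf != NULL);
	needed = demo_strupper(buf, bufsize, src, -1);
	if (needed + 1 > bufsize)
	{
		/* too small: grow to the reported size and run again */
		bufsize = needed + 1;
		buf = realloc(buf, bufsize);
		assert(buf != NULL);
		needed = demo_strupper(buf, bufsize, src, -1);
	}
	assert(needed + 1 <= bufsize);
	return buf;
}
```

Returning the required size instead of allocating inside the method leaves memory management to the caller, at the cost of a second call when the first buffer is too small.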

v6-0007-Control-ctype-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 54758b0e0e17fcec189e8c18c8cc2f0a1823f406 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 14:30:07 -0700
Subject: [PATCH v6 07/11] Control ctype behavior with a method table.

Previously, ctype behavior (pattern matching) branched based
on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/regex/regc_pg_locale.c     | 388 +++++--------------------
 src/backend/utils/adt/like.c           |  22 +-
 src/backend/utils/adt/like_support.c   |   9 +-
 src/backend/utils/adt/pg_locale.c      | 101 +++++++
 src/backend/utils/adt/pg_locale_icu.c  |  52 ++++
 src/backend/utils/adt/pg_locale_libc.c | 158 ++++++++++
 src/include/utils/pg_locale.h          |  46 +++
 src/tools/pgindent/typedefs.list       |   1 -
 8 files changed, 448 insertions(+), 329 deletions(-)
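
Editor's note on the mask-based char_properties() interface introduced below: the caller passes a mask of the properties it is interested in, and the method returns the subset of that mask that holds for the character. The sketch below is a self-contained illustration of that dispatch, not the patch's code. The demo_* names, the reduced two-member method table, and the ASCII-only implementation are all assumptions; the flag values mirror the PG_IS* macros visible in the diff.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Flag values mirror the PG_IS* macros visible in the diff */
#define DEMO_ISDIGIT	0x01
#define DEMO_ISALPHA	0x02
#define DEMO_ISUPPER	0x04
#define DEMO_ISSPACE	0x80

typedef unsigned int demo_wchar;

struct demo_locale;

/* Hypothetical, reduced version of a ctype method table */
struct demo_ctype_methods
{
	int			(*char_properties) (demo_wchar wc, int mask,
									const struct demo_locale *locale);
	demo_wchar	(*wc_toupper) (demo_wchar wc,
							   const struct demo_locale *locale);
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype;
};

/* ASCII-only provider: test only the properties named in the mask */
static int
char_properties_ascii(demo_wchar wc, int mask,
					  const struct demo_locale *locale)
{
	int			result = 0;

	(void) locale;
	if (wc > 127)
		return 0;
	if ((mask & DEMO_ISDIGIT) && isdigit((int) wc))
		result |= DEMO_ISDIGIT;
	if ((mask & DEMO_ISALPHA) && isalpha((int) wc))
		result |= DEMO_ISALPHA;
	if ((mask & DEMO_ISUPPER) && isupper((int) wc))
		result |= DEMO_ISUPPER;
	if ((mask & DEMO_ISSPACE) && isspace((int) wc))
		result |= DEMO_ISSPACE;
	return result;
}

static demo_wchar
wc_toupper_ascii(demo_wchar wc, const struct demo_locale *locale)
{
	(void) locale;
	return (wc <= 127) ? (demo_wchar) toupper((int) wc) : wc;
}

static const struct demo_ctype_methods demo_ctype_methods_ascii = {
	.char_properties = char_properties_ascii,
	.wc_toupper = wc_toupper_ascii,
};

/* "alnum" is just "digit or alpha": any surviving bit means true */
static int
demo_isalnum(demo_wchar wc, const struct demo_locale *locale)
{
	assert(locale != NULL && locale->ctype != NULL);
	return locale->ctype->char_properties(wc, DEMO_ISDIGIT | DEMO_ISALPHA,
										  locale) != 0;
}
```

With this shape, a combined class such as alnum needs no dedicated method: the wrapper passes both bits in the mask and treats any returned bit as a match.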

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index b75784b6ce..e898634fdf 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,33 +63,18 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 static Oid	pg_regex_collation;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
-#define PG_ISDIGIT	0x01
-#define PG_ISALPHA	0x02
-#define PG_ISALNUM	(PG_ISDIGIT | PG_ISALPHA)
-#define PG_ISUPPER	0x04
-#define PG_ISLOWER	0x08
-#define PG_ISGRAPH	0x10
-#define PG_ISPRINT	0x20
-#define PG_ISPUNCT	0x40
-#define PG_ISSPACE	0x80
-
-static const unsigned char pg_char_properties[128] = {
+static const unsigned char char_properties_tbl[128] = {
 	 /* NUL */ 0,
 	 /* ^A */ 0,
 	 /* ^B */ 0,
@@ -232,7 +217,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -253,8 +237,8 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
 		collation = C_COLLATION_OID;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -271,32 +255,11 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
+			locale = &dummy_c_locale;
 			collation = C_COLLATION_OID;
 		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
-		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 	pg_regex_collation = collation;
 }
@@ -304,82 +267,31 @@ pg_set_regex_collation(Oid collation)
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISDIGIT));
+	else
+		return char_properties(c, PG_ISDIGIT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALPHA));
+	else
+		return char_properties(c, PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALNUM));
+	else
+		return char_properties(c, PG_ISDIGIT | PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
@@ -394,219 +306,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISUPPER));
+	else
+		return char_properties(c, PG_ISUPPER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISLOWER));
+	else
+		return char_properties(c, PG_ISLOWER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISGRAPH));
+	else
+		return char_properties(c, PG_ISGRAPH, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPRINT));
+	else
+		return char_properties(c, PG_ISPRINT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPUNCT));
+	else
+		return char_properties(c, PG_ISPUNCT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISSPACE));
+	else
+		return char_properties(c, PG_ISSPACE, pg_regex_locale) != 0;
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -732,37 +512,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 0152723b2a..5b679bcad8 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -96,7 +96,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	if (locale->ctype_is_c)
 		return pg_ascii_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -201,7 +201,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -216,14 +226,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 79c4ddc757..bf718f1a3d 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1498,15 +1498,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return isalpha_l((unsigned char) c, locale->info.lt);
 	else
-		return isalpha((unsigned char) c);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 05a7a09887..51f8a7dc61 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -243,6 +243,58 @@ static const struct casemap_methods casemap_methods_builtin = {
 	.strupper = strupper_builtin,
 };
 
+static int
+char_properties_builtin(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && pg_u_isdigit(wc, true))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && pg_u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && pg_u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && pg_u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && pg_u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && pg_u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && pg_u_ispunct(wc, true))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && pg_u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.char_properties = char_properties_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
+
 /*
  * pg_perm_setlocale
  *
@@ -1316,6 +1368,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
 	result->casemap = &casemap_methods_builtin;
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
@@ -1746,6 +1800,53 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_properties()
+ *
+ * Out of the properties specified in the given mask, return a new mask of the
+ * properties true for the given character.
+ */
+int
+char_properties(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	return locale->ctype->char_properties(wc, mask, locale);
+}
+
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 1d13f2daa3..1993f92af9 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -107,6 +107,50 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
 
+static int
+char_properties_icu(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && u_isdigit(wc))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && u_ispunct(wc))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -130,6 +174,13 @@ static const struct casemap_methods casemap_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 };
+
+static const struct ctype_methods ctype_methods_icu = {
+	.char_properties = char_properties_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -201,6 +252,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	else
 		result->collate = &collate_methods_icu;
 	result->casemap = &casemap_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index bdf8b71274..4790e5fc8f 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -68,6 +68,15 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static int	char_properties_libc_1byte(pg_wchar wc, int mask,
+									   pg_locale_t locale);
+static int	char_properties_libc_wide(pg_wchar wc, int mask,
+									  pg_locale_t locale);
+static pg_wchar toupper_libc_1byte(pg_wchar wc, pg_locale_t locale);
+static pg_wchar toupper_libc_wide(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_1byte(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_wide(pg_wchar wc, pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -102,6 +111,24 @@ static const struct collate_methods collate_methods_libc_win32_utf8 = {
 };
 #endif
 
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
 static const struct casemap_methods casemap_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
@@ -114,6 +141,23 @@ static const struct casemap_methods casemap_methods_libc_mb = {
 	.strupper = strupper_libc_mb,
 };
 
+static const struct ctype_methods ctype_methods_libc_1byte = {
+	.char_properties = char_properties_libc_1byte,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_1byte,
+	.wc_tolower = tolower_libc_1byte,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_wide = {
+	.char_properties = char_properties_libc_wide,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_wide,
+	.wc_tolower = tolower_libc_wide,
+};
+
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -433,6 +477,13 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		else
 			result->casemap = &casemap_methods_libc_sb;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_wide;
+		else
+			result->ctype = &ctype_methods_libc_1byte;
+	}
 
 	return result;
 }
@@ -716,6 +767,113 @@ report_newlocale_failure(const char *localename)
 						localename) : 0)));
 }
 
+static int
+char_properties_libc_1byte(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc > (pg_wchar) UCHAR_MAX)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static int
+char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* if wchar_t cannot represent the value, just return 0 */
+	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+toupper_libc_1byte(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_wide(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_1byte(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_wide(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index bbc10e0c3d..a5abf48bff 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,10 +12,25 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
 
+/*
+ * Character properties for regular expressions.
+ */
+#define PG_ISDIGIT     0x01
+#define PG_ISALPHA     0x02
+#define PG_ISALNUM     (PG_ISDIGIT | PG_ISALPHA)
+#define PG_ISUPPER     0x04
+#define PG_ISLOWER     0x08
+#define PG_ISGRAPH     0x10
+#define PG_ISPRINT     0x20
+#define PG_ISPUNCT     0x40
+#define PG_ISSPACE     0x80
+
 #ifdef USE_ICU
 /*
  * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
@@ -104,6 +119,32 @@ struct casemap_methods
 							 pg_locale_t locale);
 };
 
+struct ctype_methods
+{
+	/* required */
+	int			(*char_properties) (pg_wchar wc, int mask, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/* required */
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -129,6 +170,7 @@ struct pg_locale_struct
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct casemap_methods *casemap;	/* NULL if ctype_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -153,6 +195,10 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern int	char_properties(pg_wchar wc, int mask, pg_locale_t locale);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..65f4489dda 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1821,7 +1821,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.34.1
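
For readers skimming the patch above, the ctype method table can be reduced to a standalone sketch. This is not code from the patch: every demo_* and DEMO_* name is a hypothetical stand-in (for pg_wchar, pg_locale_t, the PG_IS* bits and struct ctype_methods), and the ASCII-only callbacks exist only for illustration. It shows the same ideas: a mask-in, mask-out char_properties callback, required and optional function pointers, and thin wrappers that dispatch through the locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pg_wchar and the PG_IS* property bits. */
typedef unsigned int demo_wchar;

#define DEMO_ISDIGIT 0x01
#define DEMO_ISALPHA 0x02
#define DEMO_ISUPPER 0x04
#define DEMO_ISLOWER 0x08

struct demo_locale;

/* Hypothetical analogue of struct ctype_methods. */
struct demo_ctype_methods
{
	/* required */
	int			(*char_properties) (demo_wchar wc, int mask,
									const struct demo_locale *locale);
	/* optional: NULL means the provider does not support it */
	char		(*char_tolower) (unsigned char ch,
								 const struct demo_locale *locale);
	/* required */
	demo_wchar	(*wc_toupper) (demo_wchar wc,
							   const struct demo_locale *locale);
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype;
};

/* Test only the properties requested in mask; return the ones that hold. */
static int
char_properties_ascii(demo_wchar wc, int mask, const struct demo_locale *locale)
{
	int			result = 0;

	(void) locale;
	if (wc > 127)
		return 0;				/* outside the range this provider knows */

	if ((mask & DEMO_ISDIGIT) && isdigit((int) wc))
		result |= DEMO_ISDIGIT;
	if ((mask & DEMO_ISALPHA) && isalpha((int) wc))
		result |= DEMO_ISALPHA;
	if ((mask & DEMO_ISUPPER) && isupper((int) wc))
		result |= DEMO_ISUPPER;
	if ((mask & DEMO_ISLOWER) && islower((int) wc))
		result |= DEMO_ISLOWER;

	return result;
}

static demo_wchar
wc_toupper_ascii(demo_wchar wc, const struct demo_locale *locale)
{
	(void) locale;
	if (wc >= 'a' && wc <= 'z')
		return wc - 'a' + 'A';
	return wc;
}

/* char_tolower is deliberately left NULL to show the optional slot. */
static const struct demo_ctype_methods demo_ctype_methods_ascii = {
	.char_properties = char_properties_ascii,
	.wc_toupper = wc_toupper_ascii,
};

static const struct demo_locale demo_ascii_locale = {
	.ctype = &demo_ctype_methods_ascii,
};

/* Callers go through wrappers and never branch on a provider. */
int
demo_char_properties(demo_wchar wc, int mask, const struct demo_locale *locale)
{
	assert(locale->ctype != NULL);
	return locale->ctype->char_properties(wc, mask, locale);
}

bool
demo_char_tolower_enabled(const struct demo_locale *locale)
{
	return locale->ctype->char_tolower != NULL;
}

demo_wchar
demo_wc_toupper(demo_wchar wc, const struct demo_locale *locale)
{
	return locale->ctype->wc_toupper(wc, locale);
}
```

Passing a mask lets one indirect call answer several property questions at once, and a caller that needs the optional method is expected to check the *_enabled() wrapper first, as char_tolower_enabled() does in the patch.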

v6-0008-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 44842a7e079d6d4e48e7153d4bd0e7356d7a3e6a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v6 08/11] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale.c      |  1 -
 src/backend/utils/adt/pg_locale_icu.c  | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c |  6 ------
 src/include/utils/pg_locale.h          |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 51f8a7dc61..5a06c2ec58 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1363,7 +1363,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 1993f92af9..e7ef7f4c09 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -243,7 +243,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -501,8 +500,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -530,8 +527,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -576,8 +571,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -728,8 +721,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -778,8 +769,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 4790e5fc8f..77220e134a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -454,7 +454,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -579,8 +578,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -635,8 +632,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -680,7 +675,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a5abf48bff..deb035cfd0 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -163,7 +163,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.34.1
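
Why the provider field becomes redundant can be seen in a small standalone sketch. Again, this is not code from the patch: the sketch_* names and the two toy comparison functions are assumptions made for illustration. The locale struct carries no provider tag at all, and the method pointer installed at creation time is the only thing that selects the behavior, so there is nothing left for an Assert(locale->provider == ...) to check.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct sketch_locale;

/* Hypothetical analogue of struct collate_methods, reduced to one method. */
struct sketch_collate_methods
{
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct sketch_locale *locale);
};

/* No provider tag: the method table alone determines the behavior. */
struct sketch_locale
{
	const struct sketch_collate_methods *collate;
};

/* Order by length once the common prefix compares equal. */
static int
compare_lengths(size_t alen, size_t blen)
{
	if (alen == blen)
		return 0;
	return (alen < blen) ? -1 : 1;
}

/* Toy "provider" 1: plain byte-wise comparison. */
static int
strncoll_bytewise(const char *a, size_t alen, const char *b, size_t blen,
				  const struct sketch_locale *locale)
{
	size_t		n = (alen < blen) ? alen : blen;
	int			r = memcmp(a, b, n);

	(void) locale;
	if (r != 0)
		return (r < 0) ? -1 : 1;
	return compare_lengths(alen, blen);
}

/* Toy "provider" 2: ASCII case-insensitive comparison. */
static int
strncoll_ascii_ci(const char *a, size_t alen, const char *b, size_t blen,
				  const struct sketch_locale *locale)
{
	size_t		n = (alen < blen) ? alen : blen;

	(void) locale;
	for (size_t i = 0; i < n; i++)
	{
		int			ca = tolower((unsigned char) a[i]);
		int			cb = tolower((unsigned char) b[i]);

		if (ca != cb)
			return (ca < cb) ? -1 : 1;
	}
	return compare_lengths(alen, blen);
}

static const struct sketch_collate_methods sketch_methods_bytewise = {
	.strncoll = strncoll_bytewise,
};

static const struct sketch_collate_methods sketch_methods_ascii_ci = {
	.strncoll = strncoll_ascii_ci,
};

static const struct sketch_locale sketch_locale_bytewise = {
	.collate = &sketch_methods_bytewise,
};

static const struct sketch_locale sketch_locale_ci = {
	.collate = &sketch_methods_ascii_ci,
};

/* The single call site: no switch on a provider, just an indirect call. */
int
sketch_strncoll(const char *a, size_t alen, const char *b, size_t blen,
				const struct sketch_locale *locale)
{
	assert(locale->collate != NULL);
	return locale->collate->strncoll(a, alen, b, blen, locale);
}
```

In this sketch the wrapper simply asserts that a table is present; in the patch series the collate pointer is NULL when collate_is_c is set, so real callers would need to handle that case before dispatching.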

v6-0009-Make-provider-data-in-pg_locale_t-an-opaque-point.patch (text/x-patch; charset=UTF-8)

From 19610d71cbd87898c0008130f7370e14d517e64e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v6 09/11] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale.c      |  10 +-
 src/backend/utils/adt/pg_locale_icu.c  |  40 ++++++--
 src/backend/utils/adt/pg_locale_libc.c | 131 ++++++++++++++++---------
 src/include/utils/pg_locale.h          |  16 +--
 4 files changed, 126 insertions(+), 71 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5a06c2ec58..7bc27b3005 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -127,6 +127,11 @@ static pg_locale_t default_locale = NULL;
 static bool CurrentLocaleConvValid = false;
 static bool CurrentLCTimeValid = false;
 
+struct builtin_provider
+{
+	const char *locale;
+};
+
 /* Cache for collation-related knowledge */
 
 typedef struct
@@ -1330,6 +1335,7 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
 	pg_locale_t result;
+	struct builtin_provider *builtin;
 
 	if (collid == DEFAULT_COLLATION_OID)
 	{
@@ -1361,8 +1367,10 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	builtin_validate_locale(GetDatabaseEncoding(), locstr);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	builtin = MemoryContextAlloc(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	result->provider_data = (void *) builtin;
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e7ef7f4c09..2d47faf3aa 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -37,6 +37,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static int	strncoll_icu(const char *arg1, ssize_t len1,
@@ -190,6 +196,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -241,8 +248,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -499,11 +510,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -527,6 +539,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -540,7 +554,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -571,12 +585,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -667,11 +683,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -679,7 +697,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -721,6 +739,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -743,7 +763,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -769,6 +789,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -788,7 +810,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 77220e134a..c08102002f 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -31,6 +32,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -116,17 +122,21 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static const struct casemap_methods casemap_methods_libc_sb = {
@@ -167,7 +177,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -184,7 +194,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = tolower_l((unsigned char) *p, loc);
+			*p = tolower_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -194,7 +204,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -216,7 +227,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -247,7 +258,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -264,10 +275,10 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (wasalnum)
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -278,7 +289,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -303,10 +315,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -338,7 +350,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -352,7 +364,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = toupper_l((unsigned char) *p, loc);
+			*p = toupper_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -362,7 +374,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -384,7 +397,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -412,6 +425,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -450,16 +464,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -578,6 +595,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -608,7 +627,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -632,8 +651,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -642,7 +663,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -675,6 +696,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -719,7 +742,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -766,27 +789,29 @@ char_properties_libc_1byte(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc > (pg_wchar) UCHAR_MAX)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -797,6 +822,8 @@ char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
@@ -804,21 +831,21 @@ char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
 	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -827,10 +854,12 @@ char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
 static pg_wchar
 toupper_libc_1byte(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -838,10 +867,12 @@ toupper_libc_1byte(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_wide(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -849,10 +880,12 @@ toupper_libc_wide(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_1byte(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -860,10 +893,12 @@ tolower_libc_1byte(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_wide(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -955,8 +990,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1015,8 +1052,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index deb035cfd0..e8a6e0d364 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -171,21 +171,7 @@ struct pg_locale_struct
 	const struct casemap_methods *casemap;	/* NULL if ctype_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 typedef struct pg_locale_struct *pg_locale_t;
-- 
2.34.1

v6-0010-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From 2bb352a3c339ba3dbe3c6ae8ac3cde16704ab3ef Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v6 10/11] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  |  4 ++++
 src/backend/utils/adt/formatting.c    |  4 ----
 src/backend/utils/adt/pg_locale.c     |  4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 13 +++++++++++++
 src/backend/utils/adt/varlena.c       |  4 ++++
 src/include/utils/pg_locale.h         | 17 -----------------
 6 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53b6a479aa..afc2330f51 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 6a0571f93e..387009a4a9 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/unicode_case.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7bc27b3005..b58a3a729f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2d47faf3aa..bdcb5a0ab8 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,7 +13,20 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
+
+/*
+ * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
+ * (see
+ * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
+ */
+#if U_ICU_VERSION_MAJOR_NUM >= 53
+#define HAVE_UCOL_STRCOLLUTF8 1
+#else
+#undef HAVE_UCOL_STRCOLLUTF8
+#endif
+
 #endif
 
 #include "access/htup_details.h"
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 533bebc1c7..37b3506f06 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e8a6e0d364..cbc045f126 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /*
  * Character properties for regular expressions.
  */
@@ -31,19 +27,6 @@
 #define PG_ISPUNCT     0x40
 #define PG_ISSPACE     0x80
 
-#ifdef USE_ICU
-/*
- * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
- * (see
- * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
- */
-#if U_ICU_VERSION_MAJOR_NUM >= 53
-#define HAVE_UCOL_STRCOLLUTF8 1
-#else
-#undef HAVE_UCOL_STRCOLLUTF8
-#endif
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.34.1

v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 87cca7d633f75523f61f1cee30ce5cd04d0bbd4b Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 16:10:28 -0700
Subject: [PATCH v6 11/11] Introduce hooks for creating custom pg_locale_t.

Now that collation, case mapping, and ctype behavior is controlled
with a method table, we can hook the behavior.

The hooks can provide their own arbitrary method table, which may be
based on a different version of ICU than what Postgres was built with,
or entirely unrelated to ICU/libc.
---
 src/backend/utils/adt/pg_locale.c | 68 +++++++++++++++++++++----------
 src/include/utils/pg_locale.h     | 24 +++++++++++
 src/tools/pgindent/typedefs.list  |  3 ++
 3 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b58a3a729f..def24c7cfb 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -98,6 +98,9 @@
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
+create_pg_locale_hook_type create_pg_locale_hook = NULL;
+collation_version_hook_type collation_version_hook = NULL;
+
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
 #endif
@@ -1394,7 +1397,7 @@ create_pg_locale(Oid collid, MemoryContext context)
 	/* We haven't computed this yet in this session, so do it */
 	HeapTuple	tp;
 	Form_pg_collation collform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 	Datum		datum;
 	bool		isnull;
 
@@ -1403,15 +1406,21 @@ create_pg_locale(Oid collid, MemoryContext context)
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(collid, context);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(collid, context);
+
+	if (result == NULL)
+	{
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(collid, context);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	}
 
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
@@ -1468,7 +1477,7 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 
 	Assert(default_locale == NULL);
 
@@ -1478,18 +1487,25 @@ init_database_collation(void)
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
-										  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
-									  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(DEFAULT_COLLATION_OID,
 									   TopMemoryContext);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+
+	if (result == NULL)
+	{
+		if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+											  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+										   TopMemoryContext);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	}
 
 	ReleaseSysCache(tup);
 
@@ -1558,6 +1574,14 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collation_version_hook != NULL)
+	{
+		char	   *version;
+
+		if (collation_version_hook(collprovider, collcollate, &version))
+			return version;
+	}
+
 	/*
 	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
 	 * and are not expected to change, but track the version anyway.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cbc045f126..058fdb2c74 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -159,6 +159,30 @@ struct pg_locale_struct
 
 typedef struct pg_locale_struct *pg_locale_t;
 
+/*
+ * Hooks to enable custom locale providers.
+ */
+
+/*
+ * Hook create_pg_locale(). Return result (allocated in the given context) to
+ * override; or return NULL to return control to create_pg_locale(). When
+ * creating the default database collation, collid is DEFAULT_COLLATION_OID.
+ */
+typedef pg_locale_t (*create_pg_locale_hook_type) (Oid collid,
+												   MemoryContext context);
+
+/*
+ * Hook get_collation_actual_version(). Set *version out parameter and return
+ * true to override; or return false to return control to
+ * get_collation_actual_version().
+ */
+typedef bool (*collation_version_hook_type) (char collprovider,
+											 const char *collcollate,
+											 char **version);
+
+extern PGDLLIMPORT create_pg_locale_hook_type create_pg_locale_hook;
+extern PGDLLIMPORT collation_version_hook_type collation_version_hook;
+
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 65f4489dda..d0fce029b7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3372,6 +3372,7 @@ cmpEntriesArg
 codes_t
 collation_cache_entry
 collation_cache_hash
+collation_version_hook_type
 color
 colormaprange
 compare_context
@@ -3388,6 +3389,7 @@ core_yyscan_t
 corrupt_items
 cost_qual_eval_context
 cp_hash_func
+create_pg_locale_hook_type
 create_upper_paths_hook_type
 createdb_failure_params
 crosstab_HashEnt
@@ -3399,6 +3401,7 @@ datetkn
 dce_uuid_t
 dclist_head
 decimal
+default_pg_locale_hook_type
 deparse_columns
 deparse_context
 deparse_expr_cxt
-- 
2.34.1

Andreas Karlsson

andreas@proxel.se

about 1 year ago

In reply to: Jeff Davis (#4)

10 attachment(s)

Re: Collation & ctype method table, and extension hooks

On 10/10/24 1:27 AM, Jeff Davis wrote:

Attached v6 with significant improvements, and should be easier to
review.

This removes all runtime branching for collation & ctype operations; I
even removed the "provider" field of pg_locale_t to be sure.

Nice! Some great changes. I did a quick review:

= General

Why is there no pg_locale_builtin.c? I feel the code would be easier to
understand for someone not familiar with it if each provider was defined
in its own file.

= v6-0003-Refactor-the-code-to-create-a-pg_locale_t-into-ne.patch

Looks good.

= v6-0004-Perform-provider-specific-initialization-code-in-.patch

I am not a fan of all the #ifdef USE_ICU in pg_locale_icu.c but I am
not sure if I have a cleaner solution.

= v6-0005-Control-collation-behavior-with-a-method-table.patch

strncoll_libc_win32_utf8 is used in this patch but only defined in the
next one, so it seems you accidentally added it to the wrong patch.

I think adding an assert to create_pg_locale() which enforces that there
is always a valid combination of collate_is_c and collate would be good,
especially now that we have the hook.

= v6-0006-Control-case-mapping-behavior-with-a-method-table.patch

I think you forgot to remove #include <wctype.h> from formatting.c.

I need to look at it more in detail but I think this new version makes
us do extra work when ICU strings grow in length when calling upper/lower.

I think adding an assert to create_pg_locale() which enforces that there
is always a valid combination of ctype_is_c and casemap would be good,
similar to the collate field.
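
For illustration, such a check could look roughly like the sketch below.
It uses stand-in structs, not the real pg_locale_struct; locale_model,
locale_model_is_consistent and check_locale_model are hypothetical
names. It also assumes that collate is NULL exactly when collate_is_c is
set, mirroring the "NULL if ctype_is_c" comment on the casemap field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real method tables live in pg_locale.h. */
struct collate_methods { int unused; };
struct casemap_methods { int unused; };

struct locale_model
{
    bool        collate_is_c;
    bool        ctype_is_c;
    const struct collate_methods *collate;  /* assumed NULL if collate_is_c */
    const struct casemap_methods *casemap;  /* NULL if ctype_is_c */
};

/* True when each "is C" flag agrees with its method table pointer. */
bool
locale_model_is_consistent(const struct locale_model *loc)
{
    return loc->collate_is_c == (loc->collate == NULL) &&
        loc->ctype_is_c == (loc->casemap == NULL);
}

/*
 * The kind of check create_pg_locale() could run on its result, whether
 * the result came from a built-in provider or from a hook.
 */
void
check_locale_model(const struct locale_model *loc)
{
    assert(loc != NULL);
    assert(locale_model_is_consistent(loc));
}
```

The value of checking in one central place is that a hook returning a
half-initialized struct would fail immediately rather than at the first
comparison or case-mapping call.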

= v6-0007-Control-ctype-behavior-with-a-method-table.patch

Why are casemap and ctype_methods not the same struct? They seem very
closely related.

This commit makes me tempted to handle the ctype_is_c logic for
character classes also in callbacks and remove the if in functions like
pg_wc_ispunct(). But this is something that would need to be benchmarked.

I wonder if the bitmask idea isn't terrible for the branch predictor and
that we may want one function per character class, but this is yet again
something we need to benchmark.
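
To make the two alternatives concrete, here is a rough sketch in plain C
on top of <ctype.h>. The CLS_* bits, char_properties_model and struct
ctype_model are hypothetical stand-ins, not names from the patch, and
only three classes are shown. It says nothing about which style is
faster; that is the part that needs benchmarking.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical class bits, modelled on the PG_IS* masks. */
#define CLS_DIGIT 0x01
#define CLS_ALPHA 0x02
#define CLS_SPACE 0x04
#define CLS_ALL   (CLS_DIGIT | CLS_ALPHA | CLS_SPACE)

/* Style 1: one entry point; the caller passes the classes it wants tested. */
int
char_properties_model(unsigned char c, int mask)
{
    int         result = 0;

    assert((mask & ~CLS_ALL) == 0);

    if ((mask & CLS_DIGIT) && isdigit(c))
        result |= CLS_DIGIT;
    if ((mask & CLS_ALPHA) && isalpha(c))
        result |= CLS_ALPHA;
    if ((mask & CLS_SPACE) && isspace(c))
        result |= CLS_SPACE;

    return result;
}

/* Style 2: one callback per class, so a call performs no mask tests. */
struct ctype_model
{
    int         (*is_digit) (unsigned char c);
    int         (*is_alpha) (unsigned char c);
    int         (*is_space) (unsigned char c);
};

int model_isdigit(unsigned char c) { return isdigit(c) != 0; }
int model_isalpha(unsigned char c) { return isalpha(c) != 0; }
int model_isspace(unsigned char c) { return isspace(c) != 0; }

const struct ctype_model ctype_model_libc = {
    model_isdigit,
    model_isalpha,
    model_isspace
};
```

With style 1 a caller asking for a single class still walks every mask
test; with style 2 the method table grows by one pointer per class but
each call goes straight to the relevant libc function.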

= v6-0008-Remove-provider-field-from-pg_locale_t.patch

Looks good.

= v6-0009-Make-provider-data-in-pg_locale_t-an-opaque-point.patch

Is there a reason we allocate the icu_provider in create_pg_locale_icu
with MemoryContextAllocZero when we initialize everything anyway? And
similarly for the other providers.
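
A small model of the question, using calloc() and malloc() in place of
MemoryContextAllocZero() and MemoryContextAlloc(). The struct
model_icu_provider and both function names are hypothetical; its two
fields are assumed from the icu member of the union removed in 0009,
with void * standing in for UCollator *.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for icu_provider. */
struct model_icu_provider
{
    const char *locale;
    void       *ucol;
};

/* Zeroing allocation, analogous to MemoryContextAllocZero(). */
struct model_icu_provider *
create_provider_zeroed(const char *locale, void *ucol)
{
    struct model_icu_provider *p;

    assert(locale != NULL);
    p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->locale = locale;
    p->ucol = ucol;
    return p;
}

/* Plain allocation, analogous to MemoryContextAlloc(); every field is set. */
struct model_icu_provider *
create_provider_plain(const char *locale, void *ucol)
{
    struct model_icu_provider *p;

    assert(locale != NULL);
    p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->locale = locale;
    p->ucol = ucol;
    return p;
}
```

As long as every field is assigned, both variants yield the same struct
contents, so the zeroing is redundant work. Its only benefit is
defensive: a field added later and not yet initialized would start out
as zero instead of garbage.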

= v6-0010-Don-t-include-ICU-headers-in-pg_locale.h.patch

Looks good.

= v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch

Looks good but seems like a quite painful API to use.
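
To show what using it involves, here is a stand-alone model of the
dispatch in create_pg_locale() and of an extension installing a hook.
Everything in it is a stand-in: ModelOid, ModelLocale, the model_*
names and MODEL_CUSTOM_COLLID are hypothetical, the MemoryContext
argument is left out, and the prev_hook chaining follows the convention
of other PostgreSQL hooks rather than anything this patch requires.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PostgreSQL types; none of this is the real API. */
typedef unsigned int ModelOid;
typedef struct ModelLocale
{
    const char *created_by;
} ModelLocale;

/* Same shape as create_pg_locale_hook_type, minus the MemoryContext. */
typedef ModelLocale *(*model_create_hook_type) (ModelOid collid);

model_create_hook_type model_create_hook = NULL;

ModelLocale builtin_locale = {"core"};
ModelLocale custom_locale = {"extension"};

#define MODEL_CUSTOM_COLLID 16384   /* hypothetical OID the extension claims */

/* Core side: ask the hook first, fall back when it declines with NULL. */
ModelLocale *
model_create_locale(ModelOid collid)
{
    ModelLocale *result = NULL;

    if (model_create_hook != NULL)
        result = model_create_hook(collid);

    if (result == NULL)
        result = &builtin_locale;   /* stands in for the provider branches */

    assert(result != NULL);
    return result;
}

/* Extension side: override one collation, decline everything else. */
static model_create_hook_type prev_hook = NULL;

ModelLocale *
extension_create_locale(ModelOid collid)
{
    if (collid == MODEL_CUSTOM_COLLID)
        return &custom_locale;
    if (prev_hook != NULL)
        return prev_hook(collid);   /* chain to an earlier hook, if any */
    return NULL;                    /* hand control back to core */
}

/* What an extension's _PG_init() would do. */
void
extension_install_hook(void)
{
    prev_hook = model_create_hook;
    model_create_hook = extension_create_locale;
}
```

The painful part is visible even in the model: the hook receives only a
collation OID, so a real extension has to do its own catalog lookup to
decide whether the collation is one it wants to handle, and then build
the whole method table itself.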

* Have a CREATE LOCALE PROVIDER command and make "provider" an Oid
rather than a char ('b'/'i'/'c'). The v6 patches bring us close to
this point, but I'm not sure if we want to go this far in v18.

Probably necessary, but I hate all the DDL commands that the way the SQL
standard is written forces us to add.

* Need an actual extension to prove that it works.

* Clean up the way versions are handled.

* Do we want to provide support for changing the provider at initdb
time?

Not sure, need to think about this one.

* The catalog representation is not very clean or general. The libc
provider allows collation and ctype to be set separately, but they
control the environment variables, too. ICU has rules, which are
specific to ICU.

Yeah, would be really nice to clean this up but it might be work for a
different patch set.

Rebased patches are attached.

Andreas

Attachments:

v1-0001-Specialize-EEOP_-_TESTVAL-steps.patch (text/x-patch; charset=UTF-8)

From d63f681f7d0df06d493a1ec06a706f32e39e250e Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Tue, 3 Sep 2024 13:53:21 +0200
Subject: [PATCH v1] Specialize EEOP_*_TESTVAL steps

Refactor the EEOP_CASE_TESTVAL and EEOP_DOMAIN_TESTVAL steps by
deciding if we should read from caseValue_datum/domainValue_datum
or not when generating the executor steps rather than doing so at
runtime.

This gives a minor performance benefit but the real goal is to make
the code less surprising and easier to follow.
---
 src/backend/executor/execExpr.c       | 27 +++++++----
 src/backend/executor/execExprInterp.c | 49 +++++++------------
 src/backend/jit/llvm/llvmjit_expr.c   | 68 +++++----------------------
 src/include/executor/execExpr.h       |  2 +
 4 files changed, 48 insertions(+), 98 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 63289ee35e..aaa67d3580 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1844,11 +1844,15 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				 * That can happen because some parts of the system abuse
 				 * CaseTestExpr to cause a read of a value externally supplied
 				 * in econtext->caseValue_datum.  We'll take care of that
-				 * scenario at runtime.
+				 * by generating a specialized operation.
 				 */
-				scratch.opcode = EEOP_CASE_TESTVAL;
-				scratch.d.casetest.value = state->innermost_caseval;
-				scratch.d.casetest.isnull = state->innermost_casenull;
+				if (state->innermost_caseval == NULL)
+					scratch.opcode = EEOP_CASE_TESTVAL_EXT;
+				else {
+					scratch.opcode = EEOP_CASE_TESTVAL;
+					scratch.d.casetest.value = state->innermost_caseval;
+					scratch.d.casetest.isnull = state->innermost_casenull;
+				}
 
 				ExprEvalPushStep(state, &scratch);
 				break;
@@ -2535,12 +2539,17 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				 * a standalone domain check rather than one embedded in a
 				 * larger expression.  In that case we must read from
 				 * econtext->domainValue_datum.  We'll take care of that
-				 * scenario at runtime.
+				 * by generating a specialized step.
 				 */
-				scratch.opcode = EEOP_DOMAIN_TESTVAL;
-				/* we share instruction union variant with case testval */
-				scratch.d.casetest.value = state->innermost_domainval;
-				scratch.d.casetest.isnull = state->innermost_domainnull;
+				if (state->innermost_domainval == NULL)
+					scratch.opcode = EEOP_DOMAIN_TESTVAL_EXT;
+				else
+				{
+					scratch.opcode = EEOP_DOMAIN_TESTVAL;
+					/* we share instruction union variant with case testval */
+					scratch.d.casetest.value = state->innermost_domainval;
+					scratch.d.casetest.isnull = state->innermost_domainnull;
+				}
 
 				ExprEvalPushStep(state, &scratch);
 				break;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index a6c47f61e0..83c8ae88dd 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -452,6 +452,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_PARAM_CALLBACK,
 		&&CASE_EEOP_PARAM_SET,
 		&&CASE_EEOP_CASE_TESTVAL,
+		&&CASE_EEOP_CASE_TESTVAL_EXT,
 		&&CASE_EEOP_MAKE_READONLY,
 		&&CASE_EEOP_IOCOERCE,
 		&&CASE_EEOP_IOCOERCE_SAFE,
@@ -475,6 +476,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_SBSREF_ASSIGN,
 		&&CASE_EEOP_SBSREF_FETCH,
 		&&CASE_EEOP_DOMAIN_TESTVAL,
+		&&CASE_EEOP_DOMAIN_TESTVAL_EXT,
 		&&CASE_EEOP_DOMAIN_NOTNULL,
 		&&CASE_EEOP_DOMAIN_CHECK,
 		&&CASE_EEOP_HASHDATUM_SET_INITVAL,
@@ -1107,45 +1109,26 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		}
 
 		EEO_CASE(EEOP_CASE_TESTVAL)
+			EEO_CASE(EEOP_DOMAIN_TESTVAL)
 		{
-			/*
-			 * Normally upper parts of the expression tree have setup the
-			 * values to be returned here, but some parts of the system
-			 * currently misuse {caseValue,domainValue}_{datum,isNull} to set
-			 * run-time data.  So if no values have been set-up, use
-			 * ExprContext's.  This isn't pretty, but also not *that* ugly,
-			 * and this is unlikely to be performance sensitive enough to
-			 * worry about an extra branch.
-			 */
-			if (op->d.casetest.value)
-			{
-				*op->resvalue = *op->d.casetest.value;
-				*op->resnull = *op->d.casetest.isnull;
-			}
-			else
-			{
-				*op->resvalue = econtext->caseValue_datum;
-				*op->resnull = econtext->caseValue_isNull;
-			}
+			*op->resvalue = *op->d.casetest.value;
+			*op->resnull = *op->d.casetest.isnull;
 
 			EEO_NEXT();
 		}
 
-		EEO_CASE(EEOP_DOMAIN_TESTVAL)
+		EEO_CASE(EEOP_CASE_TESTVAL_EXT)
 		{
-			/*
-			 * See EEOP_CASE_TESTVAL comment.
-			 */
-			if (op->d.casetest.value)
-			{
-				*op->resvalue = *op->d.casetest.value;
-				*op->resnull = *op->d.casetest.isnull;
-			}
-			else
-			{
-				*op->resvalue = econtext->domainValue_datum;
-				*op->resnull = econtext->domainValue_isNull;
-			}
+			*op->resvalue = econtext->caseValue_datum;
+			*op->resnull = econtext->caseValue_isNull;
+
+			EEO_NEXT();
+		}
+
+		EEO_CASE(EEOP_DOMAIN_TESTVAL_EXT)
+		{
+			*op->resvalue = econtext->domainValue_datum;
+			*op->resnull = econtext->domainValue_isNull;
 
 			EEO_NEXT();
 		}
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 48ccdb942a..b0780188bd 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1201,42 +1201,32 @@ llvm_compile_expr(ExprState *state)
 				}
 
 			case EEOP_CASE_TESTVAL:
+			case EEOP_DOMAIN_TESTVAL:
 				{
-					LLVMBasicBlockRef b_avail,
-								b_notavail;
 					LLVMValueRef v_casevaluep,
 								v_casevalue;
 					LLVMValueRef v_casenullp,
 								v_casenull;
-					LLVMValueRef v_casevaluenull;
-
-					b_avail = l_bb_before_v(opblocks[opno + 1],
-											"op.%d.avail", opno);
-					b_notavail = l_bb_before_v(opblocks[opno + 1],
-											   "op.%d.notavail", opno);
 
 					v_casevaluep = l_ptr_const(op->d.casetest.value,
 											   l_ptr(TypeSizeT));
 					v_casenullp = l_ptr_const(op->d.casetest.isnull,
 											  l_ptr(TypeStorageBool));
 
-					v_casevaluenull =
-						LLVMBuildICmp(b, LLVMIntEQ,
-									  LLVMBuildPtrToInt(b, v_casevaluep,
-														TypeSizeT, ""),
-									  l_sizet_const(0), "");
-					LLVMBuildCondBr(b, v_casevaluenull, b_notavail, b_avail);
-
-					/* if casetest != NULL */
-					LLVMPositionBuilderAtEnd(b, b_avail);
 					v_casevalue = l_load(b, TypeSizeT, v_casevaluep, "");
 					v_casenull = l_load(b, TypeStorageBool, v_casenullp, "");
 					LLVMBuildStore(b, v_casevalue, v_resvaluep);
 					LLVMBuildStore(b, v_casenull, v_resnullp);
+
 					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}
+
+			case EEOP_CASE_TESTVAL_EXT:
+				{
+					LLVMValueRef v_casevalue;
+					LLVMValueRef v_casenull;
 
-					/* if casetest == NULL */
-					LLVMPositionBuilderAtEnd(b, b_notavail);
 					v_casevalue =
 						l_load_struct_gep(b,
 										  StructExprContext,
@@ -1830,45 +1820,11 @@ llvm_compile_expr(ExprState *state)
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
-			case EEOP_DOMAIN_TESTVAL:
+			case EEOP_DOMAIN_TESTVAL_EXT:
 				{
-					LLVMBasicBlockRef b_avail,
-								b_notavail;
-					LLVMValueRef v_casevaluep,
-								v_casevalue;
-					LLVMValueRef v_casenullp,
-								v_casenull;
-					LLVMValueRef v_casevaluenull;
-
-					b_avail = l_bb_before_v(opblocks[opno + 1],
-											"op.%d.avail", opno);
-					b_notavail = l_bb_before_v(opblocks[opno + 1],
-											   "op.%d.notavail", opno);
-
-					v_casevaluep = l_ptr_const(op->d.casetest.value,
-											   l_ptr(TypeSizeT));
-					v_casenullp = l_ptr_const(op->d.casetest.isnull,
-											  l_ptr(TypeStorageBool));
-
-					v_casevaluenull =
-						LLVMBuildICmp(b, LLVMIntEQ,
-									  LLVMBuildPtrToInt(b, v_casevaluep,
-														TypeSizeT, ""),
-									  l_sizet_const(0), "");
-					LLVMBuildCondBr(b,
-									v_casevaluenull,
-									b_notavail, b_avail);
-
-					/* if casetest != NULL */
-					LLVMPositionBuilderAtEnd(b, b_avail);
-					v_casevalue = l_load(b, TypeSizeT, v_casevaluep, "");
-					v_casenull = l_load(b, TypeStorageBool, v_casenullp, "");
-					LLVMBuildStore(b, v_casevalue, v_resvaluep);
-					LLVMBuildStore(b, v_casenull, v_resnullp);
-					LLVMBuildBr(b, opblocks[opno + 1]);
+					LLVMValueRef v_casevalue;
+					LLVMValueRef v_casenull;
 
-					/* if casetest == NULL */
-					LLVMPositionBuilderAtEnd(b, b_notavail);
 					v_casevalue =
 						l_load_struct_gep(b,
 										  StructExprContext,
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index eec0aa699e..4146aad88d 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -165,6 +165,7 @@ typedef enum ExprEvalOp
 
 	/* return CaseTestExpr value */
 	EEOP_CASE_TESTVAL,
+	EEOP_CASE_TESTVAL_EXT,
 
 	/* apply MakeExpandedObjectReadOnly() to target value */
 	EEOP_MAKE_READONLY,
@@ -228,6 +229,7 @@ typedef enum ExprEvalOp
 
 	/* evaluate value for CoerceToDomainValue */
 	EEOP_DOMAIN_TESTVAL,
+	EEOP_DOMAIN_TESTVAL_EXT,
 
 	/* evaluate a domain's NOT NULL constraint */
 	EEOP_DOMAIN_NOTNULL,
-- 
2.43.0

v7-0001-Refactor-the-code-to-create-a-pg_locale_t-into-ne.patch (text/x-patch; charset=UTF-8)

From 49168ac5d61c7b48e320d48410e8587b8db005af Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 14:58:52 -0700
Subject: [PATCH v7 1/9] Refactor the code to create a pg_locale_t into new
 function.

---
 src/backend/utils/adt/pg_locale.c | 297 ++++++++++++++----------------
 1 file changed, 140 insertions(+), 157 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index daf9689a82..0f43004571 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1215,42 +1215,136 @@ IsoLocaleName(const char *winlocname)
 
 
 /*
- * Cache mechanism for collation information.
- *
- * Note that we currently lack any way to flush the cache.  Since we don't
- * support ALTER COLLATION, this is OK.  The worst case is that someone
- * drops a collation, and a useless cache entry hangs around in existing
- * backends.
+ * Create a new pg_locale_t struct for the given collation oid.
  */
-static collation_cache_entry *
-lookup_collation_cache(Oid collation)
+static pg_locale_t
+create_pg_locale(Oid collid, MemoryContext context)
 {
-	collation_cache_entry *cache_entry;
-	bool		found;
+	/* We haven't computed this yet in this session, so do it */
+	HeapTuple	tp;
+	Form_pg_collation collform;
+	pg_locale_t result;
+	Datum		datum;
+	bool		isnull;
 
-	Assert(OidIsValid(collation));
-	Assert(collation != DEFAULT_COLLATION_OID);
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	if (CollationCache == NULL)
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+	if (!HeapTupleIsValid(tp))
+		elog(ERROR, "cache lookup failed for collation %u", collid);
+	collform = (Form_pg_collation) GETSTRUCT(tp);
+
+	result->provider = collform->collprovider;
+	result->deterministic = collform->collisdeterministic;
+
+	if (collform->collprovider == COLLPROVIDER_BUILTIN)
 	{
-		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
-													  "collation cache",
-													  ALLOCSET_DEFAULT_SIZES);
-		CollationCache = collation_cache_create(CollationCacheContext,
-												16, NULL);
+		const char *locstr;
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+
+		result->collate_is_c = true;
+		result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+		builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+		result->info.builtin.locale = MemoryContextStrdup(context,
+														  locstr);
 	}
+	else if (collform->collprovider == COLLPROVIDER_ICU)
+	{
+#ifdef USE_ICU
+		const char *iculocstr;
+		const char *icurules;
 
-	cache_entry = collation_cache_insert(CollationCache, collation, &found);
-	if (!found)
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+
+		result->collate_is_c = false;
+		result->ctype_is_c = false;
+
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
+#else
+		/* could get here if a collation was created by a build with ICU */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#endif
+	}
+	else if (collform->collprovider == COLLPROVIDER_LIBC)
 	{
-		/*
-		 * Make sure cache entry is marked invalid, in case we fail before
-		 * setting things.
-		 */
-		cache_entry->locale = 0;
+		const char *collcollate;
+		const char *collctype;
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
+		collcollate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
+		collctype = TextDatumGetCString(datum);
+
+		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
+			(strcmp(collcollate, "POSIX") == 0);
+		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
+			(strcmp(collctype, "POSIX") == 0);
+
+		result->info.lt = make_libc_collator(collcollate, collctype);
+	}
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+
+	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
+							&isnull);
+	if (!isnull)
+	{
+		char	   *actual_versionstr;
+		char	   *collversionstr;
+
+		collversionstr = TextDatumGetCString(datum);
+
+		if (collform->collprovider == COLLPROVIDER_LIBC)
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
+		else
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
+
+		actual_versionstr = get_collation_actual_version(collform->collprovider,
+														 TextDatumGetCString(datum));
+		if (!actual_versionstr)
+		{
+			/*
+			 * This could happen when specifying a version in CREATE COLLATION
+			 * but the provider does not support versioning, or manually
+			 * creating a mess in the catalogs.
+			 */
+			ereport(ERROR,
+					(errmsg("collation \"%s\" has no actual version, but a version was recorded",
+							NameStr(collform->collname))));
+		}
+
+		if (strcmp(actual_versionstr, collversionstr) != 0)
+			ereport(WARNING,
+					(errmsg("collation \"%s\" has version mismatch",
+							NameStr(collform->collname)),
+					 errdetail("The collation in the database was created using version %s, "
+							   "but the operating system provides version %s.",
+							   collversionstr, actual_versionstr),
+					 errhint("Rebuild all objects affected by this collation and run "
+							 "ALTER COLLATION %s REFRESH VERSION, "
+							 "or build PostgreSQL with the right library version.",
+							 quote_qualified_identifier(get_namespace_name(collform->collnamespace),
+														NameStr(collform->collname)))));
 	}
 
-	return cache_entry;
+	ReleaseSysCache(tp);
+
+	return result;
 }
 
 /*
@@ -1358,6 +1452,7 @@ pg_locale_t
 pg_newlocale_from_collation(Oid collid)
 {
 	collation_cache_entry *cache_entry;
+	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
 		return &default_locale;
@@ -1368,140 +1463,28 @@ pg_newlocale_from_collation(Oid collid)
 	if (last_collation_cache_oid == collid)
 		return last_collation_cache_locale;
 
-	cache_entry = lookup_collation_cache(collid);
-
-	if (cache_entry->locale == 0)
+	if (CollationCache == NULL)
 	{
-		/* We haven't computed this yet in this session, so do it */
-		HeapTuple	tp;
-		Form_pg_collation collform;
-		struct pg_locale_struct result;
-		pg_locale_t resultp;
-		Datum		datum;
-		bool		isnull;
-
-		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
-		if (!HeapTupleIsValid(tp))
-			elog(ERROR, "cache lookup failed for collation %u", collid);
-		collform = (Form_pg_collation) GETSTRUCT(tp);
-
-		/* We'll fill in the result struct locally before allocating memory */
-		memset(&result, 0, sizeof(result));
-		result.provider = collform->collprovider;
-		result.deterministic = collform->collisdeterministic;
-
-		if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		{
-			const char *locstr;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-			locstr = TextDatumGetCString(datum);
-
-			result.collate_is_c = true;
-			result.ctype_is_c = (strcmp(locstr, "C") == 0);
-
-			builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-			result.info.builtin.locale = MemoryContextStrdup(TopMemoryContext,
-															 locstr);
-		}
-		else if (collform->collprovider == COLLPROVIDER_ICU)
-		{
-#ifdef USE_ICU
-			const char *iculocstr;
-			const char *icurules;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-			iculocstr = TextDatumGetCString(datum);
-
-			result.collate_is_c = false;
-			result.ctype_is_c = false;
-
-			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-			if (!isnull)
-				icurules = TextDatumGetCString(datum);
-			else
-				icurules = NULL;
-
-			result.info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
-			result.info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-			/* could get here if a collation was created by a build with ICU */
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("ICU is not supported in this build")));
-#endif
-		}
-		else if (collform->collprovider == COLLPROVIDER_LIBC)
-		{
-			const char *collcollate;
-			const char *collctype;
-
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-			collcollate = TextDatumGetCString(datum);
-			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-			collctype = TextDatumGetCString(datum);
-
-			result.collate_is_c = (strcmp(collcollate, "C") == 0) ||
-				(strcmp(collcollate, "POSIX") == 0);
-			result.ctype_is_c = (strcmp(collctype, "C") == 0) ||
-				(strcmp(collctype, "POSIX") == 0);
-
-			result.info.lt = make_libc_collator(collcollate, collctype);
-		}
-		else
-			/* shouldn't happen */
-			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
-								&isnull);
-		if (!isnull)
-		{
-			char	   *actual_versionstr;
-			char	   *collversionstr;
-
-			collversionstr = TextDatumGetCString(datum);
-
-			if (collform->collprovider == COLLPROVIDER_LIBC)
-				datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-			else
-				datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-
-			actual_versionstr = get_collation_actual_version(collform->collprovider,
-															 TextDatumGetCString(datum));
-			if (!actual_versionstr)
-			{
-				/*
-				 * This could happen when specifying a version in CREATE
-				 * COLLATION but the provider does not support versioning, or
-				 * manually creating a mess in the catalogs.
-				 */
-				ereport(ERROR,
-						(errmsg("collation \"%s\" has no actual version, but a version was recorded",
-								NameStr(collform->collname))));
-			}
-
-			if (strcmp(actual_versionstr, collversionstr) != 0)
-				ereport(WARNING,
-						(errmsg("collation \"%s\" has version mismatch",
-								NameStr(collform->collname)),
-						 errdetail("The collation in the database was created using version %s, "
-								   "but the operating system provides version %s.",
-								   collversionstr, actual_versionstr),
-						 errhint("Rebuild all objects affected by this collation and run "
-								 "ALTER COLLATION %s REFRESH VERSION, "
-								 "or build PostgreSQL with the right library version.",
-								 quote_qualified_identifier(get_namespace_name(collform->collnamespace),
-															NameStr(collform->collname)))));
-		}
-
-		ReleaseSysCache(tp);
+		CollationCacheContext = AllocSetContextCreate(TopMemoryContext,
+													  "collation cache",
+													  ALLOCSET_DEFAULT_SIZES);
+		CollationCache = collation_cache_create(CollationCacheContext,
+												16, NULL);
+	}
 
-		/* We'll keep the pg_locale_t structures in TopMemoryContext */
-		resultp = MemoryContextAlloc(TopMemoryContext, sizeof(*resultp));
-		*resultp = result;
+	cache_entry = collation_cache_insert(CollationCache, collid, &found);
+	if (!found)
+	{
+		/*
+		 * Make sure cache entry is marked invalid, in case we fail before
+		 * setting things.
+		 */
+		cache_entry->locale = 0;
+	}
 
-		cache_entry->locale = resultp;
+	if (cache_entry->locale == 0)
+	{
+		cache_entry->locale = create_pg_locale(collid, CollationCacheContext);
 	}
 
 	last_collation_cache_oid = collid;
-- 
2.45.2
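
The shape that v7-0001 gives pg_newlocale_from_collation() (find or insert the cache entry, mark it invalid, then call a separate constructor) can be sketched outside the backend roughly as follows. This is only an illustration under stated assumptions: every demo_* name, the fixed-size probing table, and the call counter are hypothetical stand-ins and are not part of the patch, which uses a hash table allocated in CollationCacheContext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for pg_locale_struct; not the real definition. */
typedef struct demo_locale
{
	unsigned	id;
	int			deterministic;
} demo_locale;

typedef struct demo_cache_entry
{
	unsigned	id;
	int			used;
	demo_locale *locale;		/* NULL means "not built yet" */
} demo_cache_entry;

#define DEMO_CACHE_SIZE 16

static demo_cache_entry demo_cache[DEMO_CACHE_SIZE];
static int	demo_create_calls = 0;	/* counts constructor runs, for illustration */

/* Constructor: the only place that knows how to build a locale object. */
static demo_locale *
demo_create_locale(unsigned id)
{
	demo_locale *result = calloc(1, sizeof(demo_locale));

	assert(result != NULL);
	result->id = id;
	result->deterministic = 1;
	demo_create_calls++;
	return result;
}

/* Find or claim the slot for id (linear probing); *found says which. */
static demo_cache_entry *
demo_cache_insert(unsigned id, int *found)
{
	for (unsigned i = 0; i < DEMO_CACHE_SIZE; i++)
	{
		demo_cache_entry *entry = &demo_cache[(id + i) % DEMO_CACHE_SIZE];

		if (entry->used && entry->id == id)
		{
			*found = 1;
			return entry;
		}
		if (!entry->used)
		{
			entry->used = 1;
			entry->id = id;
			*found = 0;
			return entry;
		}
	}
	return NULL;				/* table full; the sketch does not grow it */
}

/* Lookup-or-create, mirroring the control flow after the refactoring. */
demo_locale *
demo_locale_from_id(unsigned id)
{
	int			found;
	demo_cache_entry *entry = demo_cache_insert(id, &found);

	assert(entry != NULL);
	if (!found)
		entry->locale = NULL;	/* mark invalid in case construction fails */
	if (entry->locale == NULL)
		entry->locale = demo_create_locale(id);
	return entry->locale;
}
```

Calling demo_locale_from_id() twice with the same id returns the same pointer and runs the constructor once. Keeping the constructor separate from the cache is what lets later patches in the series swap or hook the constructor without touching the cache logic.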

v7-0002-Perform-provider-specific-initialization-code-in-.patch (text/x-patch; charset=UTF-8)

From 15a8ea61c212084fdfe3c46a9105aa392aeec9cb Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 15:49:32 -0700
Subject: [PATCH v7 2/9] Perform provider-specific initialization code in new
 functions.

---
 src/backend/utils/adt/pg_locale.c      | 199 ++++++++-----------------
 src/backend/utils/adt/pg_locale_icu.c  |  97 +++++++++++-
 src/backend/utils/adt/pg_locale_libc.c |  74 ++++++++-
 3 files changed, 227 insertions(+), 143 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 0f43004571..02dc9d07dc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -89,11 +89,12 @@
 
 #define		MAX_L10N_DATA		80
 
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
+
 /* pg_locale_icu.c */
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -106,8 +107,6 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 #endif
 
 /* pg_locale_libc.c */
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -138,7 +137,7 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
-static struct pg_locale_struct default_locale;
+static pg_locale_t default_locale = NULL;
 
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
@@ -1213,6 +1212,51 @@ IsoLocaleName(const char *winlocname)
 
 #endif							/* WIN32 && LC_MESSAGES */
 
+static pg_locale_t
+create_pg_locale_builtin(Oid collid, MemoryContext context)
+{
+	const char *locstr;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+
+	builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	result->provider = COLLPROVIDER_BUILTIN;
+	result->deterministic = true;
+	result->collate_is_c = true;
+	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+	return result;
+}
 
 /*
  * Create a new pg_locale_t struct for the given collation oid.
@@ -1227,75 +1271,17 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Datum		datum;
 	bool		isnull;
 
-	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	result->provider = collform->collprovider;
-	result->deterministic = collform->collisdeterministic;
-
 	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-	{
-		const char *locstr;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		locstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = true;
-		result->ctype_is_c = (strcmp(locstr, "C") == 0);
-
-		builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-		result->info.builtin.locale = MemoryContextStrdup(context,
-														  locstr);
-	}
+		result = create_pg_locale_builtin(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		const char *iculocstr;
-		const char *icurules;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		iculocstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = false;
-		result->ctype_is_c = false;
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_LIBC)
-	{
-		const char *collcollate;
-		const char *collctype;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-		collcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-		collctype = TextDatumGetCString(datum);
-
-		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
-			(strcmp(collcollate, "POSIX") == 0);
-		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
-			(strcmp(collctype, "POSIX") == 0);
-
-		result->info.lt = make_libc_collator(collcollate, collctype);
-	}
+		result = create_pg_locale_libc(collid, context);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
@@ -1355,7 +1341,9 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	Datum		datum;
+	pg_locale_t result;
+
+	Assert(default_locale == NULL);
 
 	/* Fetch our pg_database row normally, via syscache */
 	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
@@ -1364,80 +1352,21 @@ init_database_collation(void)
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
 	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-	{
-		char	   *datlocale;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		builtin_validate_locale(dbform->encoding, datlocale);
-
-		default_locale.collate_is_c = true;
-		default_locale.ctype_is_c = (strcmp(datlocale, "C") == 0);
-
-		default_locale.info.builtin.locale = MemoryContextStrdup(
-																 TopMemoryContext, datlocale);
-	}
+		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		char	   *datlocale;
-		char	   *icurules;
-		bool		isnull;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = false;
-		default_locale.ctype_is_c = false;
-
-		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		default_locale.info.icu.locale = MemoryContextStrdup(TopMemoryContext, datlocale);
-		default_locale.info.icu.ucol = make_icu_collator(datlocale, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+									  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-	{
-		const char *datcollate;
-		const char *datctype;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datcollate);
-		datcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
-		datctype = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = (strcmp(datcollate, "C") == 0) ||
-			(strcmp(datcollate, "POSIX") == 0);
-		default_locale.ctype_is_c = (strcmp(datctype, "C") == 0) ||
-			(strcmp(datctype, "POSIX") == 0);
-
-		default_locale.info.lt = make_libc_collator(datcollate, datctype);
-	}
+		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+									   TopMemoryContext);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
 
-
-	default_locale.provider = dbform->datlocprovider;
-
-	/*
-	 * Default locale is currently always deterministic.  Nondeterministic
-	 * locales currently don't support pattern matching, which would break a
-	 * lot of things if applied globally.
-	 */
-	default_locale.deterministic = true;
-
 	ReleaseSysCache(tup);
+
+	default_locale = result;
 }
 
 /*
@@ -1455,7 +1384,7 @@ pg_newlocale_from_collation(Oid collid)
 	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
-		return &default_locale;
+		return default_locale;
 
 	if (!OidIsValid(collid))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2a87e25dfb..73eb430d75 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -12,14 +12,20 @@
 #include "postgres.h"
 
 #ifdef USE_ICU
-
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+#endif
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
@@ -29,9 +35,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+
+#ifdef USE_ICU
+
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -49,6 +57,8 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
  */
 static UConverter *icu_converter = NULL;
 
+static UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
 static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 								 const char *arg2, ssize_t len2,
 								 pg_locale_t locale);
@@ -63,6 +73,85 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+#endif
+
+pg_locale_t
+create_pg_locale_icu(Oid collid, MemoryContext context)
+{
+#ifdef USE_ICU
+	bool		deterministic;
+	const char *iculocstr;
+	const char *icurules = NULL;
+	UCollator  *collator;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+
+		/* default database collation is always deterministic */
+		deterministic = true;
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(DATABASEOID, tp,
+								Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		Form_pg_collation collform;
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		collform = (Form_pg_collation) GETSTRUCT(tp);
+		deterministic = collform->collisdeterministic;
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(COLLOID, tp,
+								Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+	collator = make_icu_collator(iculocstr, icurules);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+	result->info.icu.ucol = collator;
+	result->provider = COLLPROVIDER_ICU;
+	result->deterministic = deterministic;
+	result->collate_is_c = false;
+	result->ctype_is_c = false;
+
+	return result;
+#else
+	/* could get here if a collation was created by a build with ICU */
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("ICU is not supported in this build")));
+
+	return NULL;
+#endif
+}
+
+#ifdef USE_ICU
 
 /*
  * Wrapper around ucol_open() to handle API differences for older ICU
@@ -160,7 +249,7 @@ pg_ucol_open(const char *loc_str)
  *
  * Ensure that no path leaks a UCollator.
  */
-UCollator *
+static UCollator *
 make_icu_collator(const char *iculocstr, const char *icurules)
 {
 	if (!icurules)
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 83f310fc71..374ac37ba0 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,10 +11,16 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
@@ -24,15 +30,16 @@
  */
 #define		TEXTBUFLEN			1024
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
+
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
-
+static locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
 
 #ifdef WIN32
@@ -41,6 +48,65 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+pg_locale_t
+create_pg_locale_libc(Oid collid, MemoryContext context)
+{
+	const char *collate;
+	const char *ctype;
+	locale_t	loc;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+
+	loc = make_libc_collator(collate, ctype);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->provider = COLLPROVIDER_LIBC;
+	result->deterministic = true;
+	result->collate_is_c = (strcmp(collate, "C") == 0) ||
+		(strcmp(collate, "POSIX") == 0);
+	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
+		(strcmp(ctype, "POSIX") == 0);
+	result->info.lt = loc;
+
+	return result;
+}
+
 /*
  * Create a locale_t with the given collation and ctype.
  *
@@ -49,7 +115,7 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
  *
  * Ensure that no path leaks a locale_t.
  */
-locale_t
+static locale_t
 make_libc_collator(const char *collate, const char *ctype)
 {
 	locale_t	loc = 0;
-- 
2.45.2
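
As a rough illustration of what v7-0002 does, the sketch below has one constructor per provider, each accepting either the default collation id or an explicit one and deciding for itself where to read the settings from. It is a simplified model, not the patch's code: the demo_* names, the in-memory "catalog" rows, and the provider letters are assumptions made for the example, and the real functions read pg_database or pg_collation through the syscache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_ID 0		/* plays the role of DEFAULT_COLLATION_OID */

/* Hypothetical catalog row standing in for pg_database/pg_collation. */
typedef struct demo_row
{
	unsigned	id;
	char		provider;		/* 'b' = builtin, 'c' = libc (demo values) */
	const char *collate;
	const char *ctype;
} demo_row;

typedef struct demo_locale
{
	char		provider;
	int			collate_is_c;
	int			ctype_is_c;
} demo_locale;

static const demo_row demo_database_row = {DEMO_DEFAULT_ID, 'c', "C", "C"};
static const demo_row demo_collation_rows[] = {
	{1, 'c', "en_US.UTF-8", "en_US.UTF-8"},
	{2, 'c', "POSIX", "en_US.UTF-8"},
	{3, 'b', "C.UTF-8", "C.UTF-8"},
};

/* The default id reads the "database" row, anything else a collation row. */
static const demo_row *
demo_lookup(unsigned id)
{
	size_t		n = sizeof(demo_collation_rows) / sizeof(demo_collation_rows[0]);

	if (id == DEMO_DEFAULT_ID)
		return &demo_database_row;
	for (size_t i = 0; i < n; i++)
		if (demo_collation_rows[i].id == id)
			return &demo_collation_rows[i];
	return NULL;
}

static int
demo_is_c_name(const char *name)
{
	return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

static demo_locale *
demo_create_libc(unsigned id)
{
	const demo_row *row = demo_lookup(id);
	demo_locale *result = calloc(1, sizeof(demo_locale));

	assert(row != NULL && result != NULL);
	result->provider = 'c';
	result->collate_is_c = demo_is_c_name(row->collate);
	result->ctype_is_c = demo_is_c_name(row->ctype);
	return result;
}

static demo_locale *
demo_create_builtin(unsigned id)
{
	const demo_row *row = demo_lookup(id);
	demo_locale *result = calloc(1, sizeof(demo_locale));

	assert(row != NULL && result != NULL);
	result->provider = 'b';
	result->collate_is_c = 1;	/* builtin always collates like C */
	result->ctype_is_c = (strcmp(row->ctype, "C") == 0);
	return result;
}

/* The caller only picks a constructor; it no longer fills in any fields. */
demo_locale *
demo_create_locale(unsigned id)
{
	const demo_row *row = demo_lookup(id);

	if (row == NULL)
		return NULL;
	switch (row->provider)
	{
		case 'b':
			return demo_create_builtin(id);
		case 'c':
			return demo_create_libc(id);
		default:
			return NULL;		/* unknown provider */
	}
}
```

Because each constructor understands the default id, the database-default path and the per-collation path share one code path per provider, which is the duplication the patch removes from init_database_collation().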

v7-0003-Control-collation-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 24175216aad4487df84f13c94dbf783258cd8da7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 11:27:29 -0700
Subject: [PATCH v7 3/9] Control collation behavior with a method table.

Previously, behavior branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/pg_locale.c      | 122 +++-----------------
 src/backend/utils/adt/pg_locale_icu.c  | 147 +++++++++++++++----------
 src/backend/utils/adt/pg_locale_libc.c |  40 +++++--
 src/include/utils/pg_locale.h          |  33 ++++++
 4 files changed, 164 insertions(+), 178 deletions(-)
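
For readers unfamiliar with the pattern, here is a minimal standalone sketch of dispatching through a method table instead of branching on the provider. It assumes a simplified table with only a comparison function, an optional prefix transform, and a safety flag; all demo_* names and both toy "providers" (bytewise and ASCII case-insensitive) are invented for the example and do not correspond to the structs this patch adds to pg_locale.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

/* Hypothetical method table; an optional method is NULL when unsupported. */
typedef struct demo_collate_methods
{
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct demo_locale *loc);
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src, size_t srclen,
									const struct demo_locale *loc);
	int			strxfrm_is_safe;
} demo_collate_methods;

typedef struct demo_locale
{
	const demo_collate_methods *collate;
} demo_locale;

/* Toy provider 1: plain byte order. */
static int
bytewise_strncoll(const char *a, size_t alen, const char *b, size_t blen,
				  const demo_locale *loc)
{
	size_t		n = alen < blen ? alen : blen;
	int			cmp = memcmp(a, b, n);

	(void) loc;
	if (cmp != 0)
		return cmp < 0 ? -1 : 1;
	return (alen > blen) - (alen < blen);
}

/* For byte order, a prefix of the input is a valid prefix of its sort key. */
static size_t
bytewise_strnxfrm_prefix(char *dest, size_t destsize,
						 const char *src, size_t srclen,
						 const demo_locale *loc)
{
	size_t		n = srclen < destsize ? srclen : destsize;

	(void) loc;
	memcpy(dest, src, n);
	return n;
}

/* Toy provider 2: ASCII case-insensitive order, no prefix transform. */
static int
caseless_strncoll(const char *a, size_t alen, const char *b, size_t blen,
				  const demo_locale *loc)
{
	size_t		n = alen < blen ? alen : blen;

	(void) loc;
	for (size_t i = 0; i < n; i++)
	{
		int			ca = tolower((unsigned char) a[i]);
		int			cb = tolower((unsigned char) b[i]);

		if (ca != cb)
			return ca < cb ? -1 : 1;
	}
	return (alen > blen) - (alen < blen);
}

static const demo_collate_methods bytewise_methods = {
	bytewise_strncoll, bytewise_strnxfrm_prefix, 1
};
static const demo_collate_methods caseless_methods = {
	caseless_strncoll, NULL, 0
};

const demo_locale demo_bytewise_locale = {&bytewise_methods};
const demo_locale demo_caseless_locale = {&caseless_methods};

/* Callers never inspect the provider; they just go through the table. */
int
demo_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			  const demo_locale *loc)
{
	assert(loc != NULL && loc->collate != NULL);
	return loc->collate->strncoll(a, alen, b, blen, loc);
}

/* Capability check by method presence, as in pg_strxfrm_prefix_enabled(). */
int
demo_strxfrm_prefix_enabled(const demo_locale *loc)
{
	return loc->collate->strnxfrm_prefix != NULL;
}
```

Adding a third provider in this model means defining one more table and pointing a locale at it; none of the wrappers change. That is the property that makes the table straightforward to hook from an extension.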

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 02dc9d07dc..7efb8813bd 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -95,25 +95,8 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 /* pg_locale_icu.c */
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
-						   const char *src, ssize_t srclen,
-						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
-								  const char *src, ssize_t srclen,
-								  pg_locale_t locale);
 #endif
 
-/* pg_locale_libc.c */
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
-						  const char *arg2, ssize_t len2,
-						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1537,19 +1520,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 int
 pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, -1, arg2, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, -1, arg2, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, -1, arg2, -1, locale);
 }
 
 /*
@@ -1570,51 +1541,25 @@ int
 pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 			pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, len1, arg2, len2, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, len1, arg2, len2, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
 }
 
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
  *
- * Unfortunately, it seems that strxfrm() for non-C collations is broken on
- * many common platforms; testing of multiple versions of glibc reveals that,
- * for many locales, strcoll() and strxfrm() do not return consistent
- * results. While no other libc other than Cygwin has so far been shown to
- * have a problem, we take the conservative course of action for right now and
- * disable this categorically.  (Users who are certain this isn't a problem on
- * their system can define TRUST_STRXFRM.)
  *
  * No similar problem is known for the ICU provider.
  */
 bool
 pg_strxfrm_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-#ifdef TRUST_STRXFRM
-		return true;
-#else
-		return false;
-#endif
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	/*
+	 * locale->collate->strnxfrm is still a required method, even if it may
+	 * have the wrong behavior, because the planner uses it for estimates in
+	 * some cases.
+	 */
+	return locale->collate->strxfrm_is_safe;
 }
 
 /*
@@ -1625,19 +1570,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 size_t
 pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1663,19 +1596,7 @@ size_t
 pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, srclen, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, srclen, locale);
 }
 
 /*
@@ -1685,15 +1606,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 bool
 pg_strxfrm_prefix_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-		return false;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	return (locale->collate->strnxfrm_prefix != NULL);
 }
 
 /*
@@ -1705,7 +1618,7 @@ size_t
 pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 				  pg_locale_t locale)
 {
-	return pg_strnxfrm_prefix(dest, destsize, src, -1, locale);
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1730,16 +1643,7 @@ size_t
 pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-#ifdef USE_ICU
-	if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_prefix_icu(dest, destsize, src, -1, locale);
-	else
-#endif
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 73eb430d75..11ec9d4e4b 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -40,13 +40,14 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
+
+static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
+static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
@@ -59,12 +60,20 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
-static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-								 const char *arg2, ssize_t len2,
-								 pg_locale_t locale);
-static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-										  const char *src, ssize_t srclen,
-										  pg_locale_t locale);
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+static int	strncoll_icu_utf8(const char *arg1, ssize_t len1,
+							  const char *arg2, ssize_t len2,
+							  pg_locale_t locale);
+#endif
+static size_t strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+									   const char *src, ssize_t srclen,
+									   pg_locale_t locale);
 static void init_icu_converter(void);
 static size_t uchar_length(UConverter *converter,
 						   const char *str, int32_t len);
@@ -73,6 +82,25 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+
+static const struct collate_methods collate_methods_icu = {
+	.strncoll = strncoll_icu,
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu,
+	.strxfrm_is_safe = true,
+};
+
+static const struct collate_methods collate_methods_icu_utf8 = {
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	.strncoll = strncoll_icu_utf8,
+#else
+	.strncoll = strncoll_icu,
+#endif
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu_utf8,
+	.strxfrm_is_safe = true,
+};
+
 #endif
 
 pg_locale_t
@@ -139,6 +167,10 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	if (GetDatabaseEncoding() == PG_UTF8)
+		result->collate = &collate_methods_icu_utf8;
+	else
+		result->collate = &collate_methods_icu;
 
 	return result;
 #else
@@ -313,42 +345,36 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 }
 
 /*
- * strncoll_icu
+ * strncoll_icu_utf8
  *
  * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
  * database encoding. An argument length of -1 means the string is
  * NUL-terminated.
  */
+#ifdef HAVE_UCOL_STRCOLLUTF8
 int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
+strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+				  pg_locale_t locale)
 {
 	int			result;
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
+	status = U_ZERO_ERROR;
+	result = ucol_strcollUTF8(locale->info.icu.ucol,
+							  arg1, len1,
+							  arg2, len2,
+							  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("collation failed: %s", u_errorName(status))));
 
 	return result;
 }
+#endif
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
@@ -399,37 +425,32 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
+strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+						 const char *src, ssize_t srclen,
+						 pg_locale_t locale)
 {
 	size_t		result;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
+	uiter_setUTF8(&iter, src, srclen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+								  &iter,
+								  state,
+								  (uint8_t *) dest,
+								  destsize,
+								  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
 
 	return result;
 }
@@ -504,7 +525,7 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 }
 
 /*
- * strncoll_icu_no_utf8
+ * strncoll_icu
  *
  * Convert the arguments from the database encoding to UChar strings, then
  * call ucol_strcoll(). An argument length of -1 means that the string is
@@ -514,8 +535,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
  * caller should call that instead.
  */
 static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
+strncoll_icu(const char *arg1, ssize_t len1,
+			 const char *arg2, ssize_t len2, pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -528,6 +549,8 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 	int			result;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 #endif
@@ -561,9 +584,9 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -576,6 +599,8 @@ strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 	Size		result_bsize;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	init_icu_converter();
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 374ac37ba0..c7be6dd4f9 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -32,10 +32,10 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
+static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
+static size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 static locale_t make_libc_collator(const char *collate,
@@ -48,6 +48,27 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static const struct collate_methods collate_methods_libc = {
+	.strncoll = strncoll_libc,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+
+	/*
+	 * Unfortunately, it seems that strxfrm() for non-C collations is broken
+	 * on many common platforms; testing of multiple versions of glibc reveals
+	 * that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results. While no other libc other than Cygwin has so far
+	 * been shown to have a problem, we take the conservative course of action
+	 * for right now and disable this categorically.  (Users who are certain
+	 * this isn't a problem on their system can define TRUST_STRXFRM.)
+	 */
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -103,6 +124,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+	{
+#ifdef WIN32
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->collate = &collate_methods_libc_win32_utf8;
+		else
+#endif
+			result->collate = &collate_methods_libc;
+	}
 
 	return result;
 }
@@ -200,12 +230,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 
 	Assert(locale->provider == COLLPROVIDER_LIBC);
 
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 37ecf95193..2f05dffcdd 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -60,6 +60,36 @@ extern struct lconv *PGLC_localeconv(void);
 extern void cache_locale_time(void);
 
 
+struct pg_locale_struct;
+typedef struct pg_locale_struct *pg_locale_t;
+
+/* methods that define collation behavior */
+struct collate_methods
+{
+	/* required */
+	int			(*strncoll) (const char *arg1, ssize_t len1,
+							 const char *arg2, ssize_t len2,
+							 pg_locale_t locale);
+
+	/* required */
+	size_t		(*strnxfrm) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* optional */
+	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
+									const char *src, ssize_t srclen,
+									pg_locale_t locale);
+
+	/*
+	 * If the strnxfrm method is not trusted to return the correct results,
+	 * set strxfrm_is_safe to false. If set to false, the method will not be
+	 * used in most cases, but the planner still expects it to be there for
+	 * estimation purposes (where incorrect results are acceptable).
+	 */
+	bool		strxfrm_is_safe;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -82,6 +112,9 @@ struct pg_locale_struct
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
+
+	const struct collate_methods *collate;	/* NULL if collate_is_c */
+
 	union
 	{
 		struct
-- 
2.45.2
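
For readers skimming the collation patch above, the core idea can be reduced to a small standalone sketch. Everything named demo_* below is hypothetical and is not part of the patch; the real types are pg_locale_t and struct collate_methods, and the real methods call libc or ICU. The sketch only assumes what the diff shows: the table is chosen once when the locale object is created, the wrappers dispatch through it without branching on the provider, and an optional method is advertised by being non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for pg_locale_t and struct collate_methods. */
struct demo_locale;

struct demo_collate_methods
{
	/* required */
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct demo_locale *locale);
	/* optional: NULL means prefix sort keys are not supported */
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src, size_t srclen,
									const struct demo_locale *locale);
	bool		strxfrm_is_safe;
};

struct demo_locale
{
	const struct demo_collate_methods *collate;
};

/* Toy comparison: bytewise order, shorter string first on a tie. */
static int
demo_strncoll_bytewise(const char *a, size_t alen,
					   const char *b, size_t blen,
					   const struct demo_locale *locale)
{
	size_t		n = (alen < blen) ? alen : blen;
	int			cmp = memcmp(a, b, n);

	(void) locale;
	if (cmp != 0)
		return (cmp < 0) ? -1 : 1;
	if (alen == blen)
		return 0;
	return (alen < blen) ? -1 : 1;
}

/* Toy prefix key: copy as many leading bytes as fit into dest. */
static size_t
demo_strnxfrm_prefix_copy(char *dest, size_t destsize,
						  const char *src, size_t srclen,
						  const struct demo_locale *locale)
{
	size_t		n = (srclen < destsize) ? srclen : destsize;

	(void) locale;
	memcpy(dest, src, n);
	return n;
}

static const struct demo_collate_methods demo_methods_plain = {
	.strncoll = demo_strncoll_bytewise,
	.strnxfrm_prefix = NULL,
	.strxfrm_is_safe = false,
};

static const struct demo_collate_methods demo_methods_prefix = {
	.strncoll = demo_strncoll_bytewise,
	.strnxfrm_prefix = demo_strnxfrm_prefix_copy,
	.strxfrm_is_safe = true,
};

/* The table is selected once, at creation time. */
static void
demo_locale_init(struct demo_locale *locale, bool want_prefix)
{
	locale->collate = want_prefix ? &demo_methods_prefix : &demo_methods_plain;
}

/* Wrappers dispatch through the table; no provider branches remain. */
static int
demo_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			  const struct demo_locale *locale)
{
	return locale->collate->strncoll(a, alen, b, blen, locale);
}

static bool
demo_strxfrm_prefix_enabled(const struct demo_locale *locale)
{
	return (locale->collate->strnxfrm_prefix != NULL);
}

static size_t
demo_strxfrm_prefix(char *dest, size_t destsize,
					const char *src, size_t srclen,
					const struct demo_locale *locale)
{
	/* callers are expected to check demo_strxfrm_prefix_enabled() first */
	assert(demo_strxfrm_prefix_enabled(locale));
	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
}
```

A caller would initialize a struct demo_locale with demo_locale_init() and then use only the wrappers. Under this design an extension hook only has to supply a different table; the call sites stay unchanged.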

v7-0004-Control-case-mapping-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 76dfe6e6aa0328069949bdd54b28f42a54175994 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 12:12:51 -0700
Subject: [PATCH v7 4/9] Control case mapping behavior with a method table.

Previously, case mapping (LOWER(), INITCAP(), UPPER()) behavior
branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/formatting.c     | 445 ++++---------------------
 src/backend/utils/adt/pg_locale.c      | 101 ++++++
 src/backend/utils/adt/pg_locale_icu.c  | 140 +++++++-
 src/backend/utils/adt/pg_locale_libc.c | 302 +++++++++++++++++
 src/include/utils/pg_locale.h          |  29 +-
 5 files changed, 619 insertions(+), 398 deletions(-)
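
The formatting.c hunks in this patch replace three provider-specific code paths with one "try a same-size buffer, then grow and retry" loop around a case-mapping method. The standalone sketch below illustrates that calling convention under stated assumptions: demo_case_convert and demo_strupper_latin1 are hypothetical names, malloc/realloc stand in for palloc/repalloc, and the toy method assumes LATIN1 input, where 0xDF (sharp s) uppercases to the two bytes "SS" so the output can be longer than the input. As in the patch, the method returns the number of bytes needed, excluding the terminating NUL, and NUL-terminates only when the result fits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical method signature, modeled on the casemap methods. */
typedef size_t (*demo_casemap_fn) (char *dest, size_t destsize,
								   const char *src, size_t srclen);

/*
 * Toy uppercase method for LATIN1 input (an assumption of this sketch).
 * Returns the bytes needed excluding the NUL. Output is complete and
 * NUL-terminated only if the return value is less than destsize.
 */
static size_t
demo_strupper_latin1(char *dest, size_t destsize,
					 const char *src, size_t srclen)
{
	size_t		needed = 0;

	for (size_t i = 0; i < srclen; i++)
	{
		unsigned char c = (unsigned char) src[i];

		if (c == 0xDF)
		{
			/* sharp s expands to "SS" */
			if (needed + 2 < destsize)
			{
				dest[needed] = 'S';
				dest[needed + 1] = 'S';
			}
			needed += 2;
		}
		else
		{
			if (needed + 1 < destsize)
				dest[needed] = (c < 0x80) ? (char) toupper(c) : (char) c;
			needed += 1;
		}
	}

	if (needed < destsize)
		dest[needed] = '\0';

	return needed;
}

/* Caller-side loop: first try srclen + 1 bytes, grow once if needed. */
static char *
demo_case_convert(const char *src, size_t srclen, demo_casemap_fn method)
{
	size_t		dstsize = srclen + 1;
	char	   *dst = malloc(dstsize);
	size_t		needed;

	if (dst == NULL)
		return NULL;

	needed = method(dst, dstsize, src, srclen);
	if (needed + 1 > dstsize)
	{
		char	   *bigger;

		/* grow buffer and retry */
		dstsize = needed + 1;
		bigger = realloc(dst, dstsize);
		if (bigger == NULL)
		{
			free(dst);
			return NULL;
		}
		dst = bigger;
		needed = method(dst, dstsize, src, srclen);
		assert(needed + 1 <= dstsize);
	}

	assert(dst[needed] == '\0');
	return dst;
}
```

For example, passing the six LATIN1 bytes of "stra\xDF" "e" needs seven output bytes plus the NUL, so the first attempt with a seven-byte buffer falls short and the retry path runs. Note that the hunks below relax the post-retry assertion from "needed + 1 == dstsize" to "needed + 1 <= dstsize"; the sketch follows the relaxed form.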

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 85a7dd4561..6a0571f93e 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1570,52 +1570,6 @@ str_numth(char *dest, char *num, int type)
  *			upper/lower/initcap functions
  *****************************************************************************/
 
-#ifdef USE_ICU
-
-typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
-									 const UChar *src, int32_t srcLength,
-									 const char *locale,
-									 UErrorCode *pErrorCode);
-
-static int32_t
-icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
-				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
-{
-	UErrorCode	status;
-	int32_t		len_dest;
-
-	len_dest = len_source;		/* try first with same length */
-	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-	status = U_ZERO_ERROR;
-	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
-	if (status == U_BUFFER_OVERFLOW_ERROR)
-	{
-		/* try again with adjusted length */
-		pfree(*buff_dest);
-		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-		status = U_ZERO_ERROR;
-		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
-	}
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
-	return len_dest;
-}
-
-static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
-{
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
-}
-
-#endif							/* USE_ICU */
-
 /*
  * If the system provides the needed functions for wide-character manipulation
  * (which are all standardized by C99), then we implement upper/lower/initcap
@@ -1663,101 +1617,28 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar;
-			int32_t		len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strlower(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strlower(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that tolower_l() will not be so broken as
-				 * to need an isupper_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
@@ -1800,147 +1681,33 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
-		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strupper(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strupper(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l() will not be so broken as
-				 * to need an islower_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
 }
 
-struct WordBoundaryState
-{
-	const char *str;
-	size_t		len;
-	size_t		offset;
-	bool		init;
-	bool		prev_alnum;
-};
-
-/*
- * Simple word boundary iterator that draws boundaries each time the result of
- * pg_u_isalnum() changes.
- */
-static size_t
-initcap_wbnext(void *state)
-{
-	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
-
-	while (wbstate->offset < wbstate->len &&
-		   wbstate->str[wbstate->offset] != '\0')
-	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
-										wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
-
-		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
-		{
-			size_t		prev_offset = wbstate->offset;
-
-			wbstate->init = true;
-			wbstate->offset += unicode_utf8len(u);
-			wbstate->prev_alnum = curr_alnum;
-			return prev_offset;
-		}
-
-		wbstate->offset += unicode_utf8len(u);
-	}
-
-	return wbstate->len;
-}
-
 /*
  * collation-aware, wide-character-aware initcap function
  *
@@ -1951,7 +1718,6 @@ char *
 str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
-	int			wasalnum = false;
 	pg_locale_t mylocale;
 
 	if (!buff)
@@ -1979,125 +1745,28 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-			struct WordBoundaryState wbstate = {
-				.str = src,
-				.len = srclen,
-				.offset = 0,
-				.init = false,
-				.prev_alnum = false,
-			};
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strtitle(dst, dstsize, src, srclen,
-									  initcap_wbnext, &wbstate);
-			if (needed + 1 > dstsize)
-			{
-				/* reset iterator */
-				wbstate.offset = 0;
-				wbstate.init = false;
-
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strtitle(dst, dstsize, src, srclen,
-										  initcap_wbnext, &wbstate);
-				Assert(needed + 1 == dstsize);
-			}
-
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-				{
-					if (wasalnum)
-						workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-					else
-						workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-					wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
-				}
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
 
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l()/tolower_l() will not be so
-				 * broken as to need guard tests.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (wasalnum)
-						*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-					else
-						*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-					wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7efb8813bd..f9a8edf3e0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -59,6 +59,8 @@
 #include "catalog/pg_database.h"
 #include "common/hashfn.h"
 #include "common/string.h"
+#include "common/unicode_case.h"
+#include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -165,6 +167,83 @@ static pg_locale_t last_collation_cache_locale = NULL;
 static char *IsoLocaleName(const char *);
 #endif
 
+struct WordBoundaryState
+{
+	const char *str;
+	size_t		len;
+	size_t		offset;
+	bool		init;
+	bool		prev_alnum;
+};
+
+/*
+ * Simple word boundary iterator that draws boundaries each time the result of
+ * pg_u_isalnum() changes.
+ */
+static size_t
+initcap_wbnext(void *state)
+{
+	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
+
+	while (wbstate->offset < wbstate->len &&
+		   wbstate->str[wbstate->offset] != '\0')
+	{
+		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+										wbstate->offset);
+		bool		curr_alnum = pg_u_isalnum(u, true);
+
+		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
+		{
+			size_t		prev_offset = wbstate->offset;
+
+			wbstate->init = true;
+			wbstate->offset += unicode_utf8len(u);
+			wbstate->prev_alnum = curr_alnum;
+			return prev_offset;
+		}
+
+		wbstate->offset += unicode_utf8len(u);
+	}
+
+	return wbstate->len;
+}
+
+static size_t
+strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strlower(dest, destsize, src, srclen);
+}
+
+static size_t
+strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	struct WordBoundaryState wbstate = {
+		.str = src,
+		.len = srclen,
+		.offset = 0,
+		.init = false,
+		.prev_alnum = false,
+	};
+
+	return unicode_strtitle(dest, destsize, src, srclen,
+							initcap_wbnext, &wbstate);
+}
+
+static size_t
+strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strupper(dest, destsize, src, srclen);
+}
+
+static const struct casemap_methods casemap_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+};
+
 /*
  * pg_perm_setlocale
  *
@@ -1237,6 +1316,7 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	result->casemap = &casemap_methods_builtin;
 
 	return result;
 }
@@ -1512,6 +1592,27 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+size_t
+pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strlower(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strtitle(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->casemap->strupper(dst, dstsize, src, srclen, locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 11ec9d4e4b..3f4ada38bd 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,11 @@ static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
+typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
+									 const UChar *src, int32_t srcLength,
+									 const char *locale,
+									 UErrorCode *pErrorCode);
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -60,6 +65,16 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
+
+static size_t strlower_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strtitle_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strupper_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -80,8 +95,19 @@ static size_t uchar_length(UConverter *converter,
 static int32_t uchar_convert(UConverter *converter,
 							 UChar *dest, int32_t destlen,
 							 const char *src, int32_t srclen);
+static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff,
+							size_t nbytes);
+static size_t icu_from_uchar(char *dest, size_t destsize,
+							 const UChar *buff_uchar, int32_t len_uchar);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+static int32_t icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+								UChar **buff_dest, UChar *buff_source,
+								int32_t len_source);
+static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+									   const UChar *src, int32_t srcLength,
+									   const char *locale,
+									   UErrorCode *pErrorCode);
 
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
@@ -101,6 +127,11 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static const struct casemap_methods casemap_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+};
 #endif
 
 pg_locale_t
@@ -171,6 +202,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->casemap = &casemap_methods_icu;
 
 	return result;
 #else
@@ -344,6 +376,66 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
+static size_t
+strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToLower, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToTitle_default_BI, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToUpper, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
 /*
  * strncoll_icu_utf8
  *
@@ -467,7 +559,7 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
  * The result string is nul-terminated, though most callers rely on the
  * result length instead.
  */
-int32_t
+static int32_t
 icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
 {
 	int32_t		len_uchar;
@@ -494,8 +586,8 @@ icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
  *
  * The result string is nul-terminated.
  */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+static size_t
+icu_from_uchar(char *dest, size_t destsize, const UChar *buff_uchar, int32_t len_uchar)
 {
 	UErrorCode	status;
 	int32_t		len_result;
@@ -510,10 +602,11 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
 
-	*result = palloc(len_result + 1);
+	if (len_result + 1 > destsize)
+		return len_result;
 
 	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+	len_result = ucnv_fromUChars(icu_converter, dest, len_result + 1,
 								 buff_uchar, len_uchar, &status);
 	if (U_FAILURE(status) ||
 		status == U_STRING_NOT_TERMINATED_WARNING)
@@ -524,6 +617,43 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	return len_result;
 }
 
+static int32_t
+icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
+{
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s", u_errorName(status))));
+	return len_dest;
+}
+
+static int32_t
+u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+						const UChar *src, int32_t srcLength,
+						const char *locale,
+						UErrorCode *pErrorCode)
+{
+	return u_strToTitle(dest, destCapacity, src, srcLength,
+						NULL, locale, pErrorCode);
+}
+
 /*
  * strncoll_icu
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index c7be6dd4f9..798903ed2b 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,6 +11,9 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+#include <wctype.h>
+
 #include "access/htup_details.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
@@ -48,6 +51,25 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static size_t strlower_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strlower_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -69,6 +91,279 @@ static const struct collate_methods collate_methods_libc = {
 #endif
 };
 
+#ifdef WIN32
+static const struct collate_methods collate_methods_libc_win32_utf8 = {
+	.strncoll = strncoll_libc_win32_utf8,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+#endif
+
+static const struct casemap_methods casemap_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+};
+
+static const struct casemap_methods casemap_methods_libc_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+};
+
+static size_t
+strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		if (srclen + 1 > destsize)
+			return srclen;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that tolower_l() will not be so broken as to need
+		 * an isupper_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = tolower_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		int			wasalnum = false;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l()/tolower_l() will not be so broken
+		 * as to need guard tests.  When using the default collation, we apply
+		 * the traditional Postgres behavior that forces ASCII-style treatment
+		 * of I/i, but in non-default collations you get exactly what the
+		 * collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (wasalnum)
+				*p = tolower_l((unsigned char) *p, loc);
+			else
+				*p = toupper_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	int			wasalnum = false;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+	{
+		if (wasalnum)
+			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		else
+			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		wasalnum = iswalnum_l(workspace[curr_char], loc);
+	}
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l() will not be so broken as to need
+		 * an islower_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = toupper_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -133,6 +428,13 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (pg_database_encoding_max_length() > 1)
+			result->casemap = &casemap_methods_libc_mb;
+		else
+			result->casemap = &casemap_methods_libc_sb;
+	}
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 2f05dffcdd..bbc10e0c3d 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -90,6 +90,20 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+/* methods that define string case mapping behavior */
+struct casemap_methods
+{
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -114,6 +128,7 @@ struct pg_locale_struct
 	bool		ctype_is_c;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct casemap_methods *casemap;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -138,6 +153,15 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern size_t pg_strlower(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strtitle(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strupper(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
@@ -157,11 +181,6 @@ extern const char *builtin_validate_locale(int encoding, const char *locale);
 extern void icu_validate_locale(const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
-#ifdef USE_ICU
-extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
-extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
-
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
 						 pg_locale_t locale);
-- 
2.45.2
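
Reviewer's sketch (not part of the patch): the case-mapping methods above share one calling convention. Each returns the number of bytes the result needs, excluding the terminating NUL, and writes to dest only when that length plus the NUL fits in destsize. The standalone C below illustrates that convention and the table dispatch under stated assumptions: every demo_* name is hypothetical, the locale struct is reduced to the table pointer, strtitle is omitted, and the provider is ASCII-only (it assumes the process stays in the "C" locale).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct demo_locale;

/* Hypothetical stand-in for struct casemap_methods (strtitle omitted). */
struct demo_casemap_methods
{
	size_t		(*strlower) (char *dest, size_t destsize,
							 const char *src, size_t srclen,
							 const struct demo_locale *locale);
	size_t		(*strupper) (char *dest, size_t destsize,
							 const char *src, size_t srclen,
							 const struct demo_locale *locale);
};

/* Hypothetical stand-in for pg_locale_struct: only the table pointer. */
struct demo_locale
{
	const struct demo_casemap_methods *casemap;
};

/* ASCII-only provider: writes only when the result plus NUL fits. */
static size_t
demo_strlower_ascii(char *dest, size_t destsize, const char *src,
					size_t srclen, const struct demo_locale *locale)
{
	(void) locale;
	if (srclen + 1 <= destsize)
	{
		for (size_t i = 0; i < srclen; i++)
			dest[i] = (char) tolower((unsigned char) src[i]);
		dest[srclen] = '\0';
	}
	return srclen;				/* bytes needed, excluding the NUL */
}

static size_t
demo_strupper_ascii(char *dest, size_t destsize, const char *src,
					size_t srclen, const struct demo_locale *locale)
{
	(void) locale;
	if (srclen + 1 <= destsize)
	{
		for (size_t i = 0; i < srclen; i++)
			dest[i] = (char) toupper((unsigned char) src[i]);
		dest[srclen] = '\0';
	}
	return srclen;
}

static const struct demo_casemap_methods demo_casemap_ascii = {
	.strlower = demo_strlower_ascii,
	.strupper = demo_strupper_ascii,
};

static const struct demo_locale demo_ascii_locale = {
	.casemap = &demo_casemap_ascii,
};

/* Provider-agnostic entry points, analogous to pg_strlower()/pg_strupper(). */
size_t
demo_strlower(char *dest, size_t destsize, const char *src, size_t srclen,
			  const struct demo_locale *locale)
{
	assert(locale->casemap != NULL);
	return locale->casemap->strlower(dest, destsize, src, srclen, locale);
}

size_t
demo_strupper(char *dest, size_t destsize, const char *src, size_t srclen,
			  const struct demo_locale *locale)
{
	assert(locale->casemap != NULL);
	return locale->casemap->strupper(dest, destsize, src, srclen, locale);
}

/* Caller-side pattern: guess a size, retry once if the result did not fit. */
char *
demo_lower_alloc(const char *src, const struct demo_locale *locale)
{
	size_t		srclen = strlen(src);
	size_t		size = srclen + 1;
	char	   *buf = malloc(size);
	size_t		needed;

	if (buf == NULL)
		return NULL;
	needed = demo_strlower(buf, size, src, srclen, locale);
	if (needed + 1 > size)
	{
		char	   *bigger = realloc(buf, needed + 1);

		if (bigger == NULL)
		{
			free(buf);
			return NULL;
		}
		buf = bigger;
		demo_strlower(buf, needed + 1, src, srclen, locale);
	}
	return buf;
}
```

With this convention the caller never needs to know which provider is behind the table. It compares the returned length with its buffer size and retries with a larger buffer when the result did not fit.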

v7-0005-Control-ctype-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 341e3f31d22b5609584cccb8b7def798cc94c983 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 14:30:07 -0700
Subject: [PATCH v7 5/9] Control ctype behavior with a method table.

Previously, ctype behavior (pattern matching) branched based
on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/regex/regc_pg_locale.c     | 388 +++++--------------------
 src/backend/utils/adt/like.c           |  22 +-
 src/backend/utils/adt/like_support.c   |   7 +-
 src/backend/utils/adt/pg_locale.c      | 101 +++++++
 src/backend/utils/adt/pg_locale_icu.c  |  52 ++++
 src/backend/utils/adt/pg_locale_libc.c | 158 ++++++++++
 src/include/utils/pg_locale.h          |  46 +++
 src/tools/pgindent/typedefs.list       |   1 -
 8 files changed, 448 insertions(+), 327 deletions(-)
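
Reviewer's sketch (not part of the patch): the regex hunks that follow replace the five-way strategy switch with two paths, a hard-wired ASCII path when ctype_is_c is set and a call through the locale's ctype table otherwise. The standalone C below shows that shape under stated assumptions. The definition of the ctype method struct is not visible in this excerpt, so the struct layout and all demo_* names are hypothetical, the toy "Latin-1" provider is invented for illustration, and the locale is passed as a parameter where the patch uses the file-level pg_regex_locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int demo_wchar;	/* stand-in for pg_wchar */

#define DEMO_ISDIGIT	0x01
#define DEMO_ISALPHA	0x02

struct demo_locale;

/* Hypothetical method table; the real struct is not shown in this excerpt. */
struct demo_ctype_methods
{
	int			(*char_properties) (demo_wchar wc, int mask,
									const struct demo_locale *locale);
	demo_wchar	(*wc_toupper) (demo_wchar wc,
							   const struct demo_locale *locale);
};

struct demo_locale
{
	bool		ctype_is_c;
	const struct demo_ctype_methods *ctype; /* NULL if ctype_is_c */
};

static bool
demo_ascii_alpha(demo_wchar c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Toy provider: returns the subset of "mask" that holds for a Latin-1 char. */
static int
demo_props_latin1(demo_wchar wc, int mask, const struct demo_locale *locale)
{
	int			result = 0;

	(void) locale;
	if (wc >= '0' && wc <= '9')
		result |= DEMO_ISDIGIT;
	if (demo_ascii_alpha(wc) ||
		(wc >= 0xC0 && wc <= 0xFF && wc != 0xD7 && wc != 0xF7))
		result |= DEMO_ISALPHA;
	return result & mask;
}

static demo_wchar
demo_toupper_latin1(demo_wchar wc, const struct demo_locale *locale)
{
	(void) locale;
	if ((wc >= 'a' && wc <= 'z') ||
		(wc >= 0xE0 && wc <= 0xFE && wc != 0xF7))
		return wc - 0x20;
	return wc;
}

static const struct demo_ctype_methods demo_ctype_latin1 = {
	.char_properties = demo_props_latin1,
	.wc_toupper = demo_toupper_latin1,
};

static const struct demo_locale demo_c_locale = {
	.ctype_is_c = true,
	.ctype = NULL,
};

static const struct demo_locale demo_latin1_locale = {
	.ctype_is_c = false,
	.ctype = &demo_ctype_latin1,
};

/* Same shape as pg_wc_isalnum(): ASCII fast path, else ask the table. */
int
demo_wc_isalnum(demo_wchar c, const struct demo_locale *locale)
{
	if (locale->ctype_is_c)
		return c <= 127 &&
			(demo_ascii_alpha(c) || (c >= '0' && c <= '9'));

	assert(locale->ctype != NULL);
	return locale->ctype->char_properties(c, DEMO_ISDIGIT | DEMO_ISALPHA,
										  locale) != 0;
}

/* Same shape as pg_wc_toupper(). */
demo_wchar
demo_wc_toupper(demo_wchar c, const struct demo_locale *locale)
{
	if (locale->ctype_is_c)
		return (c >= 'a' && c <= 'z') ? c - 0x20 : c;

	assert(locale->ctype != NULL);
	return locale->ctype->wc_toupper(c, locale);
}
```

Passing a property mask lets one provider callback answer combined classes such as "alnum" in a single call, which appears to be why the rewritten pg_wc_isalnum() passes PG_ISDIGIT | PG_ISALPHA and tests the result against zero.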

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index b75784b6ce..e898634fdf 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,33 +63,18 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 static Oid	pg_regex_collation;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
-#define PG_ISDIGIT	0x01
-#define PG_ISALPHA	0x02
-#define PG_ISALNUM	(PG_ISDIGIT | PG_ISALPHA)
-#define PG_ISUPPER	0x04
-#define PG_ISLOWER	0x08
-#define PG_ISGRAPH	0x10
-#define PG_ISPRINT	0x20
-#define PG_ISPUNCT	0x40
-#define PG_ISSPACE	0x80
-
-static const unsigned char pg_char_properties[128] = {
+static const unsigned char char_properties_tbl[128] = {
 	 /* NUL */ 0,
 	 /* ^A */ 0,
 	 /* ^B */ 0,
@@ -232,7 +217,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -253,8 +237,8 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
 		collation = C_COLLATION_OID;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -271,32 +255,11 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
+			locale = &dummy_c_locale;
 			collation = C_COLLATION_OID;
 		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
-		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 	pg_regex_collation = collation;
 }
@@ -304,82 +267,31 @@ pg_set_regex_collation(Oid collation)
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISDIGIT));
+	else
+		return char_properties(c, PG_ISDIGIT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALPHA));
+	else
+		return char_properties(c, PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALNUM));
+	else
+		return char_properties(c, PG_ISDIGIT | PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
@@ -394,219 +306,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISUPPER));
+	else
+		return char_properties(c, PG_ISUPPER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISLOWER));
+	else
+		return char_properties(c, PG_ISLOWER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISGRAPH));
+	else
+		return char_properties(c, PG_ISGRAPH, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPRINT));
+	else
+		return char_properties(c, PG_ISPRINT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPUNCT));
+	else
+		return char_properties(c, PG_ISPUNCT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISSPACE));
+	else
+		return char_properties(c, PG_ISSPACE, pg_regex_locale) != 0;
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -732,37 +512,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 0152723b2a..5b679bcad8 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -96,7 +96,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	if (locale->ctype_is_c)
 		return pg_ascii_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -201,7 +201,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -216,14 +226,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8b15509a3b..bf718f1a3d 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1498,13 +1498,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f9a8edf3e0..7b99d00304 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -244,6 +244,58 @@ static const struct casemap_methods casemap_methods_builtin = {
 	.strupper = strupper_builtin,
 };
 
+static int
+char_properties_builtin(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && pg_u_isdigit(wc, true))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && pg_u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && pg_u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && pg_u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && pg_u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && pg_u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && pg_u_ispunct(wc, true))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && pg_u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.char_properties = char_properties_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
+
 /*
  * pg_perm_setlocale
  *
@@ -1317,6 +1369,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
 	result->casemap = &casemap_methods_builtin;
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
@@ -1747,6 +1801,53 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_properties()
+ *
+ * Out of the properties specified in the given mask, return a new mask of the
+ * properties true for the given character.
+ */
+int
+char_properties(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	return locale->ctype->char_properties(wc, mask, locale);
+}
+
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 3f4ada38bd..95230179a1 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -109,6 +109,50 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
 
+static int
+char_properties_icu(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && u_isdigit(wc))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && u_ispunct(wc))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -132,6 +176,13 @@ static const struct casemap_methods casemap_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 };
+
+static const struct ctype_methods ctype_methods_icu = {
+	.char_properties = char_properties_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -203,6 +254,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	else
 		result->collate = &collate_methods_icu;
 	result->casemap = &casemap_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 798903ed2b..d65403af8d 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -70,6 +70,15 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static int	char_properties_libc_1byte(pg_wchar wc, int mask,
+									   pg_locale_t locale);
+static int	char_properties_libc_wide(pg_wchar wc, int mask,
+									  pg_locale_t locale);
+static pg_wchar toupper_libc_1byte(pg_wchar wc, pg_locale_t locale);
+static pg_wchar toupper_libc_wide(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_1byte(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_wide(pg_wchar wc, pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -104,6 +113,24 @@ static const struct collate_methods collate_methods_libc_win32_utf8 = {
 };
 #endif
 
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
 static const struct casemap_methods casemap_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
@@ -116,6 +143,23 @@ static const struct casemap_methods casemap_methods_libc_mb = {
 	.strupper = strupper_libc_mb,
 };
 
+static const struct ctype_methods ctype_methods_libc_1byte = {
+	.char_properties = char_properties_libc_1byte,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_1byte,
+	.wc_tolower = tolower_libc_1byte,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_wide = {
+	.char_properties = char_properties_libc_wide,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_wide,
+	.wc_tolower = tolower_libc_wide,
+};
+
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -435,6 +479,13 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		else
 			result->casemap = &casemap_methods_libc_sb;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_wide;
+		else
+			result->ctype = &ctype_methods_libc_1byte;
+	}
 
 	return result;
 }
@@ -718,6 +769,113 @@ report_newlocale_failure(const char *localename)
 						localename) : 0)));
 }
 
+static int
+char_properties_libc_1byte(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc > (pg_wchar) UCHAR_MAX)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static int
+char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* if wchar_t cannot represent the value, just return 0 */
+	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+toupper_libc_1byte(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_wide(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_1byte(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_wide(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index bbc10e0c3d..a5abf48bff 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,10 +12,25 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
 
+/*
+ * Character properties for regular expressions.
+ */
+#define PG_ISDIGIT     0x01
+#define PG_ISALPHA     0x02
+#define PG_ISALNUM     (PG_ISDIGIT | PG_ISALPHA)
+#define PG_ISUPPER     0x04
+#define PG_ISLOWER     0x08
+#define PG_ISGRAPH     0x10
+#define PG_ISPRINT     0x20
+#define PG_ISPUNCT     0x40
+#define PG_ISSPACE     0x80
+
 #ifdef USE_ICU
 /*
  * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
@@ -104,6 +119,32 @@ struct casemap_methods
 							 pg_locale_t locale);
 };
 
+struct ctype_methods
+{
+	/* required */
+	int			(*char_properties) (pg_wchar wc, int mask, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/* required */
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -129,6 +170,7 @@ struct pg_locale_struct
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct casemap_methods *casemap;	/* NULL if ctype_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -153,6 +195,10 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern int	char_properties(pg_wchar wc, int mask, pg_locale_t locale);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57de1acff3..bbc1ac179e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1821,7 +1821,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.45.2
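
For readers following the thread, the ctype method table introduced by this patch can be illustrated outside the PostgreSQL tree. The sketch below is not code from the patch. Every `demo_`-prefixed name is hypothetical, `<ctype.h>` stands in for a real provider, and only the shape is taken from the diff: a mask-in/mask-out `char_properties` method, an optional `char_tolower` method, and a `max_chr` hint.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Property bits; the values mirror PG_ISDIGIT etc. in the patch. */
#define DEMO_ISDIGIT 0x01
#define DEMO_ISALPHA 0x02
#define DEMO_ISALNUM (DEMO_ISDIGIT | DEMO_ISALPHA)
#define DEMO_ISSPACE 0x80

typedef unsigned int demo_wchar;	/* hypothetical stand-in for pg_wchar */

struct demo_locale;

struct demo_ctype_methods
{
	/* required */
	int			(*char_properties) (demo_wchar wc, int mask,
									const struct demo_locale *loc);
	/* optional: NULL means "no single-byte fast path" */
	char		(*char_tolower) (unsigned char ch,
								 const struct demo_locale *loc);
	/* largest value the methods handle; 0 means the caller picks the limit */
	demo_wchar	max_chr;
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype;
};

/* An ASCII-only provider built on <ctype.h> (assumes the default "C" locale). */
static int
ascii_char_properties(demo_wchar wc, int mask, const struct demo_locale *loc)
{
	int			result = 0;

	(void) loc;
	if (wc > 127)
		return 0;
	if ((mask & DEMO_ISDIGIT) && isdigit((unsigned char) wc))
		result |= DEMO_ISDIGIT;
	if ((mask & DEMO_ISALPHA) && isalpha((unsigned char) wc))
		result |= DEMO_ISALPHA;
	if ((mask & DEMO_ISSPACE) && isspace((unsigned char) wc))
		result |= DEMO_ISSPACE;
	return result;
}

static const struct demo_ctype_methods ascii_methods = {
	.char_properties = ascii_char_properties,
	/* .char_tolower deliberately left NULL */
	.max_chr = 127,
};

const struct demo_locale demo_ascii_locale = {&ascii_methods};

/* Callers dispatch through the table instead of switching on a provider. */
int
demo_char_properties(demo_wchar wc, int mask, const struct demo_locale *loc)
{
	assert(loc->ctype != NULL && loc->ctype->char_properties != NULL);
	return loc->ctype->char_properties(wc, mask, loc);
}

/* Capability test: is the optional method present? */
bool
demo_char_tolower_enabled(const struct demo_locale *loc)
{
	return loc->ctype != NULL && loc->ctype->char_tolower != NULL;
}

/* Pick the cache limit: honor max_chr only if set and within the cap. */
demo_wchar
demo_cache_limit(const struct demo_locale *loc, demo_wchar max_simple_chr)
{
	if (loc->ctype != NULL &&
		loc->ctype->max_chr != 0 &&
		loc->ctype->max_chr <= max_simple_chr)
		return loc->ctype->max_chr;
	return max_simple_chr;
}
```

Two differences from the patch are worth noting. The patch's `char_tolower_enabled()` dereferences `locale->ctype` without a NULL check and relies on the caller testing `ctype_is_c` first, whereas the sketch checks for NULL itself. The sketch also leaves out the `ctype_is_c` branch and the `cclasscode` bookkeeping in `pg_ctype_get_cache()`.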

v7-0006-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 5bc11b016c5014fbcdd2c8fac8e2cbd1e4f0cde5 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v7 6/9] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale.c      |  1 -
 src/backend/utils/adt/pg_locale_icu.c  | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c |  6 ------
 src/include/utils/pg_locale.h          |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7b99d00304..87a732bd2d 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1364,7 +1364,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 95230179a1..65c120056d 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -245,7 +245,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -503,8 +502,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -532,8 +529,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -578,8 +573,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -730,8 +723,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -780,8 +771,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index d65403af8d..7fbfd2ee91 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -456,7 +456,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -581,8 +580,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -637,8 +634,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -682,7 +677,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a5abf48bff..deb035cfd0 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -163,7 +163,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.45.2
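
Once the provider tag is gone, provider-specific state no longer needs a tagged union, and the next patch in the series (v7-0007) moves it behind an opaque `provider_data` pointer. A minimal sketch of that pattern follows. It is not code from the patches: the `demo_` names and the toy byte-wise provider are hypothetical, and plain `calloc()` stands in for PostgreSQL's memory-context allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_locale;

struct demo_collate_methods
{
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct demo_locale *loc);
};

struct demo_locale
{
	const struct demo_collate_methods *collate;
	void	   *provider_data;	/* interpreted only by the provider's methods */
};

/* Private state of a hypothetical byte-wise provider. */
struct bytewise_provider
{
	int			sign;			/* 1 = ascending, -1 = descending */
};

static int
strncoll_bytewise(const char *a, size_t alen, const char *b, size_t blen,
				  const struct demo_locale *loc)
{
	/* Only this provider knows what provider_data points to. */
	const struct bytewise_provider *p = loc->provider_data;
	size_t		n = (alen < blen) ? alen : blen;
	int			cmp = memcmp(a, b, n);

	if (cmp == 0 && alen != blen)
		cmp = (alen < blen) ? -1 : 1;
	cmp = (cmp > 0) - (cmp < 0);	/* normalize to -1, 0 or 1 */
	return p->sign * cmp;
}

static const struct demo_collate_methods bytewise_methods = {
	.strncoll = strncoll_bytewise,
};

/* Build a locale whose private state lives behind provider_data. */
struct demo_locale *
demo_create_locale(int sign)
{
	struct demo_locale *loc = calloc(1, sizeof(*loc));
	struct bytewise_provider *p = calloc(1, sizeof(*p));

	if (loc == NULL || p == NULL)
	{
		free(loc);
		free(p);
		return NULL;
	}
	p->sign = sign;
	loc->collate = &bytewise_methods;
	loc->provider_data = p;
	return loc;
}

void
demo_free_locale(struct demo_locale *loc)
{
	if (loc != NULL)
	{
		free(loc->provider_data);
		free(loc);
	}
}

/* Generic code never looks inside provider_data; it only dispatches. */
int
demo_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			  const struct demo_locale *loc)
{
	assert(loc->collate != NULL && loc->collate->strncoll != NULL);
	return loc->collate->strncoll(a, alen, b, blen, loc);
}
```

The cast from `void *` is safe only because the method table and the private struct are installed together by the same constructor, which is also how the `create_pg_locale_*` functions in the series pair them.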

v7-0007-Make-provider-data-in-pg_locale_t-an-opaque-point.patch (text/x-patch; charset=UTF-8)

From 8b5b56787952e395db0595b8cf91e87911430173 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v7 7/9] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale.c      |  10 +-
 src/backend/utils/adt/pg_locale_icu.c  |  40 ++++++--
 src/backend/utils/adt/pg_locale_libc.c | 131 ++++++++++++++++---------
 src/include/utils/pg_locale.h          |  16 +--
 4 files changed, 126 insertions(+), 71 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 87a732bd2d..55b103d4dc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -128,6 +128,11 @@ static pg_locale_t default_locale = NULL;
 static bool CurrentLocaleConvValid = false;
 static bool CurrentLCTimeValid = false;
 
+struct builtin_provider
+{
+	const char *locale;
+};
+
 /* Cache for collation-related knowledge */
 
 typedef struct
@@ -1331,6 +1336,7 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
 	pg_locale_t result;
+	struct builtin_provider *builtin;
 
 	if (collid == DEFAULT_COLLATION_OID)
 	{
@@ -1362,8 +1368,10 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	builtin_validate_locale(GetDatabaseEncoding(), locstr);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	builtin = MemoryContextAlloc(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	result->provider_data = (void *) builtin;
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 65c120056d..72e7b60e1c 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -39,6 +39,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static int	strncoll_icu(const char *arg1, ssize_t len1,
@@ -192,6 +198,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -243,8 +250,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -501,11 +512,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -529,6 +541,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -542,7 +556,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -573,12 +587,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -669,11 +685,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -681,7 +699,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -723,6 +741,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -745,7 +765,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -771,6 +791,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -790,7 +812,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 7fbfd2ee91..ab6952183c 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -33,6 +34,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -118,17 +124,21 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static const struct casemap_methods casemap_methods_libc_sb = {
@@ -169,7 +179,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -186,7 +196,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = tolower_l((unsigned char) *p, loc);
+			*p = tolower_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -196,7 +206,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -218,7 +229,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -249,7 +260,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -266,10 +277,10 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (wasalnum)
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -280,7 +291,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -305,10 +317,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -340,7 +352,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -354,7 +366,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = toupper_l((unsigned char) *p, loc);
+			*p = toupper_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -364,7 +376,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -386,7 +399,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -414,6 +427,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -452,16 +466,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -580,6 +597,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -610,7 +629,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -634,8 +653,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -644,7 +665,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -677,6 +698,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -721,7 +744,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -768,27 +791,29 @@ char_properties_libc_1byte(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc > (pg_wchar) UCHAR_MAX)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -799,6 +824,8 @@ char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
@@ -806,21 +833,21 @@ char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
 	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -829,10 +856,12 @@ char_properties_libc_wide(pg_wchar wc, int mask, pg_locale_t locale)
 static pg_wchar
 toupper_libc_1byte(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -840,10 +869,12 @@ toupper_libc_1byte(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_wide(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -851,10 +882,12 @@ toupper_libc_wide(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_1byte(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -862,10 +895,12 @@ tolower_libc_1byte(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_wide(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -957,8 +992,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1017,8 +1054,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index deb035cfd0..e8a6e0d364 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -171,21 +171,7 @@ struct pg_locale_struct
 	const struct casemap_methods *casemap;	/* NULL if ctype_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 typedef struct pg_locale_struct *pg_locale_t;
-- 
2.45.2

v7-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From 0a0408b1ad3cac29033bd8c13c0ac3563e4c01a9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v7 8/9] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  |  4 ++++
 src/backend/utils/adt/formatting.c    |  4 ----
 src/backend/utils/adt/pg_locale.c     |  4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 13 +++++++++++++
 src/backend/utils/adt/varlena.c       |  4 ++++
 src/include/utils/pg_locale.h         | 17 -----------------
 6 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53b6a479aa..afc2330f51 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 6a0571f93e..387009a4a9 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/unicode_case.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 55b103d4dc..c4cf0e05e5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 72e7b60e1c..61019110fb 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,7 +13,20 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
+
+/*
+ * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
+ * (see
+ * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
+ */
+#if U_ICU_VERSION_MAJOR_NUM >= 53
+#define HAVE_UCOL_STRCOLLUTF8 1
+#else
+#undef HAVE_UCOL_STRCOLLUTF8
+#endif
+
 #endif
 
 #include "access/htup_details.h"
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 533bebc1c7..37b3506f06 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e8a6e0d364..cbc045f126 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /*
  * Character properties for regular expressions.
  */
@@ -31,19 +27,6 @@
 #define PG_ISPUNCT     0x40
 #define PG_ISSPACE     0x80
 
-#ifdef USE_ICU
-/*
- * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
- * (see
- * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
- */
-#if U_ICU_VERSION_MAJOR_NUM >= 53
-#define HAVE_UCOL_STRCOLLUTF8 1
-#else
-#undef HAVE_UCOL_STRCOLLUTF8
-#endif
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.45.2

v7-0009-Introduce-hooks-for-creating-custom-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From e12efc82aab0278c948529e4ce1725123ce390ba Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 16:10:28 -0700
Subject: [PATCH v7 9/9] Introduce hooks for creating custom pg_locale_t.

Now that collation, case mapping, and ctype behavior is controlled
with a method table, we can hook the behavior.

The hooks can provide their own arbitrary method table, which may be
based on a different version of ICU than what Postgres was built with,
or entirely unrelated to ICU/libc.
---
 src/backend/utils/adt/pg_locale.c | 68 +++++++++++++++++++++----------
 src/include/utils/pg_locale.h     | 24 +++++++++++
 src/tools/pgindent/typedefs.list  |  3 ++
 3 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c4cf0e05e5..ca3acd8165 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -98,6 +98,9 @@
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
+create_pg_locale_hook_type create_pg_locale_hook = NULL;
+collation_version_hook_type collation_version_hook = NULL;
+
 /* pg_locale_icu.c */
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
@@ -1395,7 +1398,7 @@ create_pg_locale(Oid collid, MemoryContext context)
 	/* We haven't computed this yet in this session, so do it */
 	HeapTuple	tp;
 	Form_pg_collation collform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 	Datum		datum;
 	bool		isnull;
 
@@ -1404,15 +1407,21 @@ create_pg_locale(Oid collid, MemoryContext context)
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(collid, context);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(collid, context);
+
+	if (result == NULL)
+	{
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(collid, context);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	}
 
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
@@ -1469,7 +1478,7 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 
 	Assert(default_locale == NULL);
 
@@ -1479,18 +1488,25 @@ init_database_collation(void)
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
-										  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
-									  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(DEFAULT_COLLATION_OID,
 									   TopMemoryContext);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+
+	if (result == NULL)
+	{
+		if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+											  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+										   TopMemoryContext);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	}
 
 	ReleaseSysCache(tup);
 
@@ -1559,6 +1575,14 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collation_version_hook != NULL)
+	{
+		char	   *version;
+
+		if (collation_version_hook(collprovider, collcollate, &version))
+			return version;
+	}
+
 	/*
 	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
 	 * and are not expected to change, but track the version anyway.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cbc045f126..058fdb2c74 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -159,6 +159,30 @@ struct pg_locale_struct
 
 typedef struct pg_locale_struct *pg_locale_t;
 
+/*
+ * Hooks to enable custom locale providers.
+ */
+
+/*
+ * Hook create_pg_locale(). Return result (allocated in the given context) to
+ * override; or return NULL to return control to create_pg_locale(). When
+ * creating the default database collation, collid is DEFAULT_COLLATION_OID.
+ */
+typedef pg_locale_t (*create_pg_locale_hook_type) (Oid collid,
+												   MemoryContext context);
+
+/*
+ * Hook get_collation_actual_version(). Set *version out parameter and return
+ * true to override; or return false to return control to
+ * get_collation_actual_version().
+ */
+typedef bool (*collation_version_hook_type) (char collprovider,
+											 const char *collcollate,
+											 char **version);
+
+extern PGDLLIMPORT create_pg_locale_hook_type create_pg_locale_hook;
+extern PGDLLIMPORT collation_version_hook_type collation_version_hook;
+
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bbc1ac179e..8a1295e378 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3372,6 +3372,7 @@ cmpEntriesArg
 codes_t
 collation_cache_entry
 collation_cache_hash
+collation_version_hook_type
 color
 colormaprange
 compare_context
@@ -3388,6 +3389,7 @@ core_yyscan_t
 corrupt_items
 cost_qual_eval_context
 cp_hash_func
+create_pg_locale_hook_type
 create_upper_paths_hook_type
 createdb_failure_params
 crosstab_HashEnt
@@ -3399,6 +3401,7 @@ datetkn
 dce_uuid_t
 dclist_head
 decimal
+default_pg_locale_hook_type
 deparse_columns
 deparse_context
 deparse_expr_cxt
-- 
2.45.2

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Andreas Karlsson (#5)

Re: Collation & ctype method table, and extension hooks

On Thu, 2024-10-24 at 10:05 +0200, Andreas Karlsson wrote:

Why is there no pg_locale_builtin.c?

Just that it would be a fairly small file, but I'm fine with doing
that.

I think adding an assert to create_pg_locale() which enforces that
there is always a valid combination of ctype_is_c and casemap would be
good, similar to the collate field.

Good idea.

Why are casemap and ctype_methods not the same struct? They seem very
closely related.

The code impact was in fairly different places, so it seemed like a
nice way to break it out. I could combine them, but it would be a
fairly large patch.

This commit makes me tempted to handle the ctype_is_c logic for
character classes also in callbacks and remove the if in functions like
pg_wc_ispunct(). But this is something that would need to be
benchmarked.

That's a good idea. The reason collate_is_c is important is because
there are quite a few caller-specific optimizations, but that doesn't
seem to be true of ctype_is_c.

I wonder if the bitmask idea isn't terrible for the branch predictor
and that we may want one function per character class, but this is yet
again something we need to benchmark.

Agreed -- a lot of work has gone into optimizing the regex code, and we
don't want a perf regression there. But I'm also not sure exactly which
kinds of tests I should be running for that.

Is there a reason we allocate the icu_provider in create_pg_locale_icu
with MemoryContextAllocZero when we initialize everything anyway? And
similar for other providers.

Allocating and zeroing is a good defense against new optional methods
and fields which can safely default to zero.
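
To illustrate the point about zeroed allocation, here is a minimal
standalone sketch. It uses calloc() in place of
MemoryContextAllocZero(), and the struct, its fields, and the ex_*
function names are hypothetical stand-ins, not the real pg_locale_struct
or icu_provider. It assumes, as PostgreSQL does on its supported
platforms, that all-zero bits read back as a NULL pointer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a locale/provider struct. */
struct ex_locale
{
	bool		deterministic;
	const char *name;
	/* Imagine this field was added after ex_create_locale() was written. */
	const void *optional_methods;
};

/* Allocate zeroed, then set only the fields this code knows about. */
struct ex_locale *
ex_create_locale(const char *name)
{
	struct ex_locale *loc = calloc(1, sizeof(*loc));

	if (loc == NULL)
		return NULL;

	loc->deterministic = true;
	loc->name = name;
	/* optional_methods is never assigned; the zeroed allocation leaves it NULL */
	return loc;
}

/* Callers can test the optional field without risk of reading garbage. */
bool
ex_has_optional_methods(const struct ex_locale *loc)
{
	return loc->optional_methods != NULL;
}
```

With a non-zeroing allocator, the last function would read an
uninitialized pointer in any creation path that was not updated when
the field was added.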

= v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch

Looks good but seems like a quite painful API to use.

How is it painful and can we make it better?

* Have a CREATE LOCALE PROVIDER command and make "provider" an Oid
rather than a char ('b'/'i'/'c'). The v6 patches bring us close to
this point, but I'm not sure if we want to go this far in v18.

Probably necessary but I hate all the DDL commands the way the SQL
standard is written forces us to add.

There is some precedent for a DDL-like thing without new grammar:
pg_replication_origin_create(). I don't have a strong opinion on
whether to do that or not.

Regards,
Jeff Davis

Andreas Karlsson

andreas@proxel.se

about 1 year ago

In reply to: Jeff Davis (#6)

Re: Collation & ctype method table, and extension hooks

On 10/26/24 12:42 AM, Jeff Davis wrote:

On Thu, 2024-10-24 at 10:05 +0200, Andreas Karlsson wrote:

Why is there no pg_locale_builtin.c?

Just that it would be a fairly small file, but I'm fine with doing
that.

I think adding such a small file would make life easier for people new
to the collation part of the code base. It would be a nice symmetry
between collation providers and where code for them can be found.

Why are casemap and ctype_methods not the same struct? They seem very
closely related.

The code impact was in fairly different places, so it seemed like a
nice way to break it out. I could combine them, but it would be a
fairly large patch.

For me, combining them would make the intention of the code easier to
understand; aren't the casemap functions just another set of
"ctype_methods"?

This commit makes me tempted to handle the ctype_is_c logic for
character classes also in callbacks and remove the if in functions like
pg_wc_ispunct(). But this is something that would need to be
benchmarked.

That's a good idea. The reason collate_is_c is important is because
there are quite a few caller-specific optimizations, but that doesn't
seem to be true of ctype_is_c.

Yeah, that was my thought too but I have not confirmed it.

I wonder if the bitmask idea isn't terrible for the branch predictor
and that we may want one function per character class, but this is yet
again something we need to benchmark.

Agreed -- a lot of work has gone into optimizing the regex code, and we
don't want a perf regression there. But I'm also not sure exactly which
kinds of tests I should be running for that.

I think we should at least try to find the worst case to see how big the
performance hit for that is. And then after that try to figure out a
more typical case benchmark.

= v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch

Looks good but seems like a quite painful API to use.

How is it painful and can we make it better?

The painful part is mostly that, without a catalog table where new
providers can be added, we would need to create the collations for our
custom provider under some already existing provider and then, for
example, do some pattern matching on the name of the new collation.
Really ugly, but it works.
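
The workaround described above can be sketched roughly as follows. This
is a standalone simulation, not code against the patch: the real hook
receives an Oid collid and a MemoryContext, so an extension would first
have to look up the collation name in the catalog. All ex_* names, the
"-x-custom" suffix convention, and the string-based signature are
assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for pg_locale_t; only records which provider built it. */
typedef struct ex_locale
{
	const char *provider;
} ex_locale;

/* Simplified hook type: takes the collation name instead of an Oid. */
typedef ex_locale *(*ex_create_hook_type) (const char *collname);

ex_create_hook_type ex_create_hook = NULL;

static ex_locale ex_existing_locale = {"existing"};
static ex_locale ex_custom_locale = {"custom"};

/*
 * Hypothetical extension hook: claim only collations whose name ends in
 * "-x-custom"; return NULL to hand control back to the core code.
 */
ex_locale *
ex_my_hook(const char *collname)
{
	const char *suffix = "-x-custom";
	size_t		nlen = strlen(collname);
	size_t		slen = strlen(suffix);

	if (nlen >= slen && strcmp(collname + nlen - slen, suffix) == 0)
		return &ex_custom_locale;
	return NULL;
}

/* Mirrors the control flow of the patched create_pg_locale(). */
ex_locale *
ex_create_locale(const char *collname)
{
	ex_locale  *result = NULL;

	if (ex_create_hook != NULL)
		result = ex_create_hook(collname);

	if (result == NULL)
		result = &ex_existing_locale;	/* fall back to the catalog provider */

	return result;
}
```

The collation still has to be created under an existing provider so
that the catalog accepts it; only the name tells the extension that the
collation is meant for it.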

I am thinking of implementing ICU4x as an external extension to try out
the hook, but for the in-core contrib module we likely want to use
something which does not require an external dependency. Or what do you
think?

Andreas

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Andreas Karlsson (#7)

Re: Collation & ctype method table, and extension hooks

On Fri, 2024-11-01 at 14:08 +0100, Andreas Karlsson wrote:

Agreed -- a lot of work has gone into optimizing the regex code,
and we
don't want a perf regression there. But I'm also not sure exactly
which
kinds of tests I should be running for that.

I think we should at least try to find the worst case to see how big
the
performance hit for that is. And then after that try to figure out a
more typical case benchmark.

What I had in mind was:

* a large table with a single ~100KiB text field
* a scan with a case insensitive regex that uses some character
classes

Does that sound like a worst case?

The painful part is mostly that, without a catalog table where new
providers can be added, we would need to create the collations for our
custom provider under some already existing provider and then, for
example, do some pattern matching on the name of the new collation.
Really ugly, but it works.

To add a catalog table for the locale providers, the main challenge is
around the database default collation and, relatedly, initdb. Do you
have some ideas around that?

Regards,
Jeff Davis

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Andreas Karlsson (#7)

7 attachment(s)

Re: Collation & ctype method table, and extension hooks

On Fri, 2024-11-01 at 14:08 +0100, Andreas Karlsson wrote:

I think adding such a small file would make life easier for people
new
to the collation part of the code base. It would be a nice symmetry
between collation providers and where code for them can be found.

Done.

For me combining them would make the intention of the code easier to
understand since aren't the casemap functions just a set of
"ctype_methods"?

Done.

There is a bit of weirdness in libc because:

* Single byte encodings use the single-byte isupper(), toupper(), etc.
* UTF8 encoding uses wide character iswupper(), towupper(), etc.
* Non-UTF8 multibyte encodings use isupper() for pattern matching but
towupper() for case mapping

That weirdness existed before, but it's a bit more obvious what's
happening now.
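
To make the three cases above concrete, here is a minimal sketch of how
per-encoding method tables could capture them. It is not the code from
the patch: the sk_* names, the encoding enum, and the reduced two-entry
table are assumptions for illustration, and it uses the process-global
libc locale rather than a locale_t.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical encoding classes; names are illustrative, not from the patch. */
typedef enum
{
	SK_ENC_SINGLE_BYTE,
	SK_ENC_UTF8,
	SK_ENC_OTHER_MULTIBYTE
} sk_enc_class;

/* Hypothetical, much-reduced method table: one classifier, one mapper. */
typedef struct sk_ctype_methods
{
	bool		(*wc_isupper) (wint_t wc);	/* used for pattern matching */
	wint_t		(*wc_toupper) (wint_t wc);	/* used for case mapping */
} sk_ctype_methods;

static bool
sk_isupper_narrow(wint_t wc)
{
	/* the single-byte API is only defined for unsigned char values */
	return (unsigned long) wc <= UCHAR_MAX &&
		isupper((unsigned char) wc) != 0;
}

static bool
sk_isupper_wide(wint_t wc)
{
	return iswupper(wc) != 0;
}

static wint_t
sk_toupper_narrow(wint_t wc)
{
	if ((unsigned long) wc <= UCHAR_MAX)
		return (wint_t) toupper((unsigned char) wc);
	return wc;					/* out of range: leave unchanged */
}

static wint_t
sk_toupper_wide(wint_t wc)
{
	return towupper(wc);
}

/* single-byte encodings: narrow functions throughout */
static const sk_ctype_methods sk_methods_single_byte = {
	.wc_isupper = sk_isupper_narrow,
	.wc_toupper = sk_toupper_narrow,
};

/* UTF8: wide-character functions throughout */
static const sk_ctype_methods sk_methods_utf8 = {
	.wc_isupper = sk_isupper_wide,
	.wc_toupper = sk_toupper_wide,
};

/* non-UTF8 multibyte: the mixed case, narrow classifier but wide mapper */
static const sk_ctype_methods sk_methods_other_mb = {
	.wc_isupper = sk_isupper_narrow,
	.wc_toupper = sk_toupper_wide,
};

/* Pick the table once, at locale creation time, instead of per call. */
const sk_ctype_methods *
sk_choose_ctype_methods(sk_enc_class enc)
{
	switch (enc)
	{
		case SK_ENC_SINGLE_BYTE:
			return &sk_methods_single_byte;
		case SK_ENC_UTF8:
			return &sk_methods_utf8;
		case SK_ENC_OTHER_MULTIBYTE:
			return &sk_methods_other_mb;
	}
	assert(false);				/* unknown encoding class */
	return NULL;
}
```

With the choice made up front, the mixed behavior of the third case is
visible in one initializer rather than spread across several branches.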

This commit makes me tempted to handle the ctype_is_c logic for
character classes also in callbacks and remove the if in functions like
pg_wc_ispunct(). But this is something that would need to be
benchmarked.

I like this idea, but it can be a follow up.
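
For reference, a rough sketch of what that follow-up might look like:
the C-locale behavior becomes just another set of callbacks, so the
wrapper no longer needs a ctype_is_c branch. The sk_* names and the
single-entry table are hypothetical, and the libc variant here uses the
process-global locale purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical reduced method table with a single classifier. */
typedef struct sk_ctype_methods
{
	bool		(*wc_ispunct) (wint_t wc);
} sk_ctype_methods;

/* Hypothetical stand-in for pg_locale_t. */
typedef struct sk_locale
{
	const sk_ctype_methods *ctype;
} sk_locale;

/* C-locale semantics: ASCII punctuation only, independent of libc state */
static bool
sk_ispunct_c(wint_t wc)
{
	return (wc >= 0x21 && wc <= 0x2F) ||
		(wc >= 0x3A && wc <= 0x40) ||
		(wc >= 0x5B && wc <= 0x60) ||
		(wc >= 0x7B && wc <= 0x7E);
}

/* locale-aware semantics, delegated to libc */
static bool
sk_ispunct_libc(wint_t wc)
{
	return iswpunct(wc) != 0;
}

const sk_ctype_methods sk_ctype_methods_c = {
	.wc_ispunct = sk_ispunct_c,
};

const sk_ctype_methods sk_ctype_methods_libc = {
	.wc_ispunct = sk_ispunct_libc,
};

const sk_locale sk_c_locale = {.ctype = &sk_ctype_methods_c};
const sk_locale sk_libc_locale = {.ctype = &sk_ctype_methods_libc};

/* The wrapper is a plain indirect call; no "if (ctype_is_c)" test remains. */
bool
sk_wc_ispunct(wint_t wc, const sk_locale *locale)
{
	assert(locale != NULL && locale->ctype != NULL);
	return locale->ctype->wc_ispunct(wc);
}
```

Whether the indirect call is cheaper or more expensive than the existing
branch is exactly the part that would need benchmarking.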

Attached new patchset.

I also tried some performance tests again. I used smalltext (a table of
10M ~30-character strings) and bigtext (a table of 32768 rows, each
containing the 100KiB source of
https://en.wikipedia.org/wiki/Diacritic). I then ran the following
regex on each:

select count(*) from thetable
where t ~
'[[:digit:]][[:space:]][[:punct:]][[:alpha:]][[:lower:]][[:upper:]]';

for "C", "en_US", and "en-US-x-icu". The timings for smalltext were
indistinguishable between master and the patched version. The timings
for bigtext were pretty noisy so it's hard to tell if there was a
regression or not, but I saw some evidence in the profile that
char_properties has a cost (~1%). I'm not sure if that's a significant
concern or not.

Which API do you think is the right one? Individual functions testing
individual properties, or something like char_properties() that can
test several at once?
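
To illustrate the two API shapes in question, here is a small sketch.
Everything in it is an assumption made for the example: the sk_* names,
the property bits, and the char_properties signature are not taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical property bits for the mask-based shape. */
#define SK_PROP_DIGIT	(1 << 0)
#define SK_PROP_ALPHA	(1 << 1)
#define SK_PROP_LOWER	(1 << 2)
#define SK_PROP_UPPER	(1 << 3)
#define SK_PROP_ALL		(SK_PROP_DIGIT | SK_PROP_ALPHA | \
						 SK_PROP_LOWER | SK_PROP_UPPER)

/* Shape A: one callback per property. */
typedef struct sk_ctype_methods_a
{
	bool		(*wc_isdigit) (wint_t wc);
	bool		(*wc_isalpha) (wint_t wc);
	bool		(*wc_islower) (wint_t wc);
	bool		(*wc_isupper) (wint_t wc);
} sk_ctype_methods_a;

/* Shape B: one callback that tests several properties at once. */
typedef struct sk_ctype_methods_b
{
	int			(*char_properties) (wint_t wc, int mask);
} sk_ctype_methods_b;

static bool sk_isdigit(wint_t wc) { return iswdigit(wc) != 0; }
static bool sk_isalpha(wint_t wc) { return iswalpha(wc) != 0; }
static bool sk_islower(wint_t wc) { return iswlower(wc) != 0; }
static bool sk_isupper(wint_t wc) { return iswupper(wc) != 0; }

/* Return the subset of the requested property bits that hold for wc. */
static int
sk_char_properties(wint_t wc, int mask)
{
	int			result = 0;

	assert((mask & ~SK_PROP_ALL) == 0);	/* no unknown bits requested */

	if ((mask & SK_PROP_DIGIT) && iswdigit(wc))
		result |= SK_PROP_DIGIT;
	if ((mask & SK_PROP_ALPHA) && iswalpha(wc))
		result |= SK_PROP_ALPHA;
	if ((mask & SK_PROP_LOWER) && iswlower(wc))
		result |= SK_PROP_LOWER;
	if ((mask & SK_PROP_UPPER) && iswupper(wc))
		result |= SK_PROP_UPPER;

	return result;
}

const sk_ctype_methods_a sk_methods_a = {
	.wc_isdigit = sk_isdigit,
	.wc_isalpha = sk_isalpha,
	.wc_islower = sk_islower,
	.wc_isupper = sk_isupper,
};

const sk_ctype_methods_b sk_methods_b = {
	.char_properties = sk_char_properties,
};
```

Shape A costs one indirect call per property but each call does a single
test; shape B needs one indirect call for a combined query such as
"alpha or digit", at the price of mask checks inside the callback.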

Regards,
Jeff Davis

Attachments:

v8-0001-Perform-provider-specific-initialization-code-in-.patch (text/x-patch)

From 99f7ab6ccaf6a23e1fccd6eb0c20f2292088155d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 15:49:32 -0700
Subject: [PATCH v8 1/7] Perform provider-specific initialization code in new
 functions.

---
 src/backend/utils/adt/Makefile            |   1 +
 src/backend/utils/adt/meson.build         |   1 +
 src/backend/utils/adt/pg_locale.c         | 155 +++-------------------
 src/backend/utils/adt/pg_locale_builtin.c |  70 ++++++++++
 src/backend/utils/adt/pg_locale_icu.c     |  97 +++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    |  74 ++++++++++-
 6 files changed, 254 insertions(+), 144 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_locale_builtin.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 85e5eaf32e..35e8c01aab 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -79,6 +79,7 @@ OBJS = \
 	orderedsetaggs.o \
 	partitionfuncs.o \
 	pg_locale.o \
+	pg_locale_builtin.o \
 	pg_locale_icu.o \
 	pg_locale_libc.o \
 	pg_lsn.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index f73f294b8f..e86d6dc8e0 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -66,6 +66,7 @@ backend_sources += files(
   'orderedsetaggs.c',
   'partitionfuncs.c',
   'pg_locale.c',
+  'pg_locale_builtin.c',
   'pg_locale_icu.c',
   'pg_locale_libc.c',
   'pg_lsn.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index d4e89663ec..ec5f509c4e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -92,8 +92,6 @@
 /* pg_locale_icu.c */
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -106,8 +104,9 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 #endif
 
 /* pg_locale_libc.c */
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t create_pg_locale_builtin(Oid collid, MemoryContext context);
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -138,7 +137,7 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
-static struct pg_locale_struct default_locale;
+static pg_locale_t default_locale = NULL;
 
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
@@ -1213,7 +1212,6 @@ IsoLocaleName(const char *winlocname)
 
 #endif							/* WIN32 && LC_MESSAGES */
 
-
 /*
  * Create a new pg_locale_t struct for the given collation oid.
  */
@@ -1226,75 +1224,17 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Datum		datum;
 	bool		isnull;
 
-	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	result->provider = collform->collprovider;
-	result->deterministic = collform->collisdeterministic;
-
 	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-	{
-		const char *locstr;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		locstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = true;
-		result->ctype_is_c = (strcmp(locstr, "C") == 0);
-
-		builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-		result->info.builtin.locale = MemoryContextStrdup(context,
-														  locstr);
-	}
+		result = create_pg_locale_builtin(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		const char *iculocstr;
-		const char *icurules;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		iculocstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = false;
-		result->ctype_is_c = false;
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_LIBC)
-	{
-		const char *collcollate;
-		const char *collctype;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-		collcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-		collctype = TextDatumGetCString(datum);
-
-		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
-			(strcmp(collcollate, "POSIX") == 0);
-		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
-			(strcmp(collctype, "POSIX") == 0);
-
-		result->info.lt = make_libc_collator(collcollate, collctype);
-	}
+		result = create_pg_locale_libc(collid, context);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
@@ -1354,7 +1294,9 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	Datum		datum;
+	pg_locale_t result;
+
+	Assert(default_locale == NULL);
 
 	/* Fetch our pg_database row normally, via syscache */
 	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
@@ -1363,80 +1305,21 @@ init_database_collation(void)
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
 	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-	{
-		char	   *datlocale;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		builtin_validate_locale(dbform->encoding, datlocale);
-
-		default_locale.collate_is_c = true;
-		default_locale.ctype_is_c = (strcmp(datlocale, "C") == 0);
-
-		default_locale.info.builtin.locale = MemoryContextStrdup(
-																 TopMemoryContext, datlocale);
-	}
+		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		char	   *datlocale;
-		char	   *icurules;
-		bool		isnull;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = false;
-		default_locale.ctype_is_c = false;
-
-		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		default_locale.info.icu.locale = MemoryContextStrdup(TopMemoryContext, datlocale);
-		default_locale.info.icu.ucol = make_icu_collator(datlocale, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+									  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-	{
-		const char *datcollate;
-		const char *datctype;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datcollate);
-		datcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
-		datctype = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = (strcmp(datcollate, "C") == 0) ||
-			(strcmp(datcollate, "POSIX") == 0);
-		default_locale.ctype_is_c = (strcmp(datctype, "C") == 0) ||
-			(strcmp(datctype, "POSIX") == 0);
-
-		default_locale.info.lt = make_libc_collator(datcollate, datctype);
-	}
+		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+									   TopMemoryContext);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
 
-
-	default_locale.provider = dbform->datlocprovider;
-
-	/*
-	 * Default locale is currently always deterministic.  Nondeterministic
-	 * locales currently don't support pattern matching, which would break a
-	 * lot of things if applied globally.
-	 */
-	default_locale.deterministic = true;
-
 	ReleaseSysCache(tup);
+
+	default_locale = result;
 }
 
 /*
@@ -1454,7 +1337,7 @@ pg_newlocale_from_collation(Oid collid)
 	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
-		return &default_locale;
+		return default_locale;
 
 	if (!OidIsValid(collid))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
new file mode 100644
index 0000000000..4246971a4d
--- /dev/null
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -0,0 +1,70 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL locale utilities for builtin provider
+ *
+ * Portions Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_locale_builtin.c
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_database.h"
+#include "catalog/pg_collation.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/pg_locale.h"
+#include "utils/syscache.h"
+
+extern pg_locale_t create_pg_locale_builtin(Oid collid,
+											MemoryContext context);
+
+pg_locale_t
+create_pg_locale_builtin(Oid collid, MemoryContext context)
+{
+	const char *locstr;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+
+	builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	result->provider = COLLPROVIDER_BUILTIN;
+	result->deterministic = true;
+	result->collate_is_c = true;
+	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+	return result;
+}
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2a87e25dfb..73eb430d75 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -12,14 +12,20 @@
 #include "postgres.h"
 
 #ifdef USE_ICU
-
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+#endif
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
@@ -29,9 +35,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+
+#ifdef USE_ICU
+
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -49,6 +57,8 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
  */
 static UConverter *icu_converter = NULL;
 
+static UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
 static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 								 const char *arg2, ssize_t len2,
 								 pg_locale_t locale);
@@ -63,6 +73,85 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+#endif
+
+pg_locale_t
+create_pg_locale_icu(Oid collid, MemoryContext context)
+{
+#ifdef USE_ICU
+	bool		deterministic;
+	const char *iculocstr;
+	const char *icurules = NULL;
+	UCollator  *collator;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+
+		/* default database collation is always deterministic */
+		deterministic = true;
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(DATABASEOID, tp,
+								Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		Form_pg_collation collform;
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		collform = (Form_pg_collation) GETSTRUCT(tp);
+		deterministic = collform->collisdeterministic;
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(COLLOID, tp,
+								Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+	collator = make_icu_collator(iculocstr, icurules);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+	result->info.icu.ucol = collator;
+	result->provider = COLLPROVIDER_ICU;
+	result->deterministic = deterministic;
+	result->collate_is_c = false;
+	result->ctype_is_c = false;
+
+	return result;
+#else
+	/* could get here if a collation was created by a build with ICU */
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("ICU is not supported in this build")));
+
+	return NULL;
+#endif
+}
+
+#ifdef USE_ICU
 
 /*
  * Wrapper around ucol_open() to handle API differences for older ICU
@@ -160,7 +249,7 @@ pg_ucol_open(const char *loc_str)
  *
  * Ensure that no path leaks a UCollator.
  */
-UCollator *
+static UCollator *
 make_icu_collator(const char *iculocstr, const char *icurules)
 {
 	if (!icurules)
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 83f310fc71..374ac37ba0 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,10 +11,16 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
@@ -24,15 +30,16 @@
  */
 #define		TEXTBUFLEN			1024
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
+
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
-
+static locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
 
 #ifdef WIN32
@@ -41,6 +48,65 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+pg_locale_t
+create_pg_locale_libc(Oid collid, MemoryContext context)
+{
+	const char *collate;
+	const char *ctype;
+	locale_t	loc;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+
+	loc = make_libc_collator(collate, ctype);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->provider = COLLPROVIDER_LIBC;
+	result->deterministic = true;
+	result->collate_is_c = (strcmp(collate, "C") == 0) ||
+		(strcmp(collate, "POSIX") == 0);
+	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
+		(strcmp(ctype, "POSIX") == 0);
+	result->info.lt = loc;
+
+	return result;
+}
+
 /*
  * Create a locale_t with the given collation and ctype.
  *
@@ -49,7 +115,7 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
  *
  * Ensure that no path leaks a locale_t.
  */
-locale_t
+static locale_t
 make_libc_collator(const char *collate, const char *ctype)
 {
 	locale_t	loc = 0;
-- 
2.34.1

v8-0002-Control-collation-behavior-with-a-method-table.patch (text/x-patch)

From eefd65d1d05111cd12a93902e8acf009d2f4c39f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 11:27:29 -0700
Subject: [PATCH v8 2/7] Control collation behavior with a method table.

Previously, behavior branched based on the provider.

A method table is less error prone and easier to hook.
---
 src/backend/utils/adt/pg_locale.c      | 123 +++------------------
 src/backend/utils/adt/pg_locale_icu.c  | 147 +++++++++++++++----------
 src/backend/utils/adt/pg_locale_libc.c |  40 +++++--
 src/include/utils/pg_locale.h          |  33 ++++++
 4 files changed, 167 insertions(+), 176 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index ec5f509c4e..00eca68717 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -92,27 +92,12 @@
 /* pg_locale_icu.c */
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
-						   const char *src, ssize_t srclen,
-						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
-								  const char *src, ssize_t srclen,
-								  pg_locale_t locale);
 #endif
 
 /* pg_locale_libc.c */
 extern pg_locale_t create_pg_locale_builtin(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
-						  const char *arg2, ssize_t len2,
-						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale);
 
 /* GUC settings */
 char	   *locale_messages;
@@ -1239,6 +1224,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
 
+	Assert((result->collate_is_c && result->collate == NULL) ||
+		   (!result->collate_is_c && result->collate != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1490,19 +1478,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 int
 pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, -1, arg2, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, -1, arg2, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, -1, arg2, -1, locale);
 }
 
 /*
@@ -1523,51 +1499,25 @@ int
 pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 			pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, len1, arg2, len2, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, len1, arg2, len2, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
 }
 
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
  *
- * Unfortunately, it seems that strxfrm() for non-C collations is broken on
- * many common platforms; testing of multiple versions of glibc reveals that,
- * for many locales, strcoll() and strxfrm() do not return consistent
- * results. While no other libc other than Cygwin has so far been shown to
- * have a problem, we take the conservative course of action for right now and
- * disable this categorically.  (Users who are certain this isn't a problem on
- * their system can define TRUST_STRXFRM.)
  *
  * No similar problem is known for the ICU provider.
  */
 bool
 pg_strxfrm_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-#ifdef TRUST_STRXFRM
-		return true;
-#else
-		return false;
-#endif
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	/*
+	 * locale->collate->strnxfrm is still a required method, even if it may
+	 * have the wrong behavior, because the planner uses it for estimates in
+	 * some cases.
+	 */
+	return locale->collate->strxfrm_is_safe;
 }
 
 /*
@@ -1578,19 +1528,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 size_t
 pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1616,19 +1554,7 @@ size_t
 pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, srclen, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, srclen, locale);
 }
 
 /*
@@ -1638,15 +1564,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 bool
 pg_strxfrm_prefix_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-		return false;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	return (locale->collate->strnxfrm_prefix != NULL);
 }
 
 /*
@@ -1658,7 +1576,7 @@ size_t
 pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 				  pg_locale_t locale)
 {
-	return pg_strnxfrm_prefix(dest, destsize, src, -1, locale);
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1683,16 +1601,7 @@ size_t
 pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-#ifdef USE_ICU
-	if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_prefix_icu(dest, destsize, src, -1, locale);
-	else
-#endif
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 73eb430d75..11ec9d4e4b 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -40,13 +40,14 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
+
+static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
+static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
@@ -59,12 +60,20 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
-static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-								 const char *arg2, ssize_t len2,
-								 pg_locale_t locale);
-static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-										  const char *src, ssize_t srclen,
-										  pg_locale_t locale);
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+static int	strncoll_icu_utf8(const char *arg1, ssize_t len1,
+							  const char *arg2, ssize_t len2,
+							  pg_locale_t locale);
+#endif
+static size_t strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+									   const char *src, ssize_t srclen,
+									   pg_locale_t locale);
 static void init_icu_converter(void);
 static size_t uchar_length(UConverter *converter,
 						   const char *str, int32_t len);
@@ -73,6 +82,25 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+
+static const struct collate_methods collate_methods_icu = {
+	.strncoll = strncoll_icu,
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu,
+	.strxfrm_is_safe = true,
+};
+
+static const struct collate_methods collate_methods_icu_utf8 = {
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	.strncoll = strncoll_icu_utf8,
+#else
+	.strncoll = strncoll_icu,
+#endif
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu_utf8,
+	.strxfrm_is_safe = true,
+};
+
 #endif
 
 pg_locale_t
@@ -139,6 +167,10 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	if (GetDatabaseEncoding() == PG_UTF8)
+		result->collate = &collate_methods_icu_utf8;
+	else
+		result->collate = &collate_methods_icu;
 
 	return result;
 #else
@@ -313,42 +345,36 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 }
 
 /*
- * strncoll_icu
+ * strncoll_icu_utf8
  *
  * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
  * database encoding. An argument length of -1 means the string is
  * NUL-terminated.
  */
+#ifdef HAVE_UCOL_STRCOLLUTF8
 int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
+strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+				  pg_locale_t locale)
 {
 	int			result;
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
+	status = U_ZERO_ERROR;
+	result = ucol_strcollUTF8(locale->info.icu.ucol,
+							  arg1, len1,
+							  arg2, len2,
+							  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("collation failed: %s", u_errorName(status))));
 
 	return result;
 }
+#endif
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
@@ -399,37 +425,32 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
+strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+						 const char *src, ssize_t srclen,
+						 pg_locale_t locale)
 {
 	size_t		result;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
+	uiter_setUTF8(&iter, src, srclen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+								  &iter,
+								  state,
+								  (uint8_t *) dest,
+								  destsize,
+								  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
 
 	return result;
 }
@@ -504,7 +525,7 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 }
 
 /*
- * strncoll_icu_no_utf8
+ * strncoll_icu
  *
  * Convert the arguments from the database encoding to UChar strings, then
  * call ucol_strcoll(). An argument length of -1 means that the string is
@@ -514,8 +535,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
  * caller should call that instead.
  */
 static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
+strncoll_icu(const char *arg1, ssize_t len1,
+			 const char *arg2, ssize_t len2, pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -528,6 +549,8 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 	int			result;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 #endif
@@ -561,9 +584,9 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -576,6 +599,8 @@ strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 	Size		result_bsize;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	init_icu_converter();
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 374ac37ba0..c7be6dd4f9 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -32,10 +32,10 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
+static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
+static size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 static locale_t make_libc_collator(const char *collate,
@@ -48,6 +48,27 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static const struct collate_methods collate_methods_libc = {
+	.strncoll = strncoll_libc,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+
+	/*
+	 * Unfortunately, it seems that strxfrm() for non-C collations is broken
+	 * on many common platforms; testing of multiple versions of glibc reveals
+	 * that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results. While no libc other than Cygwin has so far
+	 * been shown to have a problem, we take the conservative course of action
+	 * for right now and disable this categorically.  (Users who are certain
+	 * this isn't a problem on their system can define TRUST_STRXFRM.)
+	 */
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -103,6 +124,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+	{
+#ifdef WIN32
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->collate = &collate_methods_libc_win32_utf8;
+		else
+#endif
+			result->collate = &collate_methods_libc;
+	}
 
 	return result;
 }
@@ -200,12 +230,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 
 	Assert(locale->provider == COLLPROVIDER_LIBC);
 
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 37ecf95193..2f05dffcdd 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -60,6 +60,36 @@ extern struct lconv *PGLC_localeconv(void);
 extern void cache_locale_time(void);
 
 
+struct pg_locale_struct;
+typedef struct pg_locale_struct *pg_locale_t;
+
+/* methods that define collation behavior */
+struct collate_methods
+{
+	/* required */
+	int			(*strncoll) (const char *arg1, ssize_t len1,
+							 const char *arg2, ssize_t len2,
+							 pg_locale_t locale);
+
+	/* required */
+	size_t		(*strnxfrm) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* optional */
+	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
+									const char *src, ssize_t srclen,
+									pg_locale_t locale);
+
+	/*
+	 * If the strnxfrm method is not trusted to return the correct results,
+	 * set strxfrm_is_safe to false. If set to false, the method will not be
+	 * used in most cases, but the planner still expects it to be there for
+	 * estimation purposes (where incorrect results are acceptable).
+	 */
+	bool		strxfrm_is_safe;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -82,6 +112,9 @@ struct pg_locale_struct
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
+
+	const struct collate_methods *collate;	/* NULL if collate_is_c */
+
 	union
 	{
 		struct
-- 
2.34.1
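
For readers skimming the patch above, the core idea (pick a method table once when the locale object is created, then call through it at execution time instead of branching on the provider) can be sketched in isolation. This is only an illustration, not code from the patch: the names demo_locale, demo_collate_methods, demo_locale_init and demo_strncoll are hypothetical, the two comparison functions are toy stand-ins for the ICU and libc methods, and lengths are plain size_t rather than the ssize_t/-1 convention the real methods use.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;				/* hypothetical stand-in for pg_locale_struct */

/* Hypothetical, reduced analogue of struct collate_methods. */
struct demo_collate_methods
{
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const struct demo_locale *loc);
	bool		strxfrm_is_safe;
};

struct demo_locale
{
	bool		collate_is_c;
	const struct demo_collate_methods *collate; /* NULL if collate_is_c */
};

/* Toy method 1: plain byte-wise comparison. */
static int
strncoll_bytes(const char *a, size_t alen, const char *b, size_t blen,
			   const struct demo_locale *loc)
{
	size_t		n = alen < blen ? alen : blen;
	int			r = memcmp(a, b, n);

	(void) loc;
	if (r != 0)
		return r < 0 ? -1 : 1;
	if (alen == blen)
		return 0;
	return alen < blen ? -1 : 1;
}

/* Toy method 2: ASCII case-insensitive comparison. */
static int
strncoll_nocase(const char *a, size_t alen, const char *b, size_t blen,
				const struct demo_locale *loc)
{
	size_t		n = alen < blen ? alen : blen;

	(void) loc;
	for (size_t i = 0; i < n; i++)
	{
		int			ca = tolower((unsigned char) a[i]);
		int			cb = tolower((unsigned char) b[i]);

		if (ca != cb)
			return ca < cb ? -1 : 1;
	}
	if (alen == blen)
		return 0;
	return alen < blen ? -1 : 1;
}

static const struct demo_collate_methods methods_bytes = {
	.strncoll = strncoll_bytes,
	.strxfrm_is_safe = true,
};

static const struct demo_collate_methods methods_nocase = {
	.strncoll = strncoll_nocase,
	.strxfrm_is_safe = false,
};

/* The only place that decides which table applies. */
static void
demo_locale_init(struct demo_locale *loc, bool case_insensitive)
{
	loc->collate_is_c = false;
	loc->collate = case_insensitive ? &methods_nocase : &methods_bytes;
}

/* Execution-time caller: no branching on provider or encoding. */
static int
demo_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			  const struct demo_locale *loc)
{
	assert(loc->collate != NULL);
	return loc->collate->strncoll(a, alen, b, blen, loc);
}
```

With a locale initialized as case-insensitive, demo_strncoll("Abc", 3, "abc", 3, &loc) compares equal; with the byte-wise table it does not. The caller is identical in both cases, which is the property that makes the table easy to hook.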

v8-0003-Control-ctype-behavior-internally-with-a-method-t.patch (text/x-patch; charset=UTF-8)

From bfd26c4c54cf1a1b430dcdd9188cacb38b30b7f6 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 12:12:51 -0700
Subject: [PATCH v8 3/7] Control ctype behavior internally with a method table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 388 ++++--------------
 src/backend/utils/adt/formatting.c        | 445 +++------------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         |  71 ++++
 src/backend/utils/adt/pg_locale_builtin.c | 129 ++++++
 src/backend/utils/adt/pg_locale_icu.c     | 188 ++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 465 ++++++++++++++++++++++
 src/include/utils/pg_locale.h             |  71 +++-
 src/tools/pgindent/typedefs.list          |   1 -
 10 files changed, 1062 insertions(+), 725 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index b75784b6ce..e898634fdf 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,33 +63,18 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 static Oid	pg_regex_collation;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
-#define PG_ISDIGIT	0x01
-#define PG_ISALPHA	0x02
-#define PG_ISALNUM	(PG_ISDIGIT | PG_ISALPHA)
-#define PG_ISUPPER	0x04
-#define PG_ISLOWER	0x08
-#define PG_ISGRAPH	0x10
-#define PG_ISPRINT	0x20
-#define PG_ISPUNCT	0x40
-#define PG_ISSPACE	0x80
-
-static const unsigned char pg_char_properties[128] = {
+static const unsigned char char_properties_tbl[128] = {
 	 /* NUL */ 0,
 	 /* ^A */ 0,
 	 /* ^B */ 0,
@@ -232,7 +217,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -253,8 +237,8 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
 		collation = C_COLLATION_OID;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -271,32 +255,11 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
+			locale = &dummy_c_locale;
 			collation = C_COLLATION_OID;
 		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
-		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 	pg_regex_collation = collation;
 }
@@ -304,82 +267,31 @@ pg_set_regex_collation(Oid collation)
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISDIGIT));
+	else
+		return char_properties(c, PG_ISDIGIT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALPHA));
+	else
+		return char_properties(c, PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALNUM));
+	else
+		return char_properties(c, PG_ISDIGIT | PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
@@ -394,219 +306,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISUPPER));
+	else
+		return char_properties(c, PG_ISUPPER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISLOWER));
+	else
+		return char_properties(c, PG_ISLOWER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISGRAPH));
+	else
+		return char_properties(c, PG_ISGRAPH, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPRINT));
+	else
+		return char_properties(c, PG_ISPRINT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPUNCT));
+	else
+		return char_properties(c, PG_ISPUNCT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISSPACE));
+	else
+		return char_properties(c, PG_ISSPACE, pg_regex_locale) != 0;
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -732,37 +512,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 85a7dd4561..6a0571f93e 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1570,52 +1570,6 @@ str_numth(char *dest, char *num, int type)
  *			upper/lower/initcap functions
  *****************************************************************************/
 
-#ifdef USE_ICU
-
-typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
-									 const UChar *src, int32_t srcLength,
-									 const char *locale,
-									 UErrorCode *pErrorCode);
-
-static int32_t
-icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
-				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
-{
-	UErrorCode	status;
-	int32_t		len_dest;
-
-	len_dest = len_source;		/* try first with same length */
-	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-	status = U_ZERO_ERROR;
-	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
-	if (status == U_BUFFER_OVERFLOW_ERROR)
-	{
-		/* try again with adjusted length */
-		pfree(*buff_dest);
-		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-		status = U_ZERO_ERROR;
-		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
-	}
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
-	return len_dest;
-}
-
-static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
-{
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
-}
-
-#endif							/* USE_ICU */
-
 /*
  * If the system provides the needed functions for wide-character manipulation
  * (which are all standardized by C99), then we implement upper/lower/initcap
@@ -1663,101 +1617,28 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar;
-			int32_t		len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strlower(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strlower(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that tolower_l() will not be so broken as
-				 * to need an isupper_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
@@ -1800,147 +1681,33 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
-		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strupper(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strupper(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l() will not be so broken as
-				 * to need an islower_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
 }
 
-struct WordBoundaryState
-{
-	const char *str;
-	size_t		len;
-	size_t		offset;
-	bool		init;
-	bool		prev_alnum;
-};
-
-/*
- * Simple word boundary iterator that draws boundaries each time the result of
- * pg_u_isalnum() changes.
- */
-static size_t
-initcap_wbnext(void *state)
-{
-	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
-
-	while (wbstate->offset < wbstate->len &&
-		   wbstate->str[wbstate->offset] != '\0')
-	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
-										wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
-
-		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
-		{
-			size_t		prev_offset = wbstate->offset;
-
-			wbstate->init = true;
-			wbstate->offset += unicode_utf8len(u);
-			wbstate->prev_alnum = curr_alnum;
-			return prev_offset;
-		}
-
-		wbstate->offset += unicode_utf8len(u);
-	}
-
-	return wbstate->len;
-}
-
 /*
  * collation-aware, wide-character-aware initcap function
  *
@@ -1951,7 +1718,6 @@ char *
 str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
-	int			wasalnum = false;
 	pg_locale_t mylocale;
 
 	if (!buff)
@@ -1979,125 +1745,28 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-			struct WordBoundaryState wbstate = {
-				.str = src,
-				.len = srclen,
-				.offset = 0,
-				.init = false,
-				.prev_alnum = false,
-			};
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strtitle(dst, dstsize, src, srclen,
-									  initcap_wbnext, &wbstate);
-			if (needed + 1 > dstsize)
-			{
-				/* reset iterator */
-				wbstate.offset = 0;
-				wbstate.init = false;
-
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strtitle(dst, dstsize, src, srclen,
-										  initcap_wbnext, &wbstate);
-				Assert(needed + 1 == dstsize);
-			}
-
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-				{
-					if (wasalnum)
-						workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-					else
-						workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-					wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
-				}
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
 
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l()/tolower_l() will not be so
-				 * broken as to need guard tests.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (wasalnum)
-						*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-					else
-						*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-					wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
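
Editorial note on the hunks above: the rewritten str_toupper() and str_initcap() bodies rely on a "report the needed size, then retry" contract. The callee writes into dst only if the result fits, and always returns the number of bytes required (excluding the NUL). The sketch below illustrates that contract in isolation. It is not PostgreSQL code: toy_strupper() and toy_upper_alloc() are hypothetical names, malloc/realloc stand in for palloc/repalloc, and expanding '#' to "##" is an invented rule whose only purpose is to make the output longer than the input so the retry path runs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_strupper(): returns the number of bytes the
 * result needs (excluding the terminating NUL) and writes to dst only when
 * dstsize is large enough.  ASCII only; '#' expands to "##" (invented rule).
 */
static size_t
toy_strupper(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		needed = 0;
	size_t		i;

	for (i = 0; i < srclen; i++)
		needed += (src[i] == '#') ? 2 : 1;

	if (needed + 1 > dstsize)
		return needed;			/* too small: report size, write nothing */

	for (i = 0; i < srclen; i++)
	{
		if (src[i] == '#')
		{
			*dst++ = '#';
			*dst++ = '#';
		}
		else
			*dst++ = (char) toupper((unsigned char) src[i]);
	}
	*dst = '\0';
	return needed;
}

/* Caller side: same shape as the new str_toupper()/str_initcap() bodies. */
static char *
toy_upper_alloc(const char *src, size_t srclen)
{
	size_t		dstsize = srclen + 1;	/* first guess: equal size plus NUL */
	char	   *dst = malloc(dstsize);
	size_t		needed;

	if (dst == NULL)
		return NULL;

	needed = toy_strupper(dst, dstsize, src, srclen);
	if (needed + 1 > dstsize)
	{
		/* grow buffer to the reported size and retry */
		char	   *tmp;

		dstsize = needed + 1;
		tmp = realloc(dst, dstsize);
		if (tmp == NULL)
		{
			free(dst);
			return NULL;
		}
		dst = tmp;
		needed = toy_strupper(dst, dstsize, src, srclen);
	}

	assert(dst[needed] == '\0');
	return dst;
}
```

Because the callee never writes past dstsize, the caller needs at most two calls, which is why the patch can assert `needed + 1 <= dstsize` after the retry.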
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 0152723b2a..5b679bcad8 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -96,7 +96,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	if (locale->ctype_is_c)
 		return pg_ascii_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -201,7 +201,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -216,14 +226,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8b15509a3b..bf718f1a3d 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1498,13 +1498,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 00eca68717..2ebe4c00bf 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1227,6 +1227,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1470,6 +1473,27 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+size_t
+pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
+}
+
 /*
  * pg_strcoll
  *
@@ -1604,6 +1628,53 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_properties()
+ *
+ * Out of the properties specified in the given mask, return a new mask of the
+ * properties true for the given character.
+ */
+int
+char_properties(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	return locale->ctype->char_properties(wc, mask, locale);
+}
+
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
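
Editorial note on the wrappers above: char_tolower_enabled() exists because char_tolower is an optional slot in the method table. The builtin and ICU tables in this patch leave it unset, and like.c only takes the single-byte path when it is present. The following is a minimal sketch of that pattern under stated assumptions: every toy_* name is hypothetical, the struct is a cut-down analogue of struct ctype_methods rather than its real definition, and plain <ctype.h> calls stand in for the locale-aware ones.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_locale;

/* Hypothetical, cut-down analogue of struct ctype_methods. */
struct toy_ctype_methods
{
	/* required */
	bool		(*char_is_cased) (char ch, const struct toy_locale *loc);
	/* optional: NULL when the provider has no single-byte tolower */
	char		(*char_tolower) (unsigned char ch, const struct toy_locale *loc);
};

struct toy_locale
{
	const struct toy_ctype_methods *ctype;
};

static bool
cased_ascii(char ch, const struct toy_locale *loc)
{
	(void) loc;
	return isalpha((unsigned char) ch) != 0;
}

static char
tolower_ascii(unsigned char ch, const struct toy_locale *loc)
{
	(void) loc;
	return (char) tolower(ch);
}

/* A provider that supports the optional method ... */
static const struct toy_ctype_methods methods_sb = {
	.char_is_cased = cased_ascii,
	.char_tolower = tolower_ascii,
};

/* ... and one that leaves it NULL, as the builtin and ICU tables do. */
static const struct toy_ctype_methods methods_no_tolower = {
	.char_is_cased = cased_ascii,
};

static bool
toy_char_tolower_enabled(const struct toy_locale *loc)
{
	return loc->ctype->char_tolower != NULL;
}

static char
toy_char_tolower(unsigned char ch, const struct toy_locale *loc)
{
	/* callers must check toy_char_tolower_enabled() first */
	assert(toy_char_tolower_enabled(loc));
	return loc->ctype->char_tolower(ch, loc);
}
```

A caller would test toy_char_tolower_enabled() once and pick a code path, rather than branching on the provider for every character.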
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 4246971a4d..5f90355355 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -13,6 +13,8 @@
 
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
+#include "common/unicode_case.h"
+#include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -23,6 +25,131 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 
+struct WordBoundaryState
+{
+	const char *str;
+	size_t		len;
+	size_t		offset;
+	bool		init;
+	bool		prev_alnum;
+};
+
+/*
+ * Simple word boundary iterator that draws boundaries each time the result of
+ * pg_u_isalnum() changes.
+ */
+static size_t
+initcap_wbnext(void *state)
+{
+	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
+
+	while (wbstate->offset < wbstate->len &&
+		   wbstate->str[wbstate->offset] != '\0')
+	{
+		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+										wbstate->offset);
+		bool		curr_alnum = pg_u_isalnum(u, true);
+
+		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
+		{
+			size_t		prev_offset = wbstate->offset;
+
+			wbstate->init = true;
+			wbstate->offset += unicode_utf8len(u);
+			wbstate->prev_alnum = curr_alnum;
+			return prev_offset;
+		}
+
+		wbstate->offset += unicode_utf8len(u);
+	}
+
+	return wbstate->len;
+}
+
+static size_t
+strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strlower(dest, destsize, src, srclen);
+}
+
+static size_t
+strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	struct WordBoundaryState wbstate = {
+		.str = src,
+		.len = srclen,
+		.offset = 0,
+		.init = false,
+		.prev_alnum = false,
+	};
+
+	return unicode_strtitle(dest, destsize, src, srclen,
+							initcap_wbnext, &wbstate);
+}
+
+static size_t
+strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strupper(dest, destsize, src, srclen);
+}
+
+static int
+char_properties_builtin(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && pg_u_isdigit(wc, true))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && pg_u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && pg_u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && pg_u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && pg_u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && pg_u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && pg_u_ispunct(wc, true))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && pg_u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.char_properties = char_properties_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -65,6 +192,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 11ec9d4e4b..5251163b1b 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,11 @@ static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 
+typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
+									 const UChar *src, int32_t srcLength,
+									 const char *locale,
+									 UErrorCode *pErrorCode);
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -60,6 +65,16 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
+
+static size_t strlower_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strtitle_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strupper_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -80,8 +95,63 @@ static size_t uchar_length(UConverter *converter,
 static int32_t uchar_convert(UConverter *converter,
 							 UChar *dest, int32_t destlen,
 							 const char *src, int32_t srclen);
+static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff,
+							size_t nbytes);
+static size_t icu_from_uchar(char *dest, size_t destsize,
+							 const UChar *buff_uchar, int32_t len_uchar);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+static int32_t icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+								UChar **buff_dest, UChar *buff_source,
+								int32_t len_source);
+static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+									   const UChar *src, int32_t srcLength,
+									   const char *locale,
+									   UErrorCode *pErrorCode);
+
+static int
+char_properties_icu(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && u_isdigit(wc))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && u_ispunct(wc))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
 
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
@@ -101,6 +171,15 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.char_properties = char_properties_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -171,6 +250,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -344,6 +424,66 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
+static size_t
+strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToLower, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToTitle_default_BI, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToUpper, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
 /*
  * strncoll_icu_utf8
  *
@@ -467,7 +607,7 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
  * The result string is nul-terminated, though most callers rely on the
  * result length instead.
  */
-int32_t
+static int32_t
 icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
 {
 	int32_t		len_uchar;
@@ -494,8 +634,8 @@ icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
  *
  * The result string is nul-terminated.
  */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+static size_t
+icu_from_uchar(char *dest, size_t destsize, const UChar *buff_uchar, int32_t len_uchar)
 {
 	UErrorCode	status;
 	int32_t		len_result;
@@ -510,10 +650,11 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
 
-	*result = palloc(len_result + 1);
+	if (len_result + 1 > destsize)
+		return len_result;
 
 	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+	len_result = ucnv_fromUChars(icu_converter, dest, len_result + 1,
 								 buff_uchar, len_uchar, &status);
 	if (U_FAILURE(status) ||
 		status == U_STRING_NOT_TERMINATED_WARNING)
@@ -524,6 +665,43 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	return len_result;
 }
 
+static int32_t
+icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
+{
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s", u_errorName(status))));
+	return len_dest;
+}
+
+static int32_t
+u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+						const UChar *src, int32_t srcLength,
+						const char *locale,
+						UErrorCode *pErrorCode)
+{
+	return u_strToTitle(dest, destCapacity, src, srcLength,
+						NULL, locale, pErrorCode);
+}
+
 /*
  * strncoll_icu
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index c7be6dd4f9..861c4f68e4 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,6 +11,9 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+#include <wctype.h>
+
 #include "access/htup_details.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
@@ -48,6 +51,34 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static size_t strlower_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strlower_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+
+static int	char_properties_libc_sb(pg_wchar wc, int mask,
+									   pg_locale_t locale);
+static int	char_properties_libc_mb(pg_wchar wc, int mask,
+									  pg_locale_t locale);
+static pg_wchar toupper_libc_sb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar toupper_libc_mb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_sb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_mb(pg_wchar wc, pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -69,6 +100,324 @@ static const struct collate_methods collate_methods_libc = {
 #endif
 };
 
+#ifdef WIN32
+static const struct collate_methods collate_methods_libc_win32_utf8 = {
+	.strncoll = strncoll_libc_win32_utf8,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+#endif
+
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.char_properties = char_properties_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.char_properties = char_properties_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.char_properties = char_properties_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
+static size_t
+strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		/* size already checked by the enclosing test */
+		Assert(srclen + 1 <= destsize);
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that tolower_l() will not be so broken as to need
+		 * an isupper_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = tolower_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	/* copy out only if it fits; free the scratch buffers either way */
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		int			wasalnum = false;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l()/tolower_l() will not be so broken
+		 * as to need guard tests.  When using the default collation, we apply
+		 * the traditional Postgres behavior that forces ASCII-style treatment
+		 * of I/i, but in non-default collations you get exactly what the
+		 * collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (wasalnum)
+				*p = tolower_l((unsigned char) *p, loc);
+			else
+				*p = toupper_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	int			wasalnum = false;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+	{
+		if (wasalnum)
+			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		else
+			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		wasalnum = iswalnum_l(workspace[curr_char], loc);
+	}
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	/* copy out only if it fits; free the scratch buffers either way */
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l() will not be so broken as to need
+		 * an islower_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = toupper_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -133,6 +482,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
@@ -416,6 +774,113 @@ report_newlocale_failure(const char *localename)
 						localename) : 0)));
 }
 
+static int
+char_properties_libc_sb(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc > (pg_wchar) UCHAR_MAX)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static int
+char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* if wchar_t cannot represent the value, just return 0 */
+	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 2f05dffcdd..c71cf38020 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,10 +12,25 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
 
+/*
+ * Character properties for regular expressions.
+ */
+#define PG_ISDIGIT     0x01
+#define PG_ISALPHA     0x02
+#define PG_ISALNUM     (PG_ISDIGIT | PG_ISALPHA)
+#define PG_ISUPPER     0x04
+#define PG_ISLOWER     0x08
+#define PG_ISGRAPH     0x10
+#define PG_ISPRINT     0x20
+#define PG_ISPUNCT     0x40
+#define PG_ISSPACE     0x80
+
 #ifdef USE_ICU
 /*
  * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
@@ -90,6 +105,43 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* required */
+	int			(*char_properties) (pg_wchar wc, int mask, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/* required */
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -114,6 +166,7 @@ struct pg_locale_struct
 	bool		ctype_is_c;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -138,6 +191,19 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern int	char_properties(pg_wchar wc, int mask, pg_locale_t locale);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
+extern size_t pg_strlower(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strtitle(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strupper(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
@@ -157,11 +223,6 @@ extern const char *builtin_validate_locale(int encoding, const char *locale);
 extern void icu_validate_locale(const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
-#ifdef USE_ICU
-extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
-extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
-
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
 						 pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 08521d51a9..416a2cc76b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1822,7 +1822,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.34.1
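
A note on the buffer contract used by the case-mapping methods in the patch above: strupper_libc_sb() writes to dest only when the result plus its terminator fits in destsize, and always returns the size the result needs. The following is a minimal standalone sketch of that contract and of a caller that retries with a larger buffer. It is not PostgreSQL code: every demo_* name is hypothetical, plain toupper() stands in for toupper_l(), and malloc() stands in for palloc().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not PostgreSQL's pg_locale_t or ctype_methods. */
struct demo_locale;

struct demo_ctype_methods
{
	size_t		(*strupper) (char *dest, size_t destsize,
							 const char *src, size_t srclen,
							 const struct demo_locale *locale);
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype;
};

/*
 * Single-byte uppercasing: write only if the result plus terminator fits,
 * and always return the size the result needs (excluding the terminator).
 */
static size_t
demo_strupper_sb(char *dest, size_t destsize, const char *src, size_t srclen,
				 const struct demo_locale *locale)
{
	(void) locale;				/* unused in this sketch */

	if (srclen + 1 <= destsize)
	{
		for (size_t i = 0; i < srclen; i++)
			dest[i] = (char) toupper((unsigned char) src[i]);
		dest[srclen] = '\0';
	}

	return srclen;
}

static const struct demo_ctype_methods demo_methods_sb = {
	.strupper = demo_strupper_sb,
};

static const struct demo_locale demo_locale_sb = {&demo_methods_sb};

/* Caller: try a small buffer first, then retry with the reported size. */
static char *
demo_upper(const char *src, const struct demo_locale *locale)
{
	size_t		srclen = strlen(src);
	size_t		bufsize = 8;
	char	   *buf = malloc(bufsize);
	size_t		needed;

	assert(buf != NULL);
	needed = locale->ctype->strupper(buf, bufsize, src, srclen, locale);
	if (needed + 1 > bufsize)
	{
		/* first attempt wrote nothing; grow to the reported size and retry */
		bufsize = needed + 1;
		buf = realloc(buf, bufsize);
		assert(buf != NULL);
		(void) locale->ctype->strupper(buf, bufsize, src, srclen, locale);
	}

	return buf;
}
```

The caller never needs to know which implementation sits behind the strupper pointer; it only relies on the returned size to decide whether a retry is needed.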

v8-0004-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 7275cd18cfba39377cdc8e22e361a135c4c2eba7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v8 4/7] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 5f90355355..b4abde1afd 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -188,7 +188,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 5251163b1b..7a2ca8ac84 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -242,7 +242,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -499,8 +498,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -528,8 +525,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -574,8 +569,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -726,8 +719,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -776,8 +767,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 861c4f68e4..7708e9de96 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -466,7 +466,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -586,8 +585,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -642,8 +639,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -687,7 +682,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index c71cf38020..98a860d0b3 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -160,7 +160,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.34.1
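
To illustrate why a provider tag becomes unnecessary once behavior lives in method tables, here is a small standalone sketch. It is only an illustration under stated assumptions: the DEMO_* masks reuse the bit values of the PG_IS* macros from the series, but every name below is hypothetical, and the two method implementations are invented for the example rather than taken from any real provider.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Same bit values as PG_ISDIGIT etc.; the DEMO_ names are hypothetical. */
#define DEMO_ISDIGIT	0x01
#define DEMO_ISALPHA	0x02
#define DEMO_ISALNUM	(DEMO_ISDIGIT | DEMO_ISALPHA)
#define DEMO_ISSPACE	0x80

struct demo_locale;

struct demo_ctype_methods
{
	int			(*char_properties) (unsigned int wc, int mask,
									const struct demo_locale *locale);
};

struct demo_locale
{
	/* no provider tag: behavior comes only from the method table */
	const struct demo_ctype_methods *ctype;
};

/* Method set 1: classify ASCII only, report nothing for other values. */
static int
props_ascii(unsigned int wc, int mask, const struct demo_locale *locale)
{
	int			result = 0;

	(void) locale;
	if (wc > 0x7F)
		return 0;

	if ((mask & DEMO_ISDIGIT) && isdigit((int) wc))
		result |= DEMO_ISDIGIT;
	if ((mask & DEMO_ISALPHA) && isalpha((int) wc))
		result |= DEMO_ISALPHA;
	if ((mask & DEMO_ISSPACE) && isspace((int) wc))
		result |= DEMO_ISSPACE;

	return result;
}

/* Method set 2: like ASCII, but treat every value above 0x7F as a letter. */
static int
props_highbit_alpha(unsigned int wc, int mask, const struct demo_locale *locale)
{
	if (wc > 0x7F)
		return mask & DEMO_ISALPHA;

	return props_ascii(wc, mask, locale);
}

static const struct demo_ctype_methods methods_ascii = {props_ascii};
static const struct demo_ctype_methods methods_high = {props_highbit_alpha};

static const struct demo_locale loc_ascii = {&methods_ascii};
static const struct demo_locale loc_high = {&methods_high};

/* Generic entry point: no switch on a provider, just one indirect call. */
static int
demo_char_properties(unsigned int wc, int mask, const struct demo_locale *locale)
{
	return locale->ctype->char_properties(wc, mask, locale);
}
```

The mask lets a caller ask about several properties in one call and get back only the bits that hold, which is how the char_properties methods in the series are written.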

v8-0005-Make-provider-data-in-pg_locale_t-an-opaque-point.patch (text/x-patch; charset=UTF-8)

From e1c6414fcabd075b33a4ee1edbd3231b657c9e65 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v8 5/7] Make provider data in pg_locale_t an opaque pointer.

---
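
Note on the pattern: the generic struct carries only a void pointer, and each provider defines a private struct that it allocates at creation time and casts back inside its own methods. A minimal standalone sketch follows, assuming hypothetical demo_* names and using calloc()/malloc() where the patch uses MemoryContextAllocZero() and MemoryContextStrdup(); it is not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Generic part: knows nothing about what provider_data points to. */
struct demo_locale
{
	int			deterministic;
	void	   *provider_data;	/* opaque to generic code */
};

/* Private to one provider's source file in a real layout. */
struct demo_provider
{
	char	   *locale_name;
};

/* Provider-side constructor: allocate private state, then attach it. */
static struct demo_locale *
demo_create_locale(const char *name)
{
	struct demo_locale *result = calloc(1, sizeof(*result));
	struct demo_provider *prov = calloc(1, sizeof(*prov));
	size_t		len = strlen(name) + 1;

	assert(result != NULL && prov != NULL);

	prov->locale_name = malloc(len);
	assert(prov->locale_name != NULL);
	memcpy(prov->locale_name, name, len);

	result->deterministic = 1;
	result->provider_data = prov;

	return result;
}

/* Provider-side method: cast the opaque pointer back to the private type. */
static const char *
demo_locale_name(const struct demo_locale *locale)
{
	const struct demo_provider *prov =
		(const struct demo_provider *) locale->provider_data;

	return prov->locale_name;
}

/* Release both the private state and the generic struct. */
static void
demo_free_locale(struct demo_locale *locale)
{
	struct demo_provider *prov = (struct demo_provider *) locale->provider_data;

	free(prov->locale_name);
	free(prov);
	free(locale);
}
```

The cast is only safe because the code that performs it is the same code that installed the pointer; generic callers should treat provider_data as untouchable.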
 src/backend/utils/adt/pg_locale_builtin.c |  11 +-
 src/backend/utils/adt/pg_locale_icu.c     |  40 +++++--
 src/backend/utils/adt/pg_locale_libc.c    | 131 ++++++++++++++--------
 src/include/utils/pg_locale.h             |  16 +--
 4 files changed, 127 insertions(+), 71 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index b4abde1afd..a1eb0e459c 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -25,6 +25,11 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 
+struct builtin_provider
+{
+	const char *locale;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -154,6 +159,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -187,7 +193,10 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 7a2ca8ac84..57f366f741 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -39,6 +39,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static int	strncoll_icu(const char *arg1, ssize_t len1,
@@ -189,6 +195,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -240,8 +247,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -497,11 +508,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -525,6 +537,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -538,7 +552,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -569,12 +583,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -665,11 +681,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -677,7 +695,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -719,6 +737,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -741,7 +761,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -767,6 +787,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -786,7 +808,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 7708e9de96..8738a0221b 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -33,6 +34,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -118,17 +124,21 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static const struct ctype_methods ctype_methods_libc_sb = {
@@ -179,7 +189,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -196,7 +206,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = tolower_l((unsigned char) *p, loc);
+			*p = tolower_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -206,7 +216,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -228,7 +239,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -259,7 +270,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -276,10 +287,10 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (wasalnum)
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -290,7 +301,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -315,10 +327,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -350,7 +362,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -364,7 +376,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = toupper_l((unsigned char) *p, loc);
+			*p = toupper_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -374,7 +386,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -396,7 +409,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -424,6 +437,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -462,16 +476,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -585,6 +602,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -615,7 +634,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -639,8 +658,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -649,7 +670,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -682,6 +703,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -726,7 +749,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -773,27 +796,29 @@ char_properties_libc_sb(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc > (pg_wchar) UCHAR_MAX)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -804,6 +829,8 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
@@ -811,21 +838,21 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -834,10 +861,12 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -845,10 +874,12 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -856,10 +887,12 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -867,10 +900,12 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -962,8 +997,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1022,8 +1059,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 98a860d0b3..c4c19a3c82 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -167,21 +167,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 typedef struct pg_locale_struct *pg_locale_t;
-- 
2.34.1
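
For readers following the change from the `info` union to `void *provider_data`, the pattern can be sketched roughly as below. This is only an illustration under stated assumptions: `demo_locale`, `demo_libc_provider`, its `max_single_byte` field, and all `demo_*` functions are hypothetical stand-ins, not the structs or functions from the patch. The idea it tries to show is that generic code carries an opaque pointer, and only the provider that created the locale casts it back to its private type.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins for illustration only; the real structs are
 * pg_locale_struct (pg_locale.h) and libc_provider (pg_locale_libc.c).
 */
struct demo_locale
{
	int			ctype_is_c;
	void	   *provider_data;	/* opaque to generic code */
};

/* Private to one provider; stands in for a struct holding locale_t lt. */
struct demo_libc_provider
{
	unsigned int max_single_byte;	/* assumed field, not in the patch */
};

/* Provider-side constructor: allocate private state, attach it to the locale. */
static struct demo_locale *
demo_create_locale(void)
{
	struct demo_locale *locale = calloc(1, sizeof(*locale));
	struct demo_libc_provider *libc = calloc(1, sizeof(*libc));

	if (locale == NULL || libc == NULL)
	{
		free(locale);
		free(libc);
		return NULL;
	}

	libc->max_single_byte = 0xFF;
	locale->provider_data = libc;
	return locale;
}

/* Provider-side method: cast provider_data back to the private type. */
static unsigned int
demo_toupper_sb(unsigned int wc, const struct demo_locale *locale)
{
	const struct demo_libc_provider *libc =
		(const struct demo_libc_provider *) locale->provider_data;

	assert(!locale->ctype_is_c);

	/* Only map values that fit in a single byte; pass others through. */
	if (wc <= libc->max_single_byte)
		return (unsigned int) toupper((unsigned char) wc);
	return wc;
}

/* Release both the private state and the locale itself. */
static void
demo_free_locale(struct demo_locale *locale)
{
	if (locale != NULL)
	{
		free(locale->provider_data);
		free(locale);
	}
}
```

One consequence of this layout, as far as the patch shows, is that pg_locale.h no longer needs to know about `locale_t` or `UCollator` at all, which is what allows the ICU headers to be dropped from it.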

v8-0006-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From 85f5b5f2bb9148e514181527c03e9d96bad68a5d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v8 6/7] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  |  4 ++++
 src/backend/utils/adt/formatting.c    |  4 ----
 src/backend/utils/adt/pg_locale.c     |  4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 13 +++++++++++++
 src/backend/utils/adt/varlena.c       |  4 ++++
 src/include/utils/pg_locale.h         | 17 -----------------
 6 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53b6a479aa..afc2330f51 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 6a0571f93e..387009a4a9 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/unicode_case.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2ebe4c00bf..d40ecf2357 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 57f366f741..b8a455d730 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,7 +13,20 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
+
+/*
+ * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
+ * (see
+ * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
+ */
+#if U_ICU_VERSION_MAJOR_NUM >= 53
+#define HAVE_UCOL_STRCOLLUTF8 1
+#else
+#undef HAVE_UCOL_STRCOLLUTF8
+#endif
+
 #endif
 
 #include "access/htup_details.h"
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 533bebc1c7..37b3506f06 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index c4c19a3c82..24f7ee4b61 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /*
  * Character properties for regular expressions.
  */
@@ -31,19 +27,6 @@
 #define PG_ISPUNCT     0x40
 #define PG_ISSPACE     0x80
 
-#ifdef USE_ICU
-/*
- * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
- * (see
- * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
- */
-#if U_ICU_VERSION_MAJOR_NUM >= 53
-#define HAVE_UCOL_STRCOLLUTF8 1
-#else
-#undef HAVE_UCOL_STRCOLLUTF8
-#endif
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.34.1

v8-0007-Introduce-hooks-for-creating-custom-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 0db38e3a2f29a3638ead67ebcd98a0a414c2ad1e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 16:10:28 -0700
Subject: [PATCH v8 7/7] Introduce hooks for creating custom pg_locale_t.

Now that collation, case mapping, and ctype behavior is controlled
with a method table, we can hook the behavior.

The hooks can provide their own arbitrary method table, which may be
based on a different version of ICU than what Postgres was built with,
or entirely unrelated to ICU/libc.
---
 src/backend/utils/adt/pg_locale.c | 68 +++++++++++++++++++++----------
 src/include/utils/pg_locale.h     | 24 +++++++++++
 src/tools/pgindent/typedefs.list  |  2 +
 3 files changed, 72 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index d40ecf2357..c3fa68ead5 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -103,6 +103,9 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
+create_pg_locale_hook_type create_pg_locale_hook = NULL;
+collation_version_hook_type collation_version_hook = NULL;
+
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1209,7 +1212,7 @@ create_pg_locale(Oid collid, MemoryContext context)
 {
 	HeapTuple	tp;
 	Form_pg_collation collform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 	Datum		datum;
 	bool		isnull;
 
@@ -1218,15 +1221,21 @@ create_pg_locale(Oid collid, MemoryContext context)
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(collid, context);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(collid, context);
+
+	if (result == NULL)
+	{
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(collid, context);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	}
 
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
@@ -1289,7 +1298,7 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 
 	Assert(default_locale == NULL);
 
@@ -1299,18 +1308,25 @@ init_database_collation(void)
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
-										  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
-									  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(DEFAULT_COLLATION_OID,
 									   TopMemoryContext);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+
+	if (result == NULL)
+	{
+		if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+											  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+										   TopMemoryContext);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	}
 
 	ReleaseSysCache(tup);
 
@@ -1379,6 +1395,14 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collation_version_hook != NULL)
+	{
+		char	   *version;
+
+		if (collation_version_hook(collprovider, collcollate, &version))
+			return version;
+	}
+
 	/*
 	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
 	 * and are not expected to change, but track the version anyway.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 24f7ee4b61..7bd00a16a6 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -155,6 +155,30 @@ struct pg_locale_struct
 
 typedef struct pg_locale_struct *pg_locale_t;
 
+/*
+ * Hooks to enable custom locale providers.
+ */
+
+/*
+ * Hook create_pg_locale(). Return result (allocated in the given context) to
+ * override; or return NULL to return control to create_pg_locale(). When
+ * creating the default database collation, collid is DEFAULT_COLLATION_OID.
+ */
+typedef pg_locale_t (*create_pg_locale_hook_type) (Oid collid,
+												   MemoryContext context);
+
+/*
+ * Hook get_collation_actual_version(). Set *version out parameter and return
+ * true to override; or return false to return control to
+ * get_collation_actual_version().
+ */
+typedef bool (*collation_version_hook_type) (char collprovider,
+											 const char *collcollate,
+											 char **version);
+
+extern PGDLLIMPORT create_pg_locale_hook_type create_pg_locale_hook;
+extern PGDLLIMPORT collation_version_hook_type collation_version_hook;
+
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 416a2cc76b..1d4247267b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3375,6 +3375,7 @@ cmpEntriesArg
 codes_t
 collation_cache_entry
 collation_cache_hash
+collation_version_hook_type
 color
 colormaprange
 compare_context
@@ -3391,6 +3392,7 @@ core_yyscan_t
 corrupt_items
 cost_qual_eval_context
 cp_hash_func
+create_pg_locale_hook_type
 create_upper_paths_hook_type
 createdb_failure_params
 crosstab_HashEnt
-- 
2.34.1
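
The control flow the two hooks introduce can be sketched in isolation as follows. This is a minimal model, not PostgreSQL code: every `demo_*` name, `DEMO_EXT_COLLID`, and the version strings are hypothetical, and the real hooks also take a `MemoryContext` and return a `pg_locale_t`. What the sketch tries to capture is the contract described in the header comments: the create hook returns NULL to hand control back to the core code, and the version hook returns false to do the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Oid and pg_locale_t. */
typedef unsigned int DemoOid;
struct demo_locale
{
	const char *origin;
};
typedef struct demo_locale *demo_locale_t;

/* Same shape as the hook typedefs in the patch, minus MemoryContext. */
typedef demo_locale_t (*demo_create_hook_type) (DemoOid collid);
typedef int (*demo_version_hook_type) (char collprovider,
									   const char *collcollate,
									   const char **version);

static demo_create_hook_type demo_create_hook = NULL;
static demo_version_hook_type demo_version_hook = NULL;

static struct demo_locale demo_core_locale = {"core"};
static struct demo_locale demo_ext_locale = {"extension"};

#define DEMO_EXT_COLLID 12345	/* assumed collation the extension claims */

/* Core side: try the hook first, fall back when it returns NULL. */
static demo_locale_t
demo_create_locale(DemoOid collid)
{
	demo_locale_t result = NULL;

	if (demo_create_hook != NULL)
		result = demo_create_hook(collid);

	if (result == NULL)
		result = &demo_core_locale;

	return result;
}

/* Core side: the hook overrides only when it returns true. */
static const char *
demo_actual_version(char collprovider, const char *collcollate)
{
	if (demo_version_hook != NULL)
	{
		const char *version;

		if (demo_version_hook(collprovider, collcollate, &version))
			return version;
	}

	return "core-1";
}

/* Extension side: claim one collation, decline everything else. */
static demo_locale_t
demo_ext_create(DemoOid collid)
{
	if (collid == DEMO_EXT_COLLID)
		return &demo_ext_locale;
	return NULL;
}

/* Extension side: report a version only for provider 'i'. */
static int
demo_ext_version(char collprovider, const char *collcollate,
				 const char **version)
{
	(void) collcollate;			/* unused in this sketch */

	if (collprovider != 'i')
		return 0;

	*version = "ext-2";
	return 1;
}

/* What an extension's load-time initialization might do. */
static void
demo_install_hooks(void)
{
	demo_create_hook = demo_ext_create;
	demo_version_hook = demo_ext_version;
}
```

A real extension would presumably install its functions from `_PG_init()` and, following the usual PostgreSQL hook convention, save and call any previously installed hook before declining; the sketch leaves that chaining out for brevity.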

#10

Andreas Karlsson

andreas@proxel.se

about 1 year ago

In reply to: Jeff Davis (#9)

11 attachment(s)

Re: Collation & ctype method table, and extension hooks

Hi,

I have not looked at the later patches in the series yet, as I got
sidetracked while reviewing and decided to clean up some related
collation things, which I added to the patch set (feel free to ignore
them if you want). The goal of my added patches is to move
provider-specific code into fewer places and not have provider-specific
logic all over the codebase.

I feel your first patch in the series is something you can just commit.
It looks good and is a simple, obvious refactoring. In theory we could
share the code which does the lookup in the catalog table, but I do not
think it would be worth it. I fixed a small issue with it and the
function prototypes in pg_collation.c.

I will look at the rest of your patches later.

My patches:

= v9-0002-Move-check-for-ucol_strcollUTF8-to-pg_locale_icu..patch

Broken out from v9-0010-Don-t-include-ICU-headers-in-pg_locale.h.patch.

= v9-0003-Move-code-for-collation-version-into-provider-spe.patch

Moves some code from pg_collate.c into provider-specific files.

= v9-0004-Move-ICU-database-encoding-check-into-validation-.patch

Makes the ICU code more similar to the built-in provider and reduces
some code duplication. I feel we could go one step further and also
normalize the built-in locale only when "if (!IsBinaryUpgrade &&
dblocale != src_locale)" holds, but I leave that for another patch if
that is something we actually want to unify.

= v9-0005-Move-provider-specific-code-when-looking-up-local.patch

I did not like how namespace.c had knowledge of ICU.

Andreas

Attachments:

v9-0001-Perform-provider-specific-initialization-code-in-.patch (text/x-patch; charset=UTF-8)

From 74fd96d5acdd3a80b1bd6277bb7d12986f8e3659 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 15:49:32 -0700
Subject: [PATCH v9 01/11] Perform provider-specific initialization code in new
 functions.

---
 src/backend/utils/adt/Makefile            |   1 +
 src/backend/utils/adt/meson.build         |   1 +
 src/backend/utils/adt/pg_locale.c         | 157 +++-------------------
 src/backend/utils/adt/pg_locale_builtin.c |  70 ++++++++++
 src/backend/utils/adt/pg_locale_icu.c     |  97 ++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    |  74 +++++++++-
 6 files changed, 256 insertions(+), 144 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_locale_builtin.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 85e5eaf32eb..35e8c01aab9 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -79,6 +79,7 @@ OBJS = \
 	orderedsetaggs.o \
 	partitionfuncs.o \
 	pg_locale.o \
+	pg_locale_builtin.o \
 	pg_locale_icu.o \
 	pg_locale_libc.o \
 	pg_lsn.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index f73f294b8f5..e86d6dc8e0a 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -66,6 +66,7 @@ backend_sources += files(
   'orderedsetaggs.c',
   'partitionfuncs.c',
   'pg_locale.c',
+  'pg_locale_builtin.c',
   'pg_locale_icu.c',
   'pg_locale_libc.c',
   'pg_lsn.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9412cad3ac5..5388057503c 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -89,11 +89,13 @@
 
 #define		MAX_L10N_DATA		80
 
+/* pg_locale_builtin.c */
+extern pg_locale_t create_pg_locale_builtin(Oid collid, MemoryContext context);
+
 /* pg_locale_icu.c */
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -106,8 +108,7 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 #endif
 
 /* pg_locale_libc.c */
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -138,7 +139,7 @@ char	   *localized_full_months[12 + 1];
 /* is the databases's LC_CTYPE the C locale? */
 bool		database_ctype_is_c = false;
 
-static struct pg_locale_struct default_locale;
+static pg_locale_t default_locale = NULL;
 
 /* indicates whether locale information cache is valid */
 static bool CurrentLocaleConvValid = false;
@@ -1194,7 +1195,6 @@ IsoLocaleName(const char *winlocname)
 
 #endif							/* WIN32 && LC_MESSAGES */
 
-
 /*
  * Create a new pg_locale_t struct for the given collation oid.
  */
@@ -1207,75 +1207,17 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Datum		datum;
 	bool		isnull;
 
-	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-
 	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
 	if (!HeapTupleIsValid(tp))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	result->provider = collform->collprovider;
-	result->deterministic = collform->collisdeterministic;
-
 	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-	{
-		const char *locstr;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		locstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = true;
-		result->ctype_is_c = (strcmp(locstr, "C") == 0);
-
-		builtin_validate_locale(GetDatabaseEncoding(), locstr);
-
-		result->info.builtin.locale = MemoryContextStrdup(context,
-														  locstr);
-	}
+		result = create_pg_locale_builtin(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		const char *iculocstr;
-		const char *icurules;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colllocale);
-		iculocstr = TextDatumGetCString(datum);
-
-		result->collate_is_c = false;
-		result->ctype_is_c = false;
-
-		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-		result->info.icu.ucol = make_icu_collator(iculocstr, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(collid, context);
 	else if (collform->collprovider == COLLPROVIDER_LIBC)
-	{
-		const char *collcollate;
-		const char *collctype;
-
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collcollate);
-		collcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_collctype);
-		collctype = TextDatumGetCString(datum);
-
-		result->collate_is_c = (strcmp(collcollate, "C") == 0) ||
-			(strcmp(collcollate, "POSIX") == 0);
-		result->ctype_is_c = (strcmp(collctype, "C") == 0) ||
-			(strcmp(collctype, "POSIX") == 0);
-
-		result->info.lt = make_libc_collator(collcollate, collctype);
-	}
+		result = create_pg_locale_libc(collid, context);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
@@ -1335,7 +1277,9 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	Datum		datum;
+	pg_locale_t result;
+
+	Assert(default_locale == NULL);
 
 	/* Fetch our pg_database row normally, via syscache */
 	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
@@ -1344,80 +1288,21 @@ init_database_collation(void)
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
 	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-	{
-		char	   *datlocale;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		builtin_validate_locale(dbform->encoding, datlocale);
-
-		default_locale.collate_is_c = true;
-		default_locale.ctype_is_c = (strcmp(datlocale, "C") == 0);
-
-		default_locale.info.builtin.locale = MemoryContextStrdup(TopMemoryContext,
-																 datlocale);
-	}
+		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-	{
-#ifdef USE_ICU
-		char	   *datlocale;
-		char	   *icurules;
-		bool		isnull;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datlocale);
-		datlocale = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = false;
-		default_locale.ctype_is_c = false;
-
-		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
-		if (!isnull)
-			icurules = TextDatumGetCString(datum);
-		else
-			icurules = NULL;
-
-		default_locale.info.icu.locale = MemoryContextStrdup(TopMemoryContext, datlocale);
-		default_locale.info.icu.ucol = make_icu_collator(datlocale, icurules);
-#else
-		/* could get here if a collation was created by a build with ICU */
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("ICU is not supported in this build")));
-#endif
-	}
+		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+									  TopMemoryContext);
 	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-	{
-		const char *datcollate;
-		const char *datctype;
-
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datcollate);
-		datcollate = TextDatumGetCString(datum);
-		datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
-		datctype = TextDatumGetCString(datum);
-
-		default_locale.collate_is_c = (strcmp(datcollate, "C") == 0) ||
-			(strcmp(datcollate, "POSIX") == 0);
-		default_locale.ctype_is_c = (strcmp(datctype, "C") == 0) ||
-			(strcmp(datctype, "POSIX") == 0);
-
-		default_locale.info.lt = make_libc_collator(datcollate, datctype);
-	}
+		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+									   TopMemoryContext);
 	else
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
 
-
-	default_locale.provider = dbform->datlocprovider;
-
-	/*
-	 * Default locale is currently always deterministic.  Nondeterministic
-	 * locales currently don't support pattern matching, which would break a
-	 * lot of things if applied globally.
-	 */
-	default_locale.deterministic = true;
-
 	ReleaseSysCache(tup);
+
+	default_locale = result;
 }
 
 /*
@@ -1435,7 +1320,7 @@ pg_newlocale_from_collation(Oid collid)
 	bool		found;
 
 	if (collid == DEFAULT_COLLATION_OID)
-		return &default_locale;
+		return default_locale;
 
 	if (!OidIsValid(collid))
 		elog(ERROR, "cache lookup failed for collation %u", collid);
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
new file mode 100644
index 00000000000..4246971a4d8
--- /dev/null
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -0,0 +1,70 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL locale utilities for builtin provider
+ *
+ * Portions Copyright (c) 2002-2024, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_locale_builtin.c
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "catalog/pg_database.h"
+#include "catalog/pg_collation.h"
+#include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/memutils.h"
+#include "utils/pg_locale.h"
+#include "utils/syscache.h"
+
+extern pg_locale_t create_pg_locale_builtin(Oid collid,
+											MemoryContext context);
+
+pg_locale_t
+create_pg_locale_builtin(Oid collid, MemoryContext context)
+{
+	const char *locstr;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		locstr = TextDatumGetCString(datum);
+		ReleaseSysCache(tp);
+	}
+
+	builtin_validate_locale(GetDatabaseEncoding(), locstr);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	result->provider = COLLPROVIDER_BUILTIN;
+	result->deterministic = true;
+	result->collate_is_c = true;
+	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+
+	return result;
+}
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2a87e25dfb1..73eb430d750 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -12,14 +12,20 @@
 #include "postgres.h"
 
 #ifdef USE_ICU
-
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+#endif
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
@@ -29,9 +35,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+
+#ifdef USE_ICU
+
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern UCollator *make_icu_collator(const char *iculocstr,
-									const char *icurules);
 extern int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -49,6 +57,8 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
  */
 static UConverter *icu_converter = NULL;
 
+static UCollator *make_icu_collator(const char *iculocstr,
+									const char *icurules);
 static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 								 const char *arg2, ssize_t len2,
 								 pg_locale_t locale);
@@ -63,6 +73,85 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+#endif
+
+pg_locale_t
+create_pg_locale_icu(Oid collid, MemoryContext context)
+{
+#ifdef USE_ICU
+	bool		deterministic;
+	const char *iculocstr;
+	const char *icurules = NULL;
+	UCollator  *collator;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+
+		/* default database collation is always deterministic */
+		deterministic = true;
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datlocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(DATABASEOID, tp,
+								Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		Form_pg_collation collform;
+		HeapTuple	tp;
+		Datum		datum;
+		bool		isnull;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+		collform = (Form_pg_collation) GETSTRUCT(tp);
+		deterministic = collform->collisdeterministic;
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_colllocale);
+		iculocstr = TextDatumGetCString(datum);
+		datum = SysCacheGetAttr(COLLOID, tp,
+								Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+	collator = make_icu_collator(iculocstr, icurules);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
+	result->info.icu.ucol = collator;
+	result->provider = COLLPROVIDER_ICU;
+	result->deterministic = deterministic;
+	result->collate_is_c = false;
+	result->ctype_is_c = false;
+
+	return result;
+#else
+	/* could get here if a collation was created by a build with ICU */
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("ICU is not supported in this build")));
+
+	return NULL;
+#endif
+}
+
+#ifdef USE_ICU
 
 /*
  * Wrapper around ucol_open() to handle API differences for older ICU
@@ -160,7 +249,7 @@ pg_ucol_open(const char *loc_str)
  *
  * Ensure that no path leaks a UCollator.
  */
-UCollator *
+static UCollator *
 make_icu_collator(const char *iculocstr, const char *icurules)
 {
 	if (!icurules)
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 83f310fc71c..374ac37ba0a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,10 +11,16 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
+#include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
 #include "utils/formatting.h"
+#include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/syscache.h"
 
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
@@ -24,15 +30,16 @@
  */
 #define		TEXTBUFLEN			1024
 
-extern locale_t make_libc_collator(const char *collate,
-								   const char *ctype);
+extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
+
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
-
+static locale_t make_libc_collator(const char *collate,
+								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
 
 #ifdef WIN32
@@ -41,6 +48,65 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+pg_locale_t
+create_pg_locale_libc(Oid collid, MemoryContext context)
+{
+	const char *collate;
+	const char *ctype;
+	locale_t	loc;
+	pg_locale_t result;
+
+	if (collid == DEFAULT_COLLATION_OID)
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(MyDatabaseId));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+									   Anum_pg_database_datctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+	else
+	{
+		HeapTuple	tp;
+		Datum		datum;
+
+		tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(collid));
+		if (!HeapTupleIsValid(tp))
+			elog(ERROR, "cache lookup failed for collation %u", collid);
+
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collcollate);
+		collate = TextDatumGetCString(datum);
+		datum = SysCacheGetAttrNotNull(COLLOID, tp,
+									   Anum_pg_collation_collctype);
+		ctype = TextDatumGetCString(datum);
+
+		ReleaseSysCache(tp);
+	}
+
+
+	loc = make_libc_collator(collate, ctype);
+
+	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+	result->provider = COLLPROVIDER_LIBC;
+	result->deterministic = true;
+	result->collate_is_c = (strcmp(collate, "C") == 0) ||
+		(strcmp(collate, "POSIX") == 0);
+	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
+		(strcmp(ctype, "POSIX") == 0);
+	result->info.lt = loc;
+
+	return result;
+}
+
 /*
  * Create a locale_t with the given collation and ctype.
  *
@@ -49,7 +115,7 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
  *
  * Ensure that no path leaks a locale_t.
  */
-locale_t
+static locale_t
 make_libc_collator(const char *collate, const char *ctype)
 {
 	locale_t	loc = 0;
-- 
2.45.2

v9-0002-Move-check-for-ucol_strcollUTF8-to-pg_locale_icu..patch (text/x-patch; charset=UTF-8)

From 220c19384f2b4fa326964706c283e12d2a1db574 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 00:55:41 +0100
Subject: [PATCH v9 02/11] Move check for ucol_strcollUTF8 to pg_locale_icu.c

The result of the check is only used by pg_locale_icu.c.
---
 src/backend/utils/adt/pg_locale_icu.c | 12 ++++++++++++
 src/include/utils/pg_locale.h         | 13 -------------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 73eb430d750..2c6b950ec18 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -14,6 +14,18 @@
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+
+/*
+ * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
+ * (see
+ * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
+ */
+#if U_ICU_VERSION_MAJOR_NUM >= 53
+#define HAVE_UCOL_STRCOLLUTF8 1
+#else
+#undef HAVE_UCOL_STRCOLLUTF8
+#endif
+
 #endif
 
 #include "access/htup_details.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 37ecf951937..8a8008f9d84 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -16,19 +16,6 @@
 #include <unicode/ucol.h>
 #endif
 
-#ifdef USE_ICU
-/*
- * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
- * (see
- * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
- */
-#if U_ICU_VERSION_MAJOR_NUM >= 53
-#define HAVE_UCOL_STRCOLLUTF8 1
-#else
-#undef HAVE_UCOL_STRCOLLUTF8
-#endif
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.45.2

v9-0003-Move-code-for-collation-version-into-provider-spe.patch (text/x-patch; charset=UTF-8)

From 3f2cca2d06b46b169d43a89a4ecc71c4dc2f07f9 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 04:44:09 +0100
Subject: [PATCH v9 03/11] Move code for collation version into provider
 specific files

---
 src/backend/utils/adt/pg_locale.c         | 106 +++-------------------
 src/backend/utils/adt/pg_locale_builtin.c |  24 +++++
 src/backend/utils/adt/pg_locale_icu.c     |  17 ++++
 src/backend/utils/adt/pg_locale_libc.c    |  74 +++++++++++++++
 4 files changed, 126 insertions(+), 95 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5388057503c..ebad2d530fa 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,10 +69,6 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
-#ifdef __GLIBC__
-#include <gnu/libc-version.h>
-#endif
-
 #ifdef WIN32
 #include <shlwapi.h>
 #endif
@@ -91,6 +87,7 @@
 
 /* pg_locale_builtin.c */
 extern pg_locale_t create_pg_locale_builtin(Oid collid, MemoryContext context);
+extern char *get_collation_actual_version_builtin(const char *collcollate);
 
 /* pg_locale_icu.c */
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
@@ -105,6 +102,7 @@ extern size_t strnxfrm_icu(char *dest, size_t destsize,
 extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
+extern char *get_collation_actual_version_icu(const char *collcollate);
 #endif
 
 /* pg_locale_libc.c */
@@ -115,6 +113,7 @@ extern int	strncoll_libc(const char *arg1, ssize_t len1,
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
 char	   *locale_messages;
@@ -1367,100 +1366,17 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
-	/*
-	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
-	 * and are not expected to change, but track the version anyway.
-	 *
-	 * Note that the character semantics may change for some locales, but the
-	 * collation version only tracks changes to sort order.
-	 */
 	if (collprovider == COLLPROVIDER_BUILTIN)
-	{
-		if (strcmp(collcollate, "C") == 0)
-			return "1";
-		else if (strcmp(collcollate, "C.UTF-8") == 0)
-			return "1";
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-					 errmsg("invalid locale name \"%s\" for builtin provider",
-							collcollate)));
-	}
-
+		collversion = get_collation_actual_version_builtin(collcollate);
 #ifdef USE_ICU
-	if (collprovider == COLLPROVIDER_ICU)
-	{
-		UCollator  *collator;
-		UVersionInfo versioninfo;
-		char		buf[U_MAX_VERSION_STRING_LENGTH];
-
-		collator = pg_ucol_open(collcollate);
-
-		ucol_getVersion(collator, versioninfo);
-		ucol_close(collator);
-
-		u_versionToString(versioninfo, buf);
-		collversion = pstrdup(buf);
-	}
-	else
-#endif
-		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
-	{
-#if defined(__GLIBC__)
-		/* Use the glibc version because we don't have anything better. */
-		collversion = pstrdup(gnu_get_libc_version());
-#elif defined(LC_VERSION_MASK)
-		locale_t	loc;
-
-		/* Look up FreeBSD collation version. */
-		loc = newlocale(LC_COLLATE_MASK, collcollate, NULL);
-		if (loc)
-		{
-			collversion =
-				pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
-			freelocale(loc);
-		}
-		else
-			ereport(ERROR,
-					(errmsg("could not load locale \"%s\"", collcollate)));
-#elif defined(WIN32)
-		/*
-		 * If we are targeting Windows Vista and above, we can ask for a name
-		 * given a collation name (earlier versions required a location code
-		 * that we don't have).
-		 */
-		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
-		WCHAR		wide_collcollate[LOCALE_NAME_MAX_LENGTH];
-
-		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
-							LOCALE_NAME_MAX_LENGTH);
-		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
-		{
-			/*
-			 * GetNLSVersionEx() wants a language tag such as "en-US", not a
-			 * locale name like "English_United States.1252".  Until those
-			 * values can be prevented from entering the system, or 100%
-			 * reliably converted to the more useful tag format, tolerate the
-			 * resulting error and report that we have no version data.
-			 */
-			if (GetLastError() == ERROR_INVALID_PARAMETER)
-				return NULL;
-
-			ereport(ERROR,
-					(errmsg("could not get collation version for locale \"%s\": error code %lu",
-							collcollate,
-							GetLastError())));
-		}
-		collversion = psprintf("%lu.%lu,%lu.%lu",
-							   (version.dwNLSVersion >> 8) & 0xFFFF,
-							   version.dwNLSVersion & 0xFF,
-							   (version.dwDefinedVersion >> 8) & 0xFFFF,
-							   version.dwDefinedVersion & 0xFF);
+	else if (collprovider == COLLPROVIDER_ICU)
+		collversion = get_collation_actual_version_icu(collcollate);
 #endif
-	}
+	else if (collprovider == COLLPROVIDER_LIBC)
+		collversion = get_collation_actual_version_libc(collcollate);
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(collprovider);
 
 	return collversion;
 }
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 4246971a4d8..2e2d78758e1 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -22,6 +22,7 @@
 
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
+extern char *get_collation_actual_version_builtin(const char *collcollate);
 
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
@@ -68,3 +69,26 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	return result;
 }
+
+char *
+get_collation_actual_version_builtin(const char *collcollate)
+{
+	/*
+	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
+	 * and are not expected to change, but track the version anyway.
+	 *
+	 * Note that the character semantics may change for some locales, but the
+	 * collation version only tracks changes to sort order.
+	 */
+	if (strcmp(collcollate, "C") == 0)
+		return "1";
+	else if (strcmp(collcollate, "C.UTF-8") == 0)
+		return "1";
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("invalid locale name \"%s\" for builtin provider",
+						collcollate)));
+
+	return NULL;				/* keep compiler quiet */
+}
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2c6b950ec18..158c00a8130 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,7 @@ extern size_t strnxfrm_icu(char *dest, size_t destsize,
 extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
+extern char *get_collation_actual_version_icu(const char *collcollate);
 
 /*
  * Converter object for converting between ICU's UChar strings and C strings
@@ -446,6 +447,22 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	return result;
 }
 
+char *
+get_collation_actual_version_icu(const char *collcollate)
+{
+	UCollator  *collator;
+	UVersionInfo versioninfo;
+	char		buf[U_MAX_VERSION_STRING_LENGTH];
+
+	collator = pg_ucol_open(collcollate);
+
+	ucol_getVersion(collator, versioninfo);
+	ucol_close(collator);
+
+	u_versionToString(versioninfo, buf);
+	return pstrdup(buf);
+}
+
 /*
  * Convert a string in the database encoding into a string of UChars.
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 374ac37ba0a..fdf5f784551 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -22,6 +22,14 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
+#ifdef __GLIBC__
+#include <gnu/libc-version.h>
+#endif
+
+#ifdef WIN32
+#include <shlwapi.h>
+#endif
+
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
  * allocations in typical cases. This should be large enough that most strings
@@ -38,6 +46,7 @@ extern int	strncoll_libc(const char *arg1, ssize_t len1,
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+extern char *get_collation_actual_version_libc(const char *collcollate);
 static locale_t make_libc_collator(const char *collate,
 								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
@@ -283,6 +292,71 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result;
 }
 
+char *
+get_collation_actual_version_libc(const char *collcollate)
+{
+	char	   *collversion = NULL;
+
+	if (pg_strcasecmp("C", collcollate) != 0 &&
+		pg_strncasecmp("C.", collcollate, 2) != 0 &&
+		pg_strcasecmp("POSIX", collcollate) != 0)
+	{
+#if defined(__GLIBC__)
+		/* Use the glibc version because we don't have anything better. */
+		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(LC_VERSION_MASK)
+		locale_t	loc;
+
+		/* Look up FreeBSD collation version. */
+		loc = newlocale(LC_COLLATE_MASK, collcollate, NULL);
+		if (loc)
+		{
+			collversion =
+				pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
+			freelocale(loc);
+		}
+		else
+			ereport(ERROR,
+					(errmsg("could not load locale \"%s\"", collcollate)));
+#elif defined(WIN32)
+		/*
+		 * If we are targeting Windows Vista and above, we can ask for a name
+		 * given a collation name (earlier versions required a location code
+		 * that we don't have).
+		 */
+		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
+		WCHAR		wide_collcollate[LOCALE_NAME_MAX_LENGTH];
+
+		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
+							LOCALE_NAME_MAX_LENGTH);
+		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
+		{
+			/*
+			 * GetNLSVersionEx() wants a language tag such as "en-US", not a
+			 * locale name like "English_United States.1252".  Until those
+			 * values can be prevented from entering the system, or 100%
+			 * reliably converted to the more useful tag format, tolerate the
+			 * resulting error and report that we have no version data.
+			 */
+			if (GetLastError() == ERROR_INVALID_PARAMETER)
+				return NULL;
+
+			ereport(ERROR,
+					(errmsg("could not get collation version for locale \"%s\": error code %lu",
+							collcollate,
+							GetLastError())));
+		}
+		collversion = psprintf("%lu.%lu,%lu.%lu",
+							   (version.dwNLSVersion >> 8) & 0xFFFF,
+							   version.dwNLSVersion & 0xFF,
+							   (version.dwDefinedVersion >> 8) & 0xFFFF,
+							   version.dwDefinedVersion & 0xFF);
+#endif
+	}
+
+	return collversion;
+}
+
 /*
  * strncoll_libc_win32_utf8
  *
-- 
2.45.2
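
For readers following the WIN32 branch of get_collation_actual_version_libc() above: the version string is built from two packed 32-bit words, with bits 8..23 printed as the first number and the low 8 bits as the second. Below is a minimal standalone sketch of just that formatting step. The helper name format_nls_version is hypothetical and not part of the patch, and snprintf stands in for psprintf so the sketch compiles outside the PostgreSQL tree.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): format two packed version words
 * using the same shifts and masks as the WIN32 branch, i.e.
 * "<bits 8..23>.<bits 0..7>,<bits 8..23>.<bits 0..7>".
 * Returns the value snprintf returns.
 */
static int
format_nls_version(char *buf, size_t buflen,
				   unsigned long nls_version,
				   unsigned long defined_version)
{
	return snprintf(buf, buflen, "%lu.%lu,%lu.%lu",
					(nls_version >> 8) & 0xFFFF,
					nls_version & 0xFF,
					(defined_version >> 8) & 0xFFFF,
					defined_version & 0xFF);
}
```

With nls_version = 0x0102 and defined_version = 0x0304 the buffer would hold "1.2,3.4". The real code takes both words from the NLSVERSIONINFOEX filled in by GetNLSVersionEx().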

v9-0004-Move-ICU-database-encoding-check-into-validation-.patch (text/x-patch; charset=UTF-8)

From 8055d1b1bf0968265e354ed3fd72933466db0bd7 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 05:49:03 +0100
Subject: [PATCH v9 04/11] Move ICU database encoding check into validation
 function

This removes some duplicated code and also makes the code for
validating an ICU collation more similar to the code for built-in
collations.
---
 src/backend/commands/collationcmds.c | 16 ++--------------
 src/backend/commands/dbcommands.c    |  8 +-------
 src/backend/utils/adt/pg_locale.c    | 13 ++++++++++++-
 src/include/utils/pg_locale.h        |  2 +-
 4 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53b6a479aa4..8001f5ed082 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -297,7 +297,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				}
 			}
 
-			icu_validate_locale(colllocale);
+			icu_validate_locale(GetDatabaseEncoding(), colllocale);
 		}
 
 		/*
@@ -322,23 +322,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		}
 		else if (collprovider == COLLPROVIDER_ICU)
 		{
-#ifdef USE_ICU
 			/*
 			 * We could create ICU collations with collencoding == database
 			 * encoding, but it seems better to use -1 so that it matches the
-			 * way initdb would create ICU collations.  However, only allow
-			 * one to be created when the current database's encoding is
-			 * supported.  Otherwise the collation is useless, plus we get
-			 * surprising behaviors like not being able to drop the collation.
-			 *
-			 * Skip this test when !USE_ICU, because the error we want to
-			 * throw for that isn't thrown till later.
+			 * way initdb would create ICU collations.
 			 */
-			if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("current database's encoding is not supported with this provider")));
-#endif
 			collencoding = -1;
 		}
 		else
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index aa91a396967..fd5e887c3ae 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1116,12 +1116,6 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	}
 	else if (dblocprovider == COLLPROVIDER_ICU)
 	{
-		if (!(is_encoding_supported_by_icu(encoding)))
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("encoding \"%s\" is not supported with ICU provider",
-							pg_encoding_to_char(encoding))));
-
 		/*
 		 * This would happen if template0 uses the libc provider but the new
 		 * database uses icu.
@@ -1151,7 +1145,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			}
 		}
 
-		icu_validate_locale(dblocale);
+		icu_validate_locale(encoding, dblocale);
 	}
 
 	/* for libc, locale comes from datcollate and datctype */
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index ebad2d530fa..8369b9f7893 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1716,7 +1716,7 @@ icu_language_tag(const char *loc_str, int elevel)
  * Perform best-effort check that the locale is a valid one.
  */
 void
-icu_validate_locale(const char *loc_str)
+icu_validate_locale(int encoding, const char *loc_str)
 {
 #ifdef USE_ICU
 	UCollator  *collator;
@@ -1725,6 +1725,17 @@ icu_validate_locale(const char *loc_str)
 	bool		found = false;
 	int			elevel = icu_validation_level;
 
+	/*
+	 * Only allow locales to be created when the encoding is supported.
+	 * Otherwise the collation is useless, plus we get surprising behaviors
+	 * like not being able to drop the collation.
+	 */
+	if (!(is_encoding_supported_by_icu(encoding)))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("encoding \"%s\" is not supported with ICU provider",
+						pg_encoding_to_char(encoding))));
+
 	/* no validation */
 	if (elevel < 0)
 		return;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 8a8008f9d84..22935c9830e 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -108,7 +108,7 @@ extern size_t pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 
 extern int	builtin_locale_encoding(const char *locale);
 extern const char *builtin_validate_locale(int encoding, const char *locale);
-extern void icu_validate_locale(const char *loc_str);
+extern void icu_validate_locale(int encoding, const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
 #ifdef USE_ICU
-- 
2.45.2

v9-0005-Move-provider-specific-code-when-looking-up-local.patch (text/x-patch; charset=UTF-8)

From e2add1bca0d4c8900fad0e75a248586f440ad0ad Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 05:49:20 +0100
Subject: [PATCH v9 05/11] Move provider specific code when looking up locales
 into pg_locale.c

---
 src/backend/catalog/namespace.c   | 14 ++++----------
 src/backend/utils/adt/pg_locale.c |  9 +++++++++
 src/include/utils/pg_locale.h     |  1 +
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 30807f91904..6ad40a96334 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -57,6 +57,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_locale.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -2346,17 +2347,10 @@ lookup_collation(const char *collname, Oid collnamespace, int32 encoding)
 	if (!HeapTupleIsValid(colltup))
 		return InvalidOid;
 	collform = (Form_pg_collation) GETSTRUCT(colltup);
-	if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-		if (is_encoding_supported_by_icu(encoding))
-			collid = collform->oid;
-		else
-			collid = InvalidOid;
-	}
-	else
-	{
+	if (is_encoding_supported_by_collprovider(collform->collprovider, encoding))
 		collid = collform->oid;
-	}
+	else
+		collid = InvalidOid;
 	ReleaseSysCache(colltup);
 	return collid;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 8369b9f7893..eb23c521899 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1381,6 +1381,15 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+bool
+is_encoding_supported_by_collprovider(char collprovider, int encoding)
+{
+	if (collprovider == COLLPROVIDER_ICU)
+		return is_encoding_supported_by_icu(encoding);
+	else
+		return true;
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 22935c9830e..8adab0b5f30 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -92,6 +92,7 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern bool is_encoding_supported_by_collprovider(char collprovider, int encoding);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.45.2
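
The patch that follows (0006) replaces branches on locale->provider with calls through locale->collate. As a rough illustration of that dispatch pattern only, here is a standalone sketch. All toy_* and bytewise_* names are hypothetical; the struct is a simplified stand-in for the patch's method table and reuses only the member names visible in the diff (strncoll, strnxfrm, strnxfrm_prefix, strxfrm_is_safe). Unlike the patch, it takes explicit size_t lengths rather than ssize_t with -1 meaning NUL-terminated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_locale toy_locale;

/* Hypothetical, simplified stand-in for a collation method table. */
typedef struct toy_collate_methods
{
	/* required */
	int			(*strncoll) (const char *a, size_t alen,
							 const char *b, size_t blen,
							 const toy_locale *loc);
	/* required */
	size_t		(*strnxfrm) (char *dest, size_t destsize,
							 const char *src, size_t srclen,
							 const toy_locale *loc);
	/* optional: NULL means prefix transforms are unsupported */
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src, size_t srclen,
									const toy_locale *loc);
	bool		strxfrm_is_safe;
} toy_collate_methods;

struct toy_locale
{
	const toy_collate_methods *collate;
};

/* A toy provider: plain byte order, shorter string sorts first on a tie. */
static int
bytewise_strncoll(const char *a, size_t alen, const char *b, size_t blen,
				  const toy_locale *loc)
{
	size_t		n = alen < blen ? alen : blen;
	int			cmp = memcmp(a, b, n);

	(void) loc;
	if (cmp != 0)
		return cmp < 0 ? -1 : 1;
	if (alen == blen)
		return 0;
	return alen < blen ? -1 : 1;
}

/* Identity transform; returns the untruncated length, like strxfrm(). */
static size_t
bytewise_strnxfrm(char *dest, size_t destsize, const char *src,
				  size_t srclen, const toy_locale *loc)
{
	(void) loc;
	if (destsize > 0)
	{
		size_t		n = srclen < destsize - 1 ? srclen : destsize - 1;

		memcpy(dest, src, n);
		dest[n] = '\0';
	}
	return srclen;
}

static const toy_collate_methods bytewise_methods = {
	.strncoll = bytewise_strncoll,
	.strnxfrm = bytewise_strnxfrm,
	.strnxfrm_prefix = NULL,
	.strxfrm_is_safe = true,
};

/* Callers dispatch through the table instead of branching on a provider. */
static int
toy_strncoll(const char *a, size_t alen, const char *b, size_t blen,
			 const toy_locale *loc)
{
	return loc->collate->strncoll(a, alen, b, blen, loc);
}

static bool
toy_strxfrm_enabled(const toy_locale *loc)
{
	return loc->collate->strxfrm_is_safe;
}

static bool
toy_strxfrm_prefix_enabled(const toy_locale *loc)
{
	return loc->collate->strnxfrm_prefix != NULL;
}
```

A caller would set up toy_locale loc = { &bytewise_methods } and call toy_strncoll("abc", 3, "abd", 3, &loc). Swapping in a different table changes behavior without touching any caller, which is the property that makes the approach easy to hook.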

v9-0006-Control-collation-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 51d31c065f915e4cc9fc498f1384a504814b4abf Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 11:27:29 -0700
Subject: [PATCH v9 06/11] Control collation behavior with a method table.

Previously, behavior branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/pg_locale.c      | 124 +++------------------
 src/backend/utils/adt/pg_locale_icu.c  | 147 +++++++++++++++----------
 src/backend/utils/adt/pg_locale_libc.c |  40 +++++--
 src/include/utils/pg_locale.h          |  33 ++++++
 4 files changed, 167 insertions(+), 177 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index eb23c521899..5643ef45ed3 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -93,27 +93,11 @@ extern char *get_collation_actual_version_builtin(const char *collcollate);
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
-						   const char *src, ssize_t srclen,
-						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
-								  const char *src, ssize_t srclen,
-								  pg_locale_t locale);
 extern char *get_collation_actual_version_icu(const char *collcollate);
 #endif
 
 /* pg_locale_libc.c */
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
-						  const char *arg2, ssize_t len2,
-						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale);
-extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
 char	   *locale_messages;
@@ -1221,6 +1205,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 		/* shouldn't happen */
 		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
 
+	Assert((result->collate_is_c && result->collate == NULL) ||
+		   (!result->collate_is_c && result->collate != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1398,19 +1385,7 @@ is_encoding_supported_by_collprovider(char collprovider, int encoding)
 int
 pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, -1, arg2, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, -1, arg2, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, -1, arg2, -1, locale);
 }
 
 /*
@@ -1431,51 +1406,25 @@ int
 pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 			pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, len1, arg2, len2, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, len1, arg2, len2, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
 }
 
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
  *
- * Unfortunately, it seems that strxfrm() for non-C collations is broken on
- * many common platforms; testing of multiple versions of glibc reveals that,
- * for many locales, strcoll() and strxfrm() do not return consistent
- * results. While no other libc other than Cygwin has so far been shown to
- * have a problem, we take the conservative course of action for right now and
- * disable this categorically.  (Users who are certain this isn't a problem on
- * their system can define TRUST_STRXFRM.)
  *
  * No similar problem is known for the ICU provider.
  */
 bool
 pg_strxfrm_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-#ifdef TRUST_STRXFRM
-		return true;
-#else
-		return false;
-#endif
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	/*
+	 * locale->collate->strnxfrm is still a required method, even if it may
+	 * have the wrong behavior, because the planner uses it for estimates in
+	 * some cases.
+	 */
+	return locale->collate->strxfrm_is_safe;
 }
 
 /*
@@ -1486,19 +1435,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 size_t
 pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1524,19 +1461,7 @@ size_t
 pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, srclen, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, srclen, locale);
 }
 
 /*
@@ -1546,15 +1471,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 bool
 pg_strxfrm_prefix_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-		return false;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	return (locale->collate->strnxfrm_prefix != NULL);
 }
 
 /*
@@ -1566,7 +1483,7 @@ size_t
 pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 				  pg_locale_t locale)
 {
-	return pg_strnxfrm_prefix(dest, destsize, src, -1, locale);
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1591,16 +1508,7 @@ size_t
 pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-#ifdef USE_ICU
-	if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_prefix_icu(dest, destsize, src, -1, locale);
-	else
-#endif
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 158c00a8130..4b7a897e930 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -52,13 +52,14 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
+
+static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
+static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 extern char *get_collation_actual_version_icu(const char *collcollate);
@@ -72,12 +73,20 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
-static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-								 const char *arg2, ssize_t len2,
-								 pg_locale_t locale);
-static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-										  const char *src, ssize_t srclen,
-										  pg_locale_t locale);
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+static int	strncoll_icu_utf8(const char *arg1, ssize_t len1,
+							  const char *arg2, ssize_t len2,
+							  pg_locale_t locale);
+#endif
+static size_t strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+									   const char *src, ssize_t srclen,
+									   pg_locale_t locale);
 static void init_icu_converter(void);
 static size_t uchar_length(UConverter *converter,
 						   const char *str, int32_t len);
@@ -86,6 +95,25 @@ static int32_t uchar_convert(UConverter *converter,
 							 const char *src, int32_t srclen);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+
+static const struct collate_methods collate_methods_icu = {
+	.strncoll = strncoll_icu,
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu,
+	.strxfrm_is_safe = true,
+};
+
+static const struct collate_methods collate_methods_icu_utf8 = {
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	.strncoll = strncoll_icu_utf8,
+#else
+	.strncoll = strncoll_icu,
+#endif
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu_utf8,
+	.strxfrm_is_safe = true,
+};
+
 #endif
 
 pg_locale_t
@@ -152,6 +180,10 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	if (GetDatabaseEncoding() == PG_UTF8)
+		result->collate = &collate_methods_icu_utf8;
+	else
+		result->collate = &collate_methods_icu;
 
 	return result;
 #else
@@ -326,42 +358,36 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 }
 
 /*
- * strncoll_icu
+ * strncoll_icu_utf8
  *
  * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
  * database encoding. An argument length of -1 means the string is
  * NUL-terminated.
  */
+#ifdef HAVE_UCOL_STRCOLLUTF8
 int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
+strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+				  pg_locale_t locale)
 {
 	int			result;
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
+	status = U_ZERO_ERROR;
+	result = ucol_strcollUTF8(locale->info.icu.ucol,
+							  arg1, len1,
+							  arg2, len2,
+							  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("collation failed: %s", u_errorName(status))));
 
 	return result;
 }
+#endif
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
@@ -412,37 +438,32 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
+strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+						 const char *src, ssize_t srclen,
+						 pg_locale_t locale)
 {
 	size_t		result;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
+	uiter_setUTF8(&iter, src, srclen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+								  &iter,
+								  state,
+								  (uint8_t *) dest,
+								  destsize,
+								  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
 
 	return result;
 }
@@ -533,7 +554,7 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 }
 
 /*
- * strncoll_icu_no_utf8
+ * strncoll_icu
  *
  * Convert the arguments from the database encoding to UChar strings, then
  * call ucol_strcoll(). An argument length of -1 means that the string is
@@ -543,8 +564,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
  * caller should call that instead.
  */
 static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
+strncoll_icu(const char *arg1, ssize_t len1,
+			 const char *arg2, ssize_t len2, pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -557,6 +578,8 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 	int			result;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 #endif
@@ -590,9 +613,9 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -605,6 +628,8 @@ strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 	Size		result_bsize;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	init_icu_converter();
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index fdf5f784551..cb519cfb521 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -40,10 +40,10 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
+static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
+static size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 extern char *get_collation_actual_version_libc(const char *collcollate);
@@ -57,6 +57,27 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static const struct collate_methods collate_methods_libc = {
+	.strncoll = strncoll_libc,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+
+	/*
+	 * Unfortunately, it seems that strxfrm() for non-C collations is broken
+	 * on many common platforms; testing of multiple versions of glibc reveals
+	 * that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results. While no libc other than Cygwin has so far
+	 * been shown to have a problem, we take the conservative course of action
+	 * for right now and disable this categorically.  (Users who are certain
+	 * this isn't a problem on their system can define TRUST_STRXFRM.)
+	 */
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -112,6 +133,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+	{
+#ifdef WIN32
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->collate = &collate_methods_libc_win32_utf8;
+		else
+#endif
+			result->collate = &collate_methods_libc;
+	}
 
 	return result;
 }
@@ -209,12 +239,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 
 	Assert(locale->provider == COLLPROVIDER_LIBC);
 
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 8adab0b5f30..028eec63901 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -47,6 +47,36 @@ extern struct lconv *PGLC_localeconv(void);
 extern void cache_locale_time(void);
 
 
+struct pg_locale_struct;
+typedef struct pg_locale_struct *pg_locale_t;
+
+/* methods that define collation behavior */
+struct collate_methods
+{
+	/* required */
+	int			(*strncoll) (const char *arg1, ssize_t len1,
+							 const char *arg2, ssize_t len2,
+							 pg_locale_t locale);
+
+	/* required */
+	size_t		(*strnxfrm) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* optional */
+	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
+									const char *src, ssize_t srclen,
+									pg_locale_t locale);
+
+	/*
+	 * If the strnxfrm method is not trusted to return the correct results,
+	 * set strxfrm_is_safe to false. If set to false, the method will not be
+	 * used in most cases, but the planner still expects it to be there for
+	 * estimation purposes (where incorrect results are acceptable).
+	 */
+	bool		strxfrm_is_safe;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -69,6 +99,9 @@ struct pg_locale_struct
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
+
+	const struct collate_methods *collate;	/* NULL if collate_is_c */
+
 	union
 	{
 		struct
-- 
2.45.2
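
Note on the collation patch above: it replaces the per-provider branches in pg_strnxfrm() and friends with calls through locale->collate, and pg_strxfrm_prefix_enabled() becomes a NULL check on the optional strnxfrm_prefix method. The following is a minimal standalone sketch of that shape, not the PostgreSQL code. All demo_* names, the bytewise comparison, and the reduced signatures (no length arguments) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for pg_locale_t and struct collate_methods. */
struct demo_locale;

struct demo_collate_methods
{
	/* required */
	int			(*strncoll) (const char *a, const char *b,
							 const struct demo_locale *loc);
	/* optional: NULL means no prefix transform is available */
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src,
									const struct demo_locale *loc);
};

struct demo_locale
{
	const struct demo_collate_methods *collate;
};

/* Toy method: plain byte order. */
static int
bytewise_strncoll(const char *a, const char *b, const struct demo_locale *loc)
{
	(void) loc;
	return strcmp(a, b);
}

/* Toy method: the "sort key prefix" is just the leading bytes of src. */
static size_t
bytewise_prefix(char *dest, size_t destsize, const char *src,
				const struct demo_locale *loc)
{
	size_t		n = strlen(src);

	(void) loc;
	if (n > destsize)
		n = destsize;
	memcpy(dest, src, n);
	return n;
}

/* One table per behavior, chosen once when the locale is created. */
static const struct demo_collate_methods methods_plain = {
	.strncoll = bytewise_strncoll,
	.strnxfrm_prefix = NULL,
};

static const struct demo_collate_methods methods_with_prefix = {
	.strncoll = bytewise_strncoll,
	.strnxfrm_prefix = bytewise_prefix,
};

/* Callers dispatch through the table instead of branching on a provider. */
static int
demo_strcoll(const char *a, const char *b, const struct demo_locale *loc)
{
	return loc->collate->strncoll(a, b, loc);
}

static bool
demo_prefix_enabled(const struct demo_locale *loc)
{
	return loc->collate->strnxfrm_prefix != NULL;
}

static size_t
demo_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
					 const struct demo_locale *loc)
{
	assert(demo_prefix_enabled(loc));	/* caller must check first */
	return loc->collate->strnxfrm_prefix(dest, destsize, src, loc);
}
```

In this sketch, as in the patch, the encoding-specific choice (for example UTF-8 versus non-UTF-8 for ICU) is made once when the table pointer is assigned, so the hot-path functions no longer need to test the database encoding on every call.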

v9-0007-Control-ctype-behavior-internally-with-a-method-t.patch (text/x-patch; charset=UTF-8)

From cc0a6ebd31a8d5591ae0edc0d9d7c0e3206668f8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 12:12:51 -0700
Subject: [PATCH v9 07/11] Control ctype behavior internally with a method
 table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 388 ++++--------------
 src/backend/utils/adt/formatting.c        | 445 +++------------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         |  71 ++++
 src/backend/utils/adt/pg_locale_builtin.c | 129 ++++++
 src/backend/utils/adt/pg_locale_icu.c     | 188 ++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 465 ++++++++++++++++++++++
 src/include/utils/pg_locale.h             |  71 +++-
 src/tools/pgindent/typedefs.list          |   1 -
 10 files changed, 1062 insertions(+), 725 deletions(-)
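
Note on the regc_pg_locale.c changes in this patch: the PG_Locale_Strategy enum goes away, the C locale is represented by a static dummy locale with ctype_is_c set, and every other locale goes through locale->ctype. A minimal standalone sketch of that two-way dispatch follows. The demo_* names and the toy Latin-1 method are assumptions for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_wchar;	/* stand-in for pg_wchar */

struct demo_locale;

struct demo_ctype_methods
{
	demo_wchar	(*wc_toupper) (demo_wchar c, const struct demo_locale *loc);
};

struct demo_locale
{
	bool		ctype_is_c;
	const struct demo_ctype_methods *ctype; /* NULL if ctype_is_c */
};

/* Shared dummy locale for the C case; it needs no method table. */
static const struct demo_locale demo_c_locale = {
	.ctype_is_c = true,
	.ctype = NULL,
};

/* Toy method: ASCII letters plus the Latin-1 lowercase range. */
static demo_wchar
latin1_toupper(demo_wchar c, const struct demo_locale *loc)
{
	(void) loc;
	if (c >= (demo_wchar) 'a' && c <= (demo_wchar) 'z')
		return c - 0x20;
	if (c >= 0xE0 && c <= 0xFE && c != 0xF7)	/* 0xF7 is the division sign */
		return c - 0x20;
	return c;
}

static const struct demo_ctype_methods demo_latin1_methods = {
	.wc_toupper = latin1_toupper,
};

/* Hard-wired ASCII behavior for C, method table for everything else. */
static demo_wchar
demo_wc_toupper(demo_wchar c, const struct demo_locale *loc)
{
	if (loc->ctype_is_c)
	{
		if (c >= (demo_wchar) 'a' && c <= (demo_wchar) 'z')
			return c - 0x20;
		return c;
	}
	assert(loc->ctype != NULL);
	return loc->ctype->wc_toupper(c, loc);
}
```

Because the C case is tested first, the dummy locale can leave its ctype pointer NULL, which is why a static struct with only the two flags set can stand in during early startup when catalog access is unavailable.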

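Note on the formatting.c changes in this patch: str_tolower() now calls pg_strlower() with a buffer of srclen + 1 bytes and, if the returned length does not fit, grows the buffer and calls it again. The sketch below shows that try-then-grow pattern in isolation. It assumes a method that returns the number of bytes needed excluding the terminating NUL; demo_strlower() is a hypothetical stand-in that deliberately expands 'I' to two bytes so that the retry path is exercised, and malloc/realloc stand in for palloc/repalloc.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical case-mapping method.  Writes at most destsize bytes
 * (including the NUL, when it fits) and returns the number of bytes the
 * full result needs, excluding the NUL.
 */
static size_t
demo_strlower(char *dest, size_t destsize, const char *src, size_t srclen)
{
	size_t		needed = 0;

	for (size_t i = 0; i < srclen; i++)
	{
		unsigned char ch = (unsigned char) src[i];

		if (ch == 'I')
		{
			/* artificial one-to-two expansion, for demonstration only */
			if (needed < destsize)
				dest[needed] = 'i';
			if (needed + 1 < destsize)
				dest[needed + 1] = '~';
			needed += 2;
		}
		else
		{
			if (needed < destsize)
				dest[needed] = (char) tolower(ch);
			needed++;
		}
	}
	if (needed < destsize)
		dest[needed] = '\0';
	return needed;
}

/* Returns a malloc'd, NUL-terminated result, or NULL on allocation failure. */
static char *
demo_lower_alloc(const char *src, size_t srclen)
{
	size_t		dstsize = srclen + 1;	/* first guess: same size plus NUL */
	char	   *dst = malloc(dstsize);
	size_t		needed;

	if (dst == NULL)
		return NULL;

	needed = demo_strlower(dst, dstsize, src, srclen);
	if (needed + 1 > dstsize)
	{
		/* result did not fit: grow to the reported size and retry */
		char	   *grown;

		dstsize = needed + 1;
		grown = realloc(dst, dstsize);
		if (grown == NULL)
		{
			free(dst);
			return NULL;
		}
		dst = grown;
		needed = demo_strlower(dst, dstsize, src, srclen);
		assert(needed + 1 <= dstsize);
	}

	assert(dst[needed] == '\0');
	return dst;
}
```

The second call is only reached when case mapping changes the byte length, so the common case costs one allocation and one pass.
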
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index b75784b6ce5..e898634fdf6 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,33 +63,18 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 static Oid	pg_regex_collation;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
-#define PG_ISDIGIT	0x01
-#define PG_ISALPHA	0x02
-#define PG_ISALNUM	(PG_ISDIGIT | PG_ISALPHA)
-#define PG_ISUPPER	0x04
-#define PG_ISLOWER	0x08
-#define PG_ISGRAPH	0x10
-#define PG_ISPRINT	0x20
-#define PG_ISPUNCT	0x40
-#define PG_ISSPACE	0x80
-
-static const unsigned char pg_char_properties[128] = {
+static const unsigned char char_properties_tbl[128] = {
 	 /* NUL */ 0,
 	 /* ^A */ 0,
 	 /* ^B */ 0,
@@ -232,7 +217,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -253,8 +237,8 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
 		collation = C_COLLATION_OID;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -271,32 +255,11 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
+			locale = &dummy_c_locale;
 			collation = C_COLLATION_OID;
 		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
-		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 	pg_regex_collation = collation;
 }
@@ -304,82 +267,31 @@ pg_set_regex_collation(Oid collation)
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISDIGIT));
+	else
+		return char_properties(c, PG_ISDIGIT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALPHA));
+	else
+		return char_properties(c, PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALNUM));
+	else
+		return char_properties(c, PG_ISDIGIT | PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
@@ -394,219 +306,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISUPPER));
+	else
+		return char_properties(c, PG_ISUPPER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISLOWER));
+	else
+		return char_properties(c, PG_ISLOWER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISGRAPH));
+	else
+		return char_properties(c, PG_ISGRAPH, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPRINT));
+	else
+		return char_properties(c, PG_ISPRINT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPUNCT));
+	else
+		return char_properties(c, PG_ISPUNCT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISSPACE));
+	else
+		return char_properties(c, PG_ISSPACE, pg_regex_locale) != 0;
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -732,37 +512,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 85a7dd45619..6a0571f93e6 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1570,52 +1570,6 @@ str_numth(char *dest, char *num, int type)
  *			upper/lower/initcap functions
  *****************************************************************************/
 
-#ifdef USE_ICU
-
-typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
-									 const UChar *src, int32_t srcLength,
-									 const char *locale,
-									 UErrorCode *pErrorCode);
-
-static int32_t
-icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
-				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
-{
-	UErrorCode	status;
-	int32_t		len_dest;
-
-	len_dest = len_source;		/* try first with same length */
-	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-	status = U_ZERO_ERROR;
-	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
-	if (status == U_BUFFER_OVERFLOW_ERROR)
-	{
-		/* try again with adjusted length */
-		pfree(*buff_dest);
-		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-		status = U_ZERO_ERROR;
-		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
-	}
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
-	return len_dest;
-}
-
-static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
-{
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
-}
-
-#endif							/* USE_ICU */
-
 /*
  * If the system provides the needed functions for wide-character manipulation
  * (which are all standardized by C99), then we implement upper/lower/initcap
@@ -1663,101 +1617,28 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar;
-			int32_t		len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strlower(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strlower(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that tolower_l() will not be so broken as
-				 * to need an isupper_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
@@ -1800,147 +1681,33 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
-		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strupper(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strupper(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
 
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l() will not be so broken as
-				 * to need an islower_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-					*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
 }
 
-struct WordBoundaryState
-{
-	const char *str;
-	size_t		len;
-	size_t		offset;
-	bool		init;
-	bool		prev_alnum;
-};
-
-/*
- * Simple word boundary iterator that draws boundaries each time the result of
- * pg_u_isalnum() changes.
- */
-static size_t
-initcap_wbnext(void *state)
-{
-	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
-
-	while (wbstate->offset < wbstate->len &&
-		   wbstate->str[wbstate->offset] != '\0')
-	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
-										wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
-
-		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
-		{
-			size_t		prev_offset = wbstate->offset;
-
-			wbstate->init = true;
-			wbstate->offset += unicode_utf8len(u);
-			wbstate->prev_alnum = curr_alnum;
-			return prev_offset;
-		}
-
-		wbstate->offset += unicode_utf8len(u);
-	}
-
-	return wbstate->len;
-}
-
 /*
  * collation-aware, wide-character-aware initcap function
  *
@@ -1951,7 +1718,6 @@ char *
 str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
-	int			wasalnum = false;
 	pg_locale_t mylocale;
 
 	if (!buff)
@@ -1979,125 +1745,28 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-			struct WordBoundaryState wbstate = {
-				.str = src,
-				.len = srclen,
-				.offset = 0,
-				.init = false,
-				.prev_alnum = false,
-			};
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strtitle(dst, dstsize, src, srclen,
-									  initcap_wbnext, &wbstate);
-			if (needed + 1 > dstsize)
-			{
-				/* reset iterator */
-				wbstate.offset = 0;
-				wbstate.init = false;
-
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strtitle(dst, dstsize, src, srclen,
-										  initcap_wbnext, &wbstate);
-				Assert(needed + 1 == dstsize);
-			}
-
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-				{
-					if (wasalnum)
-						workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-					else
-						workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-					wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
-				}
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
 
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l()/tolower_l() will not be so
-				 * broken as to need guard tests.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (wasalnum)
-						*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-					else
-						*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-					wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 7b3d1b5be71..1e5f07dfcab 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -96,7 +96,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	if (locale->ctype_is_c)
 		return pg_ascii_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -207,7 +207,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -222,14 +232,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index ee71ca89ffd..c172f7e55fc 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1495,13 +1495,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5643ef45ed3..9d27567cab7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1208,6 +1208,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1377,6 +1380,27 @@ is_encoding_supported_by_collprovider(char collprovider, int encoding)
 		return true;
 }
 
+size_t
+pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
+}
+
 /*
  * pg_strcoll
  *
@@ -1511,6 +1535,53 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_properties()
+ *
+ * Out of the properties specified in the given mask, return a new mask of the
+ * properties true for the given character.
+ */
+int
+char_properties(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	return locale->ctype->char_properties(wc, mask, locale);
+}
+
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 2e2d78758e1..50efcb5e3d3 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -13,6 +13,8 @@
 
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
+#include "common/unicode_case.h"
+#include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -24,6 +26,131 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct WordBoundaryState
+{
+	const char *str;
+	size_t		len;
+	size_t		offset;
+	bool		init;
+	bool		prev_alnum;
+};
+
+/*
+ * Simple word boundary iterator that draws boundaries each time the result of
+ * pg_u_isalnum() changes.
+ */
+static size_t
+initcap_wbnext(void *state)
+{
+	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
+
+	while (wbstate->offset < wbstate->len &&
+		   wbstate->str[wbstate->offset] != '\0')
+	{
+		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+										wbstate->offset);
+		bool		curr_alnum = pg_u_isalnum(u, true);
+
+		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
+		{
+			size_t		prev_offset = wbstate->offset;
+
+			wbstate->init = true;
+			wbstate->offset += unicode_utf8len(u);
+			wbstate->prev_alnum = curr_alnum;
+			return prev_offset;
+		}
+
+		wbstate->offset += unicode_utf8len(u);
+	}
+
+	return wbstate->len;
+}
+
+static size_t
+strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strlower(dest, destsize, src, srclen);
+}
+
+static size_t
+strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	struct WordBoundaryState wbstate = {
+		.str = src,
+		.len = srclen,
+		.offset = 0,
+		.init = false,
+		.prev_alnum = false,
+	};
+
+	return unicode_strtitle(dest, destsize, src, srclen,
+							initcap_wbnext, &wbstate);
+}
+
+static size_t
+strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strupper(dest, destsize, src, srclen);
+}
+
+static int
+char_properties_builtin(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && pg_u_isdigit(wc, true))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && pg_u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && pg_u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && pg_u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && pg_u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && pg_u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && pg_u_ispunct(wc, true))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && pg_u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.char_properties = char_properties_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -66,6 +193,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 4b7a897e930..839b905c560 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -64,6 +64,11 @@ static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  pg_locale_t locale);
 extern char *get_collation_actual_version_icu(const char *collcollate);
 
+typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
+									 const UChar *src, int32_t srcLength,
+									 const char *locale,
+									 UErrorCode *pErrorCode);
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -73,6 +78,16 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
+
+static size_t strlower_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strtitle_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
+static size_t strupper_icu(char *dest, size_t destsize,
+						   const char *src, ssize_t srclen,
+						   pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -93,8 +108,63 @@ static size_t uchar_length(UConverter *converter,
 static int32_t uchar_convert(UConverter *converter,
 							 UChar *dest, int32_t destlen,
 							 const char *src, int32_t srclen);
+static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff,
+							size_t nbytes);
+static size_t icu_from_uchar(char *dest, size_t destsize,
+							 const UChar *buff_uchar, int32_t len_uchar);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+static int32_t icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+								UChar **buff_dest, UChar *buff_source,
+								int32_t len_source);
+static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+									   const UChar *src, int32_t srcLength,
+									   const char *locale,
+									   UErrorCode *pErrorCode);
+
+static int
+char_properties_icu(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && u_isdigit(wc))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && u_ispunct(wc))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
 
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
@@ -114,6 +184,15 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.char_properties = char_properties_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -184,6 +263,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -357,6 +437,66 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
+static size_t
+strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToLower, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToTitle_default_BI, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+static size_t
+strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToUpper, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
 /*
  * strncoll_icu_utf8
  *
@@ -496,7 +636,7 @@ get_collation_actual_version_icu(const char *collcollate)
  * The result string is nul-terminated, though most callers rely on the
  * result length instead.
  */
-int32_t
+static int32_t
 icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
 {
 	int32_t		len_uchar;
@@ -523,8 +663,8 @@ icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
  *
  * The result string is nul-terminated.
  */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+static size_t
+icu_from_uchar(char *dest, size_t destsize, const UChar *buff_uchar, int32_t len_uchar)
 {
 	UErrorCode	status;
 	int32_t		len_result;
@@ -539,10 +679,11 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
 
-	*result = palloc(len_result + 1);
+	if (len_result + 1 > destsize)
+		return len_result;
 
 	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+	len_result = ucnv_fromUChars(icu_converter, dest, len_result + 1,
 								 buff_uchar, len_uchar, &status);
 	if (U_FAILURE(status) ||
 		status == U_STRING_NOT_TERMINATED_WARNING)
@@ -553,6 +694,43 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	return len_result;
 }
 
+static int32_t
+icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
+{
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s", u_errorName(status))));
+	return len_dest;
+}
+
+static int32_t
+u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+						const UChar *src, int32_t srcLength,
+						const char *locale,
+						UErrorCode *pErrorCode)
+{
+	return u_strToTitle(dest, destCapacity, src, srcLength,
+						NULL, locale, pErrorCode);
+}
+
 /*
  * strncoll_icu
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index cb519cfb521..38f9164ad98 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,6 +11,9 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+#include <wctype.h>
+
 #include "access/htup_details.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
@@ -57,6 +60,34 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static size_t strlower_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strlower_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+
+static int	char_properties_libc_sb(pg_wchar wc, int mask,
+									   pg_locale_t locale);
+static int	char_properties_libc_mb(pg_wchar wc, int mask,
+									  pg_locale_t locale);
+static pg_wchar toupper_libc_sb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar toupper_libc_mb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_sb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_mb(pg_wchar wc, pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -78,6 +109,324 @@ static const struct collate_methods collate_methods_libc = {
 #endif
 };
 
+#ifdef WIN32
+static const struct collate_methods collate_methods_libc_win32_utf8 = {
+	.strncoll = strncoll_libc_win32_utf8,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+#endif
+
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.char_properties = char_properties_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.char_properties = char_properties_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.char_properties = char_properties_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
+static size_t
+strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		/*
+		 * The test above guarantees room for srclen bytes plus a NUL.
+		 */
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that tolower_l() will not be so broken as to need
+		 * an isupper_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = tolower_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 > destsize)
+		return result_size;
+
+	memcpy(dest, result, result_size);
+	dest[result_size] = '\0';
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		int			wasalnum = false;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l()/tolower_l() will not be so broken
+		 * as to need guard tests.  When using the default collation, we apply
+		 * the traditional Postgres behavior that forces ASCII-style treatment
+		 * of I/i, but in non-default collations you get exactly what the
+		 * collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (wasalnum)
+				*p = tolower_l((unsigned char) *p, loc);
+			else
+				*p = toupper_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	int			wasalnum = false;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+	{
+		if (wasalnum)
+			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		else
+			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		wasalnum = iswalnum_l(workspace[curr_char], loc);
+	}
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 > destsize)
+		return result_size;
+
+	memcpy(dest, result, result_size);
+	dest[result_size] = '\0';
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l() will not be so broken as to need
+		 * an islower_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+			*p = toupper_l((unsigned char) *p, loc);
+	}
+
+	return srclen;
+}
+
+static size_t
+strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 > destsize)
+		return result_size;
+
+	memcpy(dest, result, result_size);
+	dest[result_size] = '\0';
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -142,6 +491,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
@@ -490,6 +848,113 @@ report_newlocale_failure(const char *localename)
 						localename) : 0)));
 }
 
+static int
+char_properties_libc_sb(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc > (pg_wchar) UCHAR_MAX)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static int
+char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* if wchar_t cannot represent the value, just return 0 */
+	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 028eec63901..7a509596178 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,10 +12,25 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
 
+/*
+ * Character properties for regular expressions.
+ */
+#define PG_ISDIGIT     0x01
+#define PG_ISALPHA     0x02
+#define PG_ISALNUM     (PG_ISDIGIT | PG_ISALPHA)
+#define PG_ISUPPER     0x04
+#define PG_ISLOWER     0x08
+#define PG_ISGRAPH     0x10
+#define PG_ISPRINT     0x20
+#define PG_ISPUNCT     0x40
+#define PG_ISSPACE     0x80
+
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
@@ -77,6 +92,43 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* required */
+	int			(*char_properties) (pg_wchar wc, int mask, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/* required */
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -101,6 +153,7 @@ struct pg_locale_struct
 	bool		ctype_is_c;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -126,6 +179,19 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 extern bool is_encoding_supported_by_collprovider(char collprovider, int encoding);
+extern int	char_properties(pg_wchar wc, int mask, pg_locale_t locale);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
+extern size_t pg_strlower(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strtitle(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strupper(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
@@ -145,11 +211,6 @@ extern const char *builtin_validate_locale(int encoding, const char *locale);
 extern void icu_validate_locale(int encoding, const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
-#ifdef USE_ICU
-extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
-extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
-
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
 						 pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2d4c870423a..94b041ec9e9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1823,7 +1823,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.45.2
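
A note on the sizing convention used by the strlower/strtitle/strupper methods in the patch above: each returns the number of bytes the result needs (excluding the terminating NUL) and writes to dest only when that many bytes plus the NUL fit in destsize. Below is a minimal sketch of how a caller might rely on that convention. Everything named sketch_* is hypothetical, the case mapping is plain ASCII via tolower(), srclen is a long rather than ssize_t for portability, and malloc/realloc stand in for palloc. It is not the code from formatting.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a strlower method: no locale argument, ASCII
 * semantics only.  Like the methods in the patch, it returns the size the
 * result needs (excluding the NUL) and writes only if that fits.
 */
static size_t
sketch_strlower(char *dest, size_t destsize, const char *src, long srclen)
{
	size_t		len = (srclen < 0) ? strlen(src) : (size_t) srclen;
	size_t		i;

	assert(src != NULL);

	if (len + 1 <= destsize)
	{
		for (i = 0; i < len; i++)
			dest[i] = (char) tolower((unsigned char) src[i]);
		dest[len] = '\0';
	}

	return len;
}

/*
 * Hypothetical caller: try a small buffer first, then grow it once to the
 * size reported by the first call.  Caller frees the result.
 */
static char *
sketch_lower_dup(const char *src)
{
	size_t		destsize = 8;
	char	   *dest = malloc(destsize);
	size_t		needed;

	if (dest == NULL)
		return NULL;

	needed = sketch_strlower(dest, destsize, src, -1);
	if (needed + 1 > destsize)
	{
		char	   *bigger = realloc(dest, needed + 1);

		if (bigger == NULL)
		{
			free(dest);
			return NULL;
		}
		dest = bigger;
		needed = sketch_strlower(dest, needed + 1, src, -1);
	}

	return dest;
}
```

With this shape the method itself never has to allocate the final buffer: a too-small destination is reported through the return value, and the caller decides how to grow it.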

v9-0008-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 553188c7b4963dc68be23b7070441b8a4a9573db Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v9 08/11] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 50efcb5e3d3..630adac1bcb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -189,7 +189,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 839b905c560..44c9f66d630 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -255,7 +255,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -512,8 +511,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -541,8 +538,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -587,8 +582,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -755,8 +748,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -805,8 +796,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 38f9164ad98..a789b88432c 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -475,7 +475,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -595,8 +594,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -651,8 +648,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -761,7 +756,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 7a509596178..1ec8d6a0db7 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -147,7 +147,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.45.2
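
To illustrate why a separate provider field becomes unnecessary once behavior lives in method tables, and how provider-specific state can sit behind an opaque pointer (as the next patch in the series does with provider_data), here is a small self-contained sketch. All sketch_* names are hypothetical and the "tr" provider is a toy that only special-cases 'i'. It is meant to show the dispatch pattern, not the actual pg_locale_t layout.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_locale;

/* Hypothetical, cut-down method table with a single method. */
struct sketch_ctype_methods
{
	int			(*wc_toupper) (int wc, const struct sketch_locale *locale);
};

/* No provider tag: behavior comes from the table, state from the pointer. */
struct sketch_locale
{
	const struct sketch_ctype_methods *ctype;
	void	   *provider_data;	/* opaque to generic code */
};

/* Private state for the toy "tr" provider. */
struct sketch_tr_provider
{
	int			upper_i;		/* code point that 'i' maps to */
};

static int
toupper_ascii(int wc, const struct sketch_locale *locale)
{
	(void) locale;				/* this provider needs no state */
	return (wc >= 'a' && wc <= 'z') ? wc - 'a' + 'A' : wc;
}

static int
toupper_tr(int wc, const struct sketch_locale *locale)
{
	/* Only the provider's own methods know the concrete type. */
	const struct sketch_tr_provider *tr = locale->provider_data;

	if (wc == 'i')
		return tr->upper_i;
	return toupper_ascii(wc, locale);
}

static const struct sketch_ctype_methods methods_ascii = {
	.wc_toupper = toupper_ascii,
};

static const struct sketch_ctype_methods methods_tr = {
	.wc_toupper = toupper_tr,
};

/* Generic entry point: no switch on a provider code anywhere. */
static int
sketch_toupper(int wc, const struct sketch_locale *locale)
{
	assert(locale != NULL && locale->ctype != NULL);
	return locale->ctype->wc_toupper(wc, locale);
}
```

A caller would build one sketch_locale per provider (for instance, one pointing at methods_ascii with NULL state and one pointing at methods_tr with a sketch_tr_provider) and call sketch_toupper() on either without knowing which is which. An extension supplying its own table and state would plug in the same way.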

v9-0009-Make-provider-data-in-pg_locale_t-an-opaque-point.patch (text/x-patch; charset=UTF-8)

From 5775c0fe326342169925d7e6f86c254d02adad72 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v9 09/11] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale_builtin.c |  11 +-
 src/backend/utils/adt/pg_locale_icu.c     |  40 +++++--
 src/backend/utils/adt/pg_locale_libc.c    | 131 ++++++++++++++--------
 src/include/utils/pg_locale.h             |  16 +--
 4 files changed, 127 insertions(+), 71 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 630adac1bcb..7dbc6faf430 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -26,6 +26,11 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct builtin_provider
+{
+	const char *locale;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -155,6 +160,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -188,7 +194,10 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 44c9f66d630..2627b853484 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static int	strncoll_icu(const char *arg1, ssize_t len1,
@@ -202,6 +208,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -253,8 +260,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -510,11 +521,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -538,6 +550,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -551,7 +565,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -582,12 +596,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -694,11 +710,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -706,7 +724,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -748,6 +766,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -770,7 +790,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -796,6 +816,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -815,7 +837,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index a789b88432c..ec01325137b 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -41,6 +42,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -127,17 +133,21 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static const struct ctype_methods ctype_methods_libc_sb = {
@@ -188,7 +198,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -205,7 +215,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = tolower_l((unsigned char) *p, loc);
+			*p = tolower_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -215,7 +225,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -237,7 +248,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -268,7 +279,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -285,10 +296,10 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (wasalnum)
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -299,7 +310,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -324,10 +336,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -359,7 +371,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -373,7 +385,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		 * what the collation says.
 		 */
 		for (p = dest; *p; p++)
-			*p = toupper_l((unsigned char) *p, loc);
+			*p = toupper_l((unsigned char) *p, libc->lt);
 	}
 
 	return srclen;
@@ -383,7 +395,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -405,7 +418,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -433,6 +446,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -471,16 +485,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -594,6 +611,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -624,7 +643,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -648,8 +667,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -658,7 +679,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -756,6 +777,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -800,7 +823,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -847,27 +870,29 @@ char_properties_libc_sb(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc > (pg_wchar) UCHAR_MAX)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -878,6 +903,8 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
@@ -885,21 +912,21 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -908,10 +935,12 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -919,10 +948,12 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -930,10 +961,12 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -941,10 +974,12 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -1036,8 +1071,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1096,8 +1133,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 1ec8d6a0db7..a5b425fd455 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -154,21 +154,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 typedef struct pg_locale_struct *pg_locale_t;
-- 
2.45.2
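
For readers following the provider_data change above, here is a minimal standalone sketch of the pattern. It is not code from the patch. Core code holds only an opaque pointer, and each provider casts it back to its own private struct inside its methods. The names demo_locale, demo_provider, fold_ascii_only, demo_create_locale and demo_toupper are hypothetical stand-ins, and calloc stands in for MemoryContextAllocZero.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Stand-in for pg_locale_struct: core code sees only an opaque pointer. */
struct demo_locale
{
	void	   *provider_data;
};

/* Private to one provider, like struct libc_provider in the patch. */
struct demo_provider
{
	int			fold_ascii_only;	/* hypothetical provider state */
};

/* Allocate the locale and its private data, then link them together. */
struct demo_locale *
demo_create_locale(int fold_ascii_only)
{
	struct demo_locale *result = calloc(1, sizeof(*result));
	struct demo_provider *p = calloc(1, sizeof(*p));

	if (result == NULL || p == NULL)
	{
		free(result);
		free(p);
		return NULL;
	}

	p->fold_ascii_only = fold_ascii_only;
	result->provider_data = (void *) p;
	return result;
}

/* Provider method: recover the private state with a cast. */
unsigned char
demo_toupper(unsigned char c, const struct demo_locale *locale)
{
	struct demo_provider *p;

	assert(locale != NULL && locale->provider_data != NULL);
	p = (struct demo_provider *) locale->provider_data;

	if (p->fold_ascii_only && c > 127)
		return c;				/* leave non-ASCII bytes untouched */
	return (unsigned char) toupper(c);
}
```

One consequence of this layout, as far as I can tell, is that the shared header no longer has to declare provider-specific types such as locale_t or UCollator, since only the provider's own file needs to know what provider_data points to.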

v9-0010-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From 955eeef949c1760872e13a172f62ac8a7458071c Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v9 10/11] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  | 4 ++++
 src/backend/utils/adt/formatting.c    | 4 ----
 src/backend/utils/adt/pg_locale.c     | 4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 1 +
 src/backend/utils/adt/varlena.c       | 4 ++++
 src/include/utils/pg_locale.h         | 4 ----
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8001f5ed082..bfec533dbd0 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 6a0571f93e6..387009a4a9e 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/unicode_case.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9d27567cab7..71cd8647d35 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2627b853484..e6eb31da083 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 533bebc1c7b..37b3506f06c 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a5b425fd455..1225f22131c 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /*
  * Character properties for regular expressions.
  */
-- 
2.45.2

v9-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From a4d74d22b450359d303a5f08c8a965fb4215bcc5 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 16:10:28 -0700
Subject: [PATCH v9 11/11] Introduce hooks for creating custom pg_locale_t.

Now that collation, case mapping, and ctype behavior is controlled
with a method table, we can hook the behavior.

The hooks can provide their own arbitrary method table, which may be
based on a different version of ICU than what Postgres was built with,
or entirely unrelated to ICU/libc.
---
 src/backend/utils/adt/pg_locale.c | 75 ++++++++++++++++++++++---------
 src/include/utils/pg_locale.h     | 24 ++++++++++
 src/tools/pgindent/typedefs.list  |  2 +
 3 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 71cd8647d35..79135eb493a 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -103,6 +103,9 @@ extern char *get_collation_actual_version_icu(const char *collcollate);
 /* pg_locale_libc.c */
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
+create_pg_locale_hook_type create_pg_locale_hook = NULL;
+collation_version_hook_type collation_version_hook = NULL;
+
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1190,7 +1193,7 @@ create_pg_locale(Oid collid, MemoryContext context)
 {
 	HeapTuple	tp;
 	Form_pg_collation collform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 	Datum		datum;
 	bool		isnull;
 
@@ -1199,15 +1202,21 @@ create_pg_locale(Oid collid, MemoryContext context)
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(collid, context);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(collid, context);
+
+	if (result == NULL)
+	{
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(collid, context);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	}
 
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
@@ -1270,7 +1279,7 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 
 	Assert(default_locale == NULL);
 
@@ -1280,18 +1289,25 @@ init_database_collation(void)
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
-										  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
-									  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(DEFAULT_COLLATION_OID,
 									   TopMemoryContext);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+
+	if (result == NULL)
+	{
+		if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+											  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+										   TopMemoryContext);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	}
 
 	ReleaseSysCache(tup);
 
@@ -1360,6 +1376,21 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collation_version_hook != NULL)
+	{
+		char	   *version;
+
+		if (collation_version_hook(collprovider, collcollate, &version))
+			return version;
+	}
+
+	/*
+	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
+	 * and are not expected to change, but track the version anyway.
+	 *
+	 * Note that the character semantics may change for some locales, but the
+	 * collation version only tracks changes to sort order.
+	 */
 	if (collprovider == COLLPROVIDER_BUILTIN)
 		collversion = get_collation_actual_version_builtin(collcollate);
 #ifdef USE_ICU
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 1225f22131c..d7c8c927075 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -155,6 +155,30 @@ struct pg_locale_struct
 
 typedef struct pg_locale_struct *pg_locale_t;
 
+/*
+ * Hooks to enable custom locale providers.
+ */
+
+/*
+ * Hook create_pg_locale(). Return result (allocated in the given context) to
+ * override; or return NULL to return control to create_pg_locale(). When
+ * creating the default database collation, collid is DEFAULT_COLLATION_OID.
+ */
+typedef pg_locale_t (*create_pg_locale_hook_type) (Oid collid,
+												   MemoryContext context);
+
+/*
+ * Hook get_collation_actual_version(). Set *version out parameter and return
+ * true to override; or return false to return control to
+ * get_collation_actual_version().
+ */
+typedef bool (*collation_version_hook_type) (char collprovider,
+											 const char *collcollate,
+											 char **version);
+
+extern PGDLLIMPORT create_pg_locale_hook_type create_pg_locale_hook;
+extern PGDLLIMPORT collation_version_hook_type collation_version_hook;
+
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 94b041ec9e9..e39fe343d21 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3376,6 +3376,7 @@ cmpEntriesArg
 codes_t
 collation_cache_entry
 collation_cache_hash
+collation_version_hook_type
 color
 colormaprange
 compare_context
@@ -3392,6 +3393,7 @@ core_yyscan_t
 corrupt_items
 cost_qual_eval_context
 cp_hash_func
+create_pg_locale_hook_type
 create_upper_paths_hook_type
 createdb_failure_params
 crosstab_HashEnt
-- 
2.45.2
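
The control flow that the hook patch above adds to create_pg_locale() can be modelled in a few lines of standalone C. This is only a sketch under simplifying assumptions: demo_locale, demo_create_hook, my_create_hook and the collation OID 1234 are all hypothetical, the MemoryContext argument is dropped, and a single core_locale object stands in for the builtin/ICU/libc dispatch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PostgreSQL types; hypothetical, for illustration only. */
typedef unsigned int Oid;

typedef struct demo_locale
{
	const char *origin;
} demo_locale;

/* Same shape as create_pg_locale_hook_type, minus the MemoryContext. */
typedef demo_locale *(*demo_create_hook_type) (Oid collid);

demo_create_hook_type demo_create_hook = NULL;

demo_locale core_locale = {"core"};
demo_locale ext_locale = {"extension"};

/* Hypothetical extension hook: override only collation 1234. */
demo_locale *
my_create_hook(Oid collid)
{
	if (collid == 1234)
		return &ext_locale;
	return NULL;				/* decline; core code takes over */
}

/* Mirrors the control flow of the patched create_pg_locale(). */
demo_locale *
demo_create_locale(Oid collid)
{
	demo_locale *result = NULL;

	if (demo_create_hook != NULL)
		result = demo_create_hook(collid);

	if (result == NULL)
		result = &core_locale;	/* stands in for the provider dispatch */

	assert(result != NULL);
	return result;
}
```

Returning NULL from the hook means "not mine", so an extension can claim a subset of collations and leave the rest to the core providers. An extension would install the hook by assigning to the global, typically from its _PG_init().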

#11

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Andreas Karlsson (#10)

11 attachment(s)

Re: Collation & ctype method table, and extension hooks

On Mon, 2024-12-02 at 16:39 +0100, Andreas Karlsson wrote:
> I feel your first patch in the series is something you can just
> commit.

Done.

I combined your patches and mine into the attached v10 series.

I also split the ctype methods patch in two, so that patch v10-0005
moves all of the case mapping code into the appropriate provider
files. That should make the ctype methods patch (v10-0007) easier to
review.

Regards,
Jeff Davis

Attachments:

v10-0001-Move-check-for-ucol_strcollUTF8-to-pg_locale_icu.patch (text/x-patch; charset=UTF-8)

From 41eb2906100d62caf554b5b3c3fd5ff7bd807f82 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 00:55:41 +0100
Subject: [PATCH v10 01/11] Move check for ucol_strcollUTF8 to pg_locale_icu.c

The result of the check is only used by pg_locale_icu.c.
---
 src/backend/utils/adt/pg_locale_icu.c | 12 ++++++++++++
 src/include/utils/pg_locale.h         | 13 -------------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 73eb430d750..2c6b950ec18 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -14,6 +14,18 @@
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
 #include <unicode/ustring.h>
+
+/*
+ * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
+ * (see
+ * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
+ */
+#if U_ICU_VERSION_MAJOR_NUM >= 53
+#define HAVE_UCOL_STRCOLLUTF8 1
+#else
+#undef HAVE_UCOL_STRCOLLUTF8
+#endif
+
 #endif
 
 #include "access/htup_details.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 4d2262b39aa..776f8f6f2fe 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -16,19 +16,6 @@
 #include <unicode/ucol.h>
 #endif
 
-#ifdef USE_ICU
-/*
- * ucol_strcollUTF8() was introduced in ICU 50, but it is buggy before ICU 53.
- * (see
- * <https://www.postgresql.org/message-id/flat/f1438ec6-22aa-4029-9a3b-26f79d330e72%40manitou-mail.org>)
- */
-#if U_ICU_VERSION_MAJOR_NUM >= 53
-#define HAVE_UCOL_STRCOLLUTF8 1
-#else
-#undef HAVE_UCOL_STRCOLLUTF8
-#endif
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.34.1
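
As a small illustration of how a version-gated macro like the one moved above is typically consumed, here is a standalone sketch. DEMO_ICU_VERSION_MAJOR_NUM is a made-up stand-in for U_ICU_VERSION_MAJOR_NUM (set to 52 here so the fallback branch is taken), and DEMO_HAVE_STRCOLLUTF8 and demo_uses_utf8_compare are hypothetical names. The real code presumably chooses between calling ucol_strcollUTF8() directly and converting to UChar first.

```c
/* Stand-in for U_ICU_VERSION_MAJOR_NUM; 52 is an arbitrary example value. */
#define DEMO_ICU_VERSION_MAJOR_NUM 52

/* Same gating shape as HAVE_UCOL_STRCOLLUTF8 in the patch. */
#if DEMO_ICU_VERSION_MAJOR_NUM >= 53
#define DEMO_HAVE_STRCOLLUTF8 1
#else
#undef DEMO_HAVE_STRCOLLUTF8
#endif

/* Returns 1 if the direct UTF-8 comparison path would be compiled in. */
int
demo_uses_utf8_compare(void)
{
#ifdef DEMO_HAVE_STRCOLLUTF8
	return 1;					/* direct UTF-8 comparison available */
#else
	return 0;					/* fall back to a conversion-based path */
#endif
}
```

Because only one file tests the macro, defining it next to its single consumer keeps the shared header free of ICU version details.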

v10-0002-Move-code-for-collation-version-into-provider-sp.patch (text/x-patch; charset=UTF-8)

From 6ad321ab8f32574e24d3f58e71366414c2cc1add Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 04:44:09 +0100
Subject: [PATCH v10 02/11] Move code for collation version into provider
 specific files

---
 src/backend/utils/adt/pg_locale.c         | 106 +++-------------------
 src/backend/utils/adt/pg_locale_builtin.c |  24 +++++
 src/backend/utils/adt/pg_locale_icu.c     |  17 ++++
 src/backend/utils/adt/pg_locale_libc.c    |  74 +++++++++++++++
 4 files changed, 126 insertions(+), 95 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 4cb56126e97..b2f198314a2 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,10 +69,6 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
-#ifdef __GLIBC__
-#include <gnu/libc-version.h>
-#endif
-
 #ifdef WIN32
 #include <shlwapi.h>
 #endif
@@ -91,6 +87,7 @@
 
 /* pg_locale_builtin.c */
 extern pg_locale_t create_pg_locale_builtin(Oid collid, MemoryContext context);
+extern char *get_collation_actual_version_builtin(const char *collcollate);
 
 /* pg_locale_icu.c */
 #ifdef USE_ICU
@@ -104,6 +101,7 @@ extern size_t strnxfrm_icu(char *dest, size_t destsize,
 extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
+extern char *get_collation_actual_version_icu(const char *collcollate);
 #endif
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
@@ -115,6 +113,7 @@ extern int	strncoll_libc(const char *arg1, ssize_t len1,
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
 char	   *locale_messages;
@@ -1370,100 +1369,17 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
-	/*
-	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
-	 * and are not expected to change, but track the version anyway.
-	 *
-	 * Note that the character semantics may change for some locales, but the
-	 * collation version only tracks changes to sort order.
-	 */
 	if (collprovider == COLLPROVIDER_BUILTIN)
-	{
-		if (strcmp(collcollate, "C") == 0)
-			return "1";
-		else if (strcmp(collcollate, "C.UTF-8") == 0)
-			return "1";
-		else
-			ereport(ERROR,
-					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-					 errmsg("invalid locale name \"%s\" for builtin provider",
-							collcollate)));
-	}
-
+		collversion = get_collation_actual_version_builtin(collcollate);
 #ifdef USE_ICU
-	if (collprovider == COLLPROVIDER_ICU)
-	{
-		UCollator  *collator;
-		UVersionInfo versioninfo;
-		char		buf[U_MAX_VERSION_STRING_LENGTH];
-
-		collator = pg_ucol_open(collcollate);
-
-		ucol_getVersion(collator, versioninfo);
-		ucol_close(collator);
-
-		u_versionToString(versioninfo, buf);
-		collversion = pstrdup(buf);
-	}
-	else
-#endif
-		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
-	{
-#if defined(__GLIBC__)
-		/* Use the glibc version because we don't have anything better. */
-		collversion = pstrdup(gnu_get_libc_version());
-#elif defined(LC_VERSION_MASK)
-		locale_t	loc;
-
-		/* Look up FreeBSD collation version. */
-		loc = newlocale(LC_COLLATE_MASK, collcollate, NULL);
-		if (loc)
-		{
-			collversion =
-				pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
-			freelocale(loc);
-		}
-		else
-			ereport(ERROR,
-					(errmsg("could not load locale \"%s\"", collcollate)));
-#elif defined(WIN32)
-		/*
-		 * If we are targeting Windows Vista and above, we can ask for a name
-		 * given a collation name (earlier versions required a location code
-		 * that we don't have).
-		 */
-		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
-		WCHAR		wide_collcollate[LOCALE_NAME_MAX_LENGTH];
-
-		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
-							LOCALE_NAME_MAX_LENGTH);
-		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
-		{
-			/*
-			 * GetNLSVersionEx() wants a language tag such as "en-US", not a
-			 * locale name like "English_United States.1252".  Until those
-			 * values can be prevented from entering the system, or 100%
-			 * reliably converted to the more useful tag format, tolerate the
-			 * resulting error and report that we have no version data.
-			 */
-			if (GetLastError() == ERROR_INVALID_PARAMETER)
-				return NULL;
-
-			ereport(ERROR,
-					(errmsg("could not get collation version for locale \"%s\": error code %lu",
-							collcollate,
-							GetLastError())));
-		}
-		collversion = psprintf("%lu.%lu,%lu.%lu",
-							   (version.dwNLSVersion >> 8) & 0xFFFF,
-							   version.dwNLSVersion & 0xFF,
-							   (version.dwDefinedVersion >> 8) & 0xFFFF,
-							   version.dwDefinedVersion & 0xFF);
+	else if (collprovider == COLLPROVIDER_ICU)
+		collversion = get_collation_actual_version_icu(collcollate);
 #endif
-	}
+	else if (collprovider == COLLPROVIDER_LIBC)
+		collversion = get_collation_actual_version_libc(collcollate);
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(collprovider);
 
 	return collversion;
 }
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 4246971a4d8..2e2d78758e1 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -22,6 +22,7 @@
 
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
+extern char *get_collation_actual_version_builtin(const char *collcollate);
 
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
@@ -68,3 +69,26 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	return result;
 }
+
+char *
+get_collation_actual_version_builtin(const char *collcollate)
+{
+	/*
+	 * The only two supported locales (C and C.UTF-8) are both based on memcmp
+	 * and are not expected to change, but track the version anyway.
+	 *
+	 * Note that the character semantics may change for some locales, but the
+	 * collation version only tracks changes to sort order.
+	 */
+	if (strcmp(collcollate, "C") == 0)
+		return "1";
+	else if (strcmp(collcollate, "C.UTF-8") == 0)
+		return "1";
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("invalid locale name \"%s\" for builtin provider",
+						collcollate)));
+
+	return NULL;				/* keep compiler quiet */
+}
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 2c6b950ec18..158c00a8130 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,7 @@ extern size_t strnxfrm_icu(char *dest, size_t destsize,
 extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
+extern char *get_collation_actual_version_icu(const char *collcollate);
 
 /*
  * Converter object for converting between ICU's UChar strings and C strings
@@ -446,6 +447,22 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	return result;
 }
 
+char *
+get_collation_actual_version_icu(const char *collcollate)
+{
+	UCollator  *collator;
+	UVersionInfo versioninfo;
+	char		buf[U_MAX_VERSION_STRING_LENGTH];
+
+	collator = pg_ucol_open(collcollate);
+
+	ucol_getVersion(collator, versioninfo);
+	ucol_close(collator);
+
+	u_versionToString(versioninfo, buf);
+	return pstrdup(buf);
+}
+
 /*
  * Convert a string in the database encoding into a string of UChars.
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 374ac37ba0a..fdf5f784551 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -22,6 +22,14 @@
 #include "utils/pg_locale.h"
 #include "utils/syscache.h"
 
+#ifdef __GLIBC__
+#include <gnu/libc-version.h>
+#endif
+
+#ifdef WIN32
+#include <shlwapi.h>
+#endif
+
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
  * allocations in typical cases. This should be large enough that most strings
@@ -38,6 +46,7 @@ extern int	strncoll_libc(const char *arg1, ssize_t len1,
 extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+extern char *get_collation_actual_version_libc(const char *collcollate);
 static locale_t make_libc_collator(const char *collate,
 								   const char *ctype);
 static void report_newlocale_failure(const char *localename);
@@ -283,6 +292,71 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result;
 }
 
+char *
+get_collation_actual_version_libc(const char *collcollate)
+{
+	char	   *collversion = NULL;
+
+	if (pg_strcasecmp("C", collcollate) != 0 &&
+		pg_strncasecmp("C.", collcollate, 2) != 0 &&
+		pg_strcasecmp("POSIX", collcollate) != 0)
+	{
+#if defined(__GLIBC__)
+		/* Use the glibc version because we don't have anything better. */
+		collversion = pstrdup(gnu_get_libc_version());
+#elif defined(LC_VERSION_MASK)
+		locale_t	loc;
+
+		/* Look up FreeBSD collation version. */
+		loc = newlocale(LC_COLLATE_MASK, collcollate, NULL);
+		if (loc)
+		{
+			collversion =
+				pstrdup(querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc));
+			freelocale(loc);
+		}
+		else
+			ereport(ERROR,
+					(errmsg("could not load locale \"%s\"", collcollate)));
+#elif defined(WIN32)
+		/*
+		 * If we are targeting Windows Vista and above, we can ask for a name
+		 * given a collation name (earlier versions required a location code
+		 * that we don't have).
+		 */
+		NLSVERSIONINFOEX version = {sizeof(NLSVERSIONINFOEX)};
+		WCHAR		wide_collcollate[LOCALE_NAME_MAX_LENGTH];
+
+		MultiByteToWideChar(CP_ACP, 0, collcollate, -1, wide_collcollate,
+							LOCALE_NAME_MAX_LENGTH);
+		if (!GetNLSVersionEx(COMPARE_STRING, wide_collcollate, &version))
+		{
+			/*
+			 * GetNLSVersionEx() wants a language tag such as "en-US", not a
+			 * locale name like "English_United States.1252".  Until those
+			 * values can be prevented from entering the system, or 100%
+			 * reliably converted to the more useful tag format, tolerate the
+			 * resulting error and report that we have no version data.
+			 */
+			if (GetLastError() == ERROR_INVALID_PARAMETER)
+				return NULL;
+
+			ereport(ERROR,
+					(errmsg("could not get collation version for locale \"%s\": error code %lu",
+							collcollate,
+							GetLastError())));
+		}
+		collversion = psprintf("%lu.%lu,%lu.%lu",
+							   (version.dwNLSVersion >> 8) & 0xFFFF,
+							   version.dwNLSVersion & 0xFF,
+							   (version.dwDefinedVersion >> 8) & 0xFFFF,
+							   version.dwDefinedVersion & 0xFF);
+#endif
+	}
+
+	return collversion;
+}
+
 /*
  * strncoll_libc_win32_utf8
  *
-- 
2.34.1
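
For readers puzzling over the WIN32 branch of get_collation_actual_version_libc() above: the version string is built from two 32-bit fields, each split into a 16-bit part (bits 8..23) and a low byte. Here is a minimal portable sketch of just that formatting step. The helper name format_nls_version() and the sample value 0x0006020F are made up for illustration; only the shifts, masks, and format string are taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): formats two NLS version
 * words the same way the psprintf() call in the WIN32 branch does.
 * Returns the number of characters snprintf() would have written.
 */
static int
format_nls_version(char *buf, size_t bufsize,
				   uint32_t nls_version, uint32_t defined_version)
{
	return snprintf(buf, bufsize, "%lu.%lu,%lu.%lu",
					/* bits 8..23 of each word */
					(unsigned long) ((nls_version >> 8) & 0xFFFF),
					/* low byte of each word */
					(unsigned long) (nls_version & 0xFF),
					(unsigned long) ((defined_version >> 8) & 0xFFFF),
					(unsigned long) (defined_version & 0xFF));
}
```

With the assumed input 0x0006020F for both fields, the 16-bit part is 0x0602 (1538) and the low byte is 0x0F (15), so the string is "1538.15,1538.15". Note that the top byte of each word is discarded by the mask.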

v10-0003-Move-ICU-database-encoding-check-into-validation.patch (text/x-patch; charset=UTF-8)

From a437302c67a912f5f732986182989a22983c8aa6 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 05:49:03 +0100
Subject: [PATCH v10 03/11] Move ICU database encoding check into validation
 function

This removes some duplicated code and makes the code for
validating an ICU collation more similar to the code for validating a
built-in collation.
---
 src/backend/commands/collationcmds.c | 16 ++--------------
 src/backend/commands/dbcommands.c    |  8 +-------
 src/backend/utils/adt/pg_locale.c    | 13 ++++++++++++-
 src/include/utils/pg_locale.h        |  2 +-
 4 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53b6a479aa4..8001f5ed082 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -297,7 +297,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				}
 			}
 
-			icu_validate_locale(colllocale);
+			icu_validate_locale(GetDatabaseEncoding(), colllocale);
 		}
 
 		/*
@@ -322,23 +322,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		}
 		else if (collprovider == COLLPROVIDER_ICU)
 		{
-#ifdef USE_ICU
 			/*
 			 * We could create ICU collations with collencoding == database
 			 * encoding, but it seems better to use -1 so that it matches the
-			 * way initdb would create ICU collations.  However, only allow
-			 * one to be created when the current database's encoding is
-			 * supported.  Otherwise the collation is useless, plus we get
-			 * surprising behaviors like not being able to drop the collation.
-			 *
-			 * Skip this test when !USE_ICU, because the error we want to
-			 * throw for that isn't thrown till later.
+			 * way initdb would create ICU collations.
 			 */
-			if (!is_encoding_supported_by_icu(GetDatabaseEncoding()))
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("current database's encoding is not supported with this provider")));
-#endif
 			collencoding = -1;
 		}
 		else
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index aa91a396967..fd5e887c3ae 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1116,12 +1116,6 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	}
 	else if (dblocprovider == COLLPROVIDER_ICU)
 	{
-		if (!(is_encoding_supported_by_icu(encoding)))
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("encoding \"%s\" is not supported with ICU provider",
-							pg_encoding_to_char(encoding))));
-
 		/*
 		 * This would happen if template0 uses the libc provider but the new
 		 * database uses icu.
@@ -1151,7 +1145,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			}
 		}
 
-		icu_validate_locale(dblocale);
+		icu_validate_locale(encoding, dblocale);
 	}
 
 	/* for libc, locale comes from datcollate and datctype */
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b2f198314a2..ce255a4b91f 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1719,7 +1719,7 @@ icu_language_tag(const char *loc_str, int elevel)
  * Perform best-effort check that the locale is a valid one.
  */
 void
-icu_validate_locale(const char *loc_str)
+icu_validate_locale(int encoding, const char *loc_str)
 {
 #ifdef USE_ICU
 	UCollator  *collator;
@@ -1728,6 +1728,17 @@ icu_validate_locale(const char *loc_str)
 	bool		found = false;
 	int			elevel = icu_validation_level;
 
+	/*
+	 * Only allow locales to be created when the encoding is supported.
+	 * Otherwise the collation is useless, plus we get surprising behaviors
+	 * like not being able to drop the collation.
+	 */
+	if (!(is_encoding_supported_by_icu(encoding)))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("encoding \"%s\" is not supported with ICU provider",
+						pg_encoding_to_char(encoding))));
+
 	/* no validation */
 	if (elevel < 0)
 		return;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 776f8f6f2fe..be9bb62c4b2 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -109,7 +109,7 @@ extern size_t pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 
 extern int	builtin_locale_encoding(const char *locale);
 extern const char *builtin_validate_locale(int encoding, const char *locale);
-extern void icu_validate_locale(const char *loc_str);
+extern void icu_validate_locale(int encoding, const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
 #ifdef USE_ICU
-- 
2.34.1
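
To illustrate why icu_validate_locale() gains an encoding argument in the patch above, here is a small standalone sketch. Everything prefixed toy_ is a stand-in invented for this example (including the two-value encoding enum and the string-returning error convention); the real function reports problems through ereport(). The point is only that the two callers pass different encodings: CREATE COLLATION passes the current database's encoding, while CREATE DATABASE passes the encoding of the database being created.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy encoding IDs, assumed for this sketch only. */
enum toy_encoding
{
	TOY_UTF8,
	TOY_SQL_ASCII
};

/* Stand-in for is_encoding_supported_by_icu(): only UTF8 is accepted here. */
static bool
toy_is_encoding_supported_by_icu(int encoding)
{
	return encoding == TOY_UTF8;
}

/*
 * Stand-in for icu_validate_locale(): returns NULL on success, or a static
 * message where the real code would raise an error.
 */
static const char *
toy_icu_validate_locale(int encoding, const char *loc_str)
{
	/* the check that used to be duplicated in both callers */
	if (!toy_is_encoding_supported_by_icu(encoding))
		return "encoding is not supported with ICU provider";

	(void) loc_str;				/* the actual locale checks are elided */
	return NULL;
}

/* CREATE COLLATION path: validates against the current database encoding. */
static const char *
toy_define_collation(int current_db_encoding, const char *loc_str)
{
	return toy_icu_validate_locale(current_db_encoding, loc_str);
}

/* CREATE DATABASE path: validates against the new database's encoding. */
static const char *
toy_createdb(int new_db_encoding, const char *loc_str)
{
	return toy_icu_validate_locale(new_db_encoding, loc_str);
}
```

One visible consequence in the diff is that CREATE COLLATION now reports the same message and ERRCODE_INVALID_PARAMETER_VALUE as CREATE DATABASE, instead of its previous ERRCODE_FEATURE_NOT_SUPPORTED message.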

v10-0004-Move-provider-specific-code-when-looking-up-loca.patch (text/x-patch; charset=UTF-8)

From 6f99c49d920da1043e8f4467c6fa0cdebd480268 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 29 Nov 2024 05:49:20 +0100
Subject: [PATCH v10 04/11] Move provider specific code when looking up locales
 into pg_locale.c

---
 src/backend/catalog/namespace.c   | 14 ++++----------
 src/backend/utils/adt/pg_locale.c |  9 +++++++++
 src/include/utils/pg_locale.h     |  1 +
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 30807f91904..6ad40a96334 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -57,6 +57,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_locale.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -2346,17 +2347,10 @@ lookup_collation(const char *collname, Oid collnamespace, int32 encoding)
 	if (!HeapTupleIsValid(colltup))
 		return InvalidOid;
 	collform = (Form_pg_collation) GETSTRUCT(colltup);
-	if (collform->collprovider == COLLPROVIDER_ICU)
-	{
-		if (is_encoding_supported_by_icu(encoding))
-			collid = collform->oid;
-		else
-			collid = InvalidOid;
-	}
-	else
-	{
+	if (is_encoding_supported_by_collprovider(collform->collprovider, encoding))
 		collid = collform->oid;
-	}
+	else
+		collid = InvalidOid;
 	ReleaseSysCache(colltup);
 	return collid;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index ce255a4b91f..1ec1373bbc8 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1384,6 +1384,15 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+bool
+is_encoding_supported_by_collprovider(char collprovider, int encoding)
+{
+	if (collprovider == COLLPROVIDER_ICU)
+		return is_encoding_supported_by_icu(encoding);
+	else
+		return true;
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index be9bb62c4b2..977bcf9fe29 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -93,6 +93,7 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern bool is_encoding_supported_by_collprovider(char collprovider, int encoding);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.34.1
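
A minimal standalone sketch of what the patch above does to lookup_collation(): the provider test moves behind a single predicate, so the caller no longer mentions ICU at all. The toy_ names, the struct, and the two-value encoding enum are assumptions for this example; the provider letters mirror the COLLPROVIDER_* constants, and the ICU check is a placeholder for is_encoding_supported_by_icu().

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ToyOid;

#define TOY_INVALID_OID			0
#define TOY_COLLPROVIDER_ICU	'i'
#define TOY_COLLPROVIDER_LIBC	'c'

/* Toy encoding IDs, assumed for this sketch only. */
enum toy_encoding
{
	TOY_UTF8,
	TOY_SQL_ASCII
};

/* Toy catalog row: just the two fields the lookup needs. */
struct toy_collation
{
	ToyOid		oid;
	char		provider;
};

/* Placeholder for is_encoding_supported_by_icu(). */
static bool
toy_is_encoding_supported_by_icu(int encoding)
{
	return encoding != TOY_SQL_ASCII;
}

/* Same shape as is_encoding_supported_by_collprovider() in the patch. */
static bool
toy_is_encoding_supported_by_collprovider(char provider, int encoding)
{
	if (provider == TOY_COLLPROVIDER_ICU)
		return toy_is_encoding_supported_by_icu(encoding);
	else
		return true;
}

/* Caller side: no provider-specific branching is left here. */
static ToyOid
toy_lookup_collation(const struct toy_collation *coll, int encoding)
{
	if (coll == NULL)
		return TOY_INVALID_OID;
	if (toy_is_encoding_supported_by_collprovider(coll->provider, encoding))
		return coll->oid;
	return TOY_INVALID_OID;
}
```

The benefit is modest in this patch alone, but it keeps namespace.c free of provider knowledge, which fits the direction of the rest of the series.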

v10-0005-Refactor-case-mapping-into-provider-specific-fil.patch (text/x-patch; charset=UTF-8)

From 0743580180ed344cb831109f17f148026db74970 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 29 Nov 2024 09:37:36 -0800
Subject: [PATCH v10 05/11] Refactor case mapping into provider-specific files.

Previously, formatting.c contained a lot of provider-specific code to
implement LOWER(), INITCAP(), and UPPER().

Move that into pg_locale_builtin.c, pg_locale_icu.c, and
pg_locale_libc.c as appropriate. Create API entry points
pg_strlower(), etc., that work with any provider and can be called
from formatting.c.
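
The calling convention these entry points rely on can be sketched as follows: the function writes at most dstsize bytes, NUL-terminates only if there is room, and returns the number of bytes the full result needs, so the caller can retry once with a larger buffer. This is a standalone toy, not the patch's code: toy_strupper() and toy_str_toupper() are hypothetical names, malloc/realloc stand in for palloc/repalloc, and the mapping (ASCII upper-casing, with '&' spelled out as "AND") exists only to force the output to grow.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy stand-in for pg_strupper(): returns the bytes needed for the full
 * result (excluding the NUL), truncating the output if dst is too small.
 */
static size_t
toy_strupper(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		needed = 0;

	for (size_t i = 0; i < srclen; i++)
	{
		char		one[1];
		const char *out;
		size_t		outlen;

		if (src[i] == '&')
		{
			out = "AND";		/* artificial expansion, 1 byte -> 3 bytes */
			outlen = 3;
		}
		else
		{
			one[0] = (char) toupper((unsigned char) src[i]);
			out = one;
			outlen = 1;
		}

		for (size_t j = 0; j < outlen; j++)
		{
			if (needed < dstsize)
				dst[needed] = out[j];
			needed++;			/* keep counting even when truncating */
		}
	}

	if (needed < dstsize)
		dst[needed] = '\0';

	return needed;
}

/* Caller-side pattern, as in str_toupper(): try once, grow, retry. */
static char *
toy_str_toupper(const char *src, size_t srclen)
{
	size_t		dstsize = srclen + 1;	/* equal size plus terminating NUL */
	char	   *dst = malloc(dstsize);
	size_t		needed;

	if (dst == NULL)
		return NULL;

	needed = toy_strupper(dst, dstsize, src, srclen);
	if (needed + 1 > dstsize)
	{
		char	   *bigger;

		/* grow buffer to the reported size and retry */
		dstsize = needed + 1;
		bigger = realloc(dst, dstsize);
		if (bigger == NULL)
		{
			free(dst);
			return NULL;
		}
		dst = bigger;
		toy_strupper(dst, dstsize, src, srclen);
	}

	return dst;
}
```

Because the callee reports the exact size on the first call, at most one retry is needed, and the common case where the result is no longer than the input costs a single allocation.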
---
 src/backend/utils/adt/formatting.c        | 465 +++-------------------
 src/backend/utils/adt/pg_locale.c         |  78 ++++
 src/backend/utils/adt/pg_locale_builtin.c |  80 ++++
 src/backend/utils/adt/pg_locale_icu.c     | 130 +++++-
 src/backend/utils/adt/pg_locale_libc.c    | 327 +++++++++++++++
 src/include/utils/pg_locale.h             |  14 +-
 6 files changed, 676 insertions(+), 418 deletions(-)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 2bcc185708c..6a0571f93e6 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1570,52 +1570,6 @@ str_numth(char *dest, char *num, int type)
  *			upper/lower/initcap functions
  *****************************************************************************/
 
-#ifdef USE_ICU
-
-typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
-									 const UChar *src, int32_t srcLength,
-									 const char *locale,
-									 UErrorCode *pErrorCode);
-
-static int32_t
-icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
-				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
-{
-	UErrorCode	status;
-	int32_t		len_dest;
-
-	len_dest = len_source;		/* try first with same length */
-	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-	status = U_ZERO_ERROR;
-	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
-	if (status == U_BUFFER_OVERFLOW_ERROR)
-	{
-		/* try again with adjusted length */
-		pfree(*buff_dest);
-		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
-		status = U_ZERO_ERROR;
-		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
-	}
-	if (U_FAILURE(status))
-		ereport(ERROR,
-				(errmsg("case conversion failed: %s", u_errorName(status))));
-	return len_dest;
-}
-
-static int32_t
-u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
-						const UChar *src, int32_t srcLength,
-						const char *locale,
-						UErrorCode *pErrorCode)
-{
-	return u_strToTitle(dest, destCapacity, src, srcLength,
-						NULL, locale, pErrorCode);
-}
-
-#endif							/* USE_ICU */
-
 /*
  * If the system provides the needed functions for wide-character manipulation
  * (which are all standardized by C99), then we implement upper/lower/initcap
@@ -1663,106 +1617,28 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			int32_t		len_uchar;
-			int32_t		len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToLower, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
-		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strlower(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strlower(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strlower(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
 
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that tolower_l() will not be so broken as
-				 * to need an isupper_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (mylocale->is_default)
-						*p = pg_tolower((unsigned char) *p);
-					else
-						*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
@@ -1805,152 +1681,33 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToUpper, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strupper(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strupper(dst, dstsize, src, srclen);
-			if (needed + 1 > dstsize)
-			{
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strupper(dst, dstsize, src, srclen);
-				Assert(needed + 1 == dstsize);
-			}
-
-			Assert(dst[needed] == '\0');
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-					workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
 
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l() will not be so broken as
-				 * to need an islower_l() guard test.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (mylocale->is_default)
-						*p = pg_toupper((unsigned char) *p);
-					else
-						*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
 }
 
-struct WordBoundaryState
-{
-	const char *str;
-	size_t		len;
-	size_t		offset;
-	bool		init;
-	bool		prev_alnum;
-};
-
-/*
- * Simple word boundary iterator that draws boundaries each time the result of
- * pg_u_isalnum() changes.
- */
-static size_t
-initcap_wbnext(void *state)
-{
-	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
-
-	while (wbstate->offset < wbstate->len &&
-		   wbstate->str[wbstate->offset] != '\0')
-	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
-										wbstate->offset);
-		bool		curr_alnum = pg_u_isalnum(u, true);
-
-		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
-		{
-			size_t		prev_offset = wbstate->offset;
-
-			wbstate->init = true;
-			wbstate->offset += unicode_utf8len(u);
-			wbstate->prev_alnum = curr_alnum;
-			return prev_offset;
-		}
-
-		wbstate->offset += unicode_utf8len(u);
-	}
-
-	return wbstate->len;
-}
-
 /*
  * collation-aware, wide-character-aware initcap function
  *
@@ -1961,7 +1718,6 @@ char *
 str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
-	int			wasalnum = false;
 	pg_locale_t mylocale;
 
 	if (!buff)
@@ -1989,135 +1745,28 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
+		const char *src = buff;
+		size_t		srclen = nbytes;
+		size_t		dstsize;
+		char	   *dst;
+		size_t		needed;
+
+		/* first try buffer of equal size plus terminating NUL */
+		dstsize = srclen + 1;
+		dst = palloc(dstsize);
+
+		needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+		if (needed + 1 > dstsize)
 		{
-			int32_t		len_uchar,
-						len_conv;
-			UChar	   *buff_uchar;
-			UChar	   *buff_conv;
-
-			len_uchar = icu_to_uchar(&buff_uchar, buff, nbytes);
-			len_conv = icu_convert_case(u_strToTitle_default_BI, mylocale,
-										&buff_conv, buff_uchar, len_uchar);
-			icu_from_uchar(&result, buff_conv, len_conv);
-			pfree(buff_uchar);
-			pfree(buff_conv);
+			/* grow buffer if needed and retry */
+			dstsize = needed + 1;
+			dst = repalloc(dst, dstsize);
+			needed = pg_strtitle(dst, dstsize, src, srclen, mylocale);
+			Assert(needed + 1 <= dstsize);
 		}
-		else
-#endif
-		if (mylocale->provider == COLLPROVIDER_BUILTIN)
-		{
-			const char *src = buff;
-			size_t		srclen = nbytes;
-			size_t		dstsize;
-			char	   *dst;
-			size_t		needed;
-			struct WordBoundaryState wbstate = {
-				.str = src,
-				.len = srclen,
-				.offset = 0,
-				.init = false,
-				.prev_alnum = false,
-			};
-
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-
-			/* first try buffer of equal size plus terminating NUL */
-			dstsize = srclen + 1;
-			dst = palloc(dstsize);
-
-			needed = unicode_strtitle(dst, dstsize, src, srclen,
-									  initcap_wbnext, &wbstate);
-			if (needed + 1 > dstsize)
-			{
-				/* reset iterator */
-				wbstate.offset = 0;
-				wbstate.init = false;
-
-				/* grow buffer if needed and retry */
-				dstsize = needed + 1;
-				dst = repalloc(dst, dstsize);
-				needed = unicode_strtitle(dst, dstsize, src, srclen,
-										  initcap_wbnext, &wbstate);
-				Assert(needed + 1 == dstsize);
-			}
 
-			result = dst;
-		}
-		else
-		{
-			Assert(mylocale->provider == COLLPROVIDER_LIBC);
-
-			if (pg_database_encoding_max_length() > 1)
-			{
-				wchar_t    *workspace;
-				size_t		curr_char;
-				size_t		result_size;
-
-				/* Overflow paranoia */
-				if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
-					ereport(ERROR,
-							(errcode(ERRCODE_OUT_OF_MEMORY),
-							 errmsg("out of memory")));
-
-				/* Output workspace cannot have more codes than input bytes */
-				workspace = (wchar_t *) palloc((nbytes + 1) * sizeof(wchar_t));
-
-				char2wchar(workspace, nbytes + 1, buff, nbytes, mylocale);
-
-				for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-				{
-					if (wasalnum)
-						workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
-					else
-						workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
-					wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
-				}
-
-				/*
-				 * Make result large enough; case change might change number
-				 * of bytes
-				 */
-				result_size = curr_char * pg_database_encoding_max_length() + 1;
-				result = palloc(result_size);
-
-				wchar2char(result, workspace, result_size, mylocale);
-				pfree(workspace);
-			}
-			else
-			{
-				char	   *p;
-
-				result = pnstrdup(buff, nbytes);
-
-				/*
-				 * Note: we assume that toupper_l()/tolower_l() will not be so
-				 * broken as to need guard tests.  When using the default
-				 * collation, we apply the traditional Postgres behavior that
-				 * forces ASCII-style treatment of I/i, but in non-default
-				 * collations you get exactly what the collation says.
-				 */
-				for (p = result; *p; p++)
-				{
-					if (mylocale->is_default)
-					{
-						if (wasalnum)
-							*p = pg_tolower((unsigned char) *p);
-						else
-							*p = pg_toupper((unsigned char) *p);
-					}
-					else
-					{
-						if (wasalnum)
-							*p = tolower_l((unsigned char) *p, mylocale->info.lt);
-						else
-							*p = toupper_l((unsigned char) *p, mylocale->info.lt);
-					}
-					wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
-				}
-			}
-		}
+		Assert(dst[needed] == '\0');
+		result = dst;
 	}
 
 	return result;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 1ec1373bbc8..06484f01403 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -115,6 +115,27 @@ extern size_t strnxfrm_libc(char *dest, size_t destsize,
 							pg_locale_t locale);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
+extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
+							   ssize_t srclen, pg_locale_t locale);
+extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
+							   ssize_t srclen, pg_locale_t locale);
+extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
+							   ssize_t srclen, pg_locale_t locale);
+
+extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+
+extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
+							ssize_t srclen, pg_locale_t locale);
+extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
+							ssize_t srclen, pg_locale_t locale);
+extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
+							ssize_t srclen, pg_locale_t locale);
+
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1393,6 +1414,63 @@ is_encoding_supported_by_collprovider(char collprovider, int encoding)
 		return true;
 }
 
+size_t
+pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	if (locale->provider == COLLPROVIDER_BUILTIN)
+		return strlower_builtin(dst, dstsize, src, srclen, locale);
+#ifdef USE_ICU
+	else if (locale->provider == COLLPROVIDER_ICU)
+		return strlower_icu(dst, dstsize, src, srclen, locale);
+#endif
+	else if (locale->provider == COLLPROVIDER_LIBC)
+		return strlower_libc(dst, dstsize, src, srclen, locale);
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
+
+	return 0;					/* keep compiler quiet */
+}
+
+size_t
+pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	if (locale->provider == COLLPROVIDER_BUILTIN)
+		return strtitle_builtin(dst, dstsize, src, srclen, locale);
+#ifdef USE_ICU
+	else if (locale->provider == COLLPROVIDER_ICU)
+		return strtitle_icu(dst, dstsize, src, srclen, locale);
+#endif
+	else if (locale->provider == COLLPROVIDER_LIBC)
+		return strtitle_libc(dst, dstsize, src, srclen, locale);
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
+
+	return 0;					/* keep compiler quiet */
+}
+
+size_t
+pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
+			pg_locale_t locale)
+{
+	if (locale->provider == COLLPROVIDER_BUILTIN)
+		return strupper_builtin(dst, dstsize, src, srclen, locale);
+#ifdef USE_ICU
+	else if (locale->provider == COLLPROVIDER_ICU)
+		return strupper_icu(dst, dstsize, src, srclen, locale);
+#endif
+	else if (locale->provider == COLLPROVIDER_LIBC)
+		return strupper_libc(dst, dstsize, src, srclen, locale);
+	else
+		/* shouldn't happen */
+		PGLOCALE_SUPPORT_ERROR(locale->provider);
+
+	return 0;					/* keep compiler quiet */
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 2e2d78758e1..c060b89940d 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -13,6 +13,8 @@
 
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
+#include "common/unicode_case.h"
+#include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
@@ -23,6 +25,84 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
+extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
+							   ssize_t srclen, pg_locale_t locale);
+extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
+							   ssize_t srclen, pg_locale_t locale);
+extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
+							   ssize_t srclen, pg_locale_t locale);
+
+
+struct WordBoundaryState
+{
+	const char *str;
+	size_t		len;
+	size_t		offset;
+	bool		init;
+	bool		prev_alnum;
+};
+
+/*
+ * Simple word boundary iterator that draws boundaries each time the result of
+ * pg_u_isalnum() changes.
+ */
+static size_t
+initcap_wbnext(void *state)
+{
+	struct WordBoundaryState *wbstate = (struct WordBoundaryState *) state;
+
+	while (wbstate->offset < wbstate->len &&
+		   wbstate->str[wbstate->offset] != '\0')
+	{
+		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+										wbstate->offset);
+		bool		curr_alnum = pg_u_isalnum(u, true);
+
+		if (!wbstate->init || curr_alnum != wbstate->prev_alnum)
+		{
+			size_t		prev_offset = wbstate->offset;
+
+			wbstate->init = true;
+			wbstate->offset += unicode_utf8len(u);
+			wbstate->prev_alnum = curr_alnum;
+			return prev_offset;
+		}
+
+		wbstate->offset += unicode_utf8len(u);
+	}
+
+	return wbstate->len;
+}
+
+size_t
+strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strlower(dest, destsize, src, srclen);
+}
+
+size_t
+strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	struct WordBoundaryState wbstate = {
+		.str = src,
+		.len = srclen,
+		.offset = 0,
+		.init = false,
+		.prev_alnum = false,
+	};
+
+	return unicode_strtitle(dest, destsize, src, srclen,
+							initcap_wbnext, &wbstate);
+}
+
+size_t
+strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	return unicode_strupper(dest, destsize, src, srclen);
+}
 
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 158c00a8130..6d83705ee42 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -48,6 +48,12 @@
 #define		TEXTBUFLEN			1024
 
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
+extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
 
 #ifdef USE_ICU
 
@@ -63,6 +69,11 @@ extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  pg_locale_t locale);
 extern char *get_collation_actual_version_icu(const char *collcollate);
 
+typedef int32_t (*ICU_Convert_Func) (UChar *dest, int32_t destCapacity,
+									 const UChar *src, int32_t srcLength,
+									 const char *locale,
+									 UErrorCode *pErrorCode);
+
 /*
  * Converter object for converting between ICU's UChar strings and C strings
  * in database encoding.  Since the database encoding doesn't change, we only
@@ -84,8 +95,19 @@ static size_t uchar_length(UConverter *converter,
 static int32_t uchar_convert(UConverter *converter,
 							 UChar *dest, int32_t destlen,
 							 const char *src, int32_t srclen);
+static int32_t icu_to_uchar(UChar **buff_uchar, const char *buff,
+							size_t nbytes);
+static size_t icu_from_uchar(char *dest, size_t destsize,
+							 const UChar *buff_uchar, int32_t len_uchar);
 static void icu_set_collation_attributes(UCollator *collator, const char *loc,
 										 UErrorCode *status);
+static int32_t icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+								UChar **buff_dest, UChar *buff_source,
+								int32_t len_source);
+static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+									   const UChar *src, int32_t srcLength,
+									   const char *locale,
+									   UErrorCode *pErrorCode);
 #endif
 
 pg_locale_t
@@ -325,6 +347,66 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
+size_t
+strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToLower, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+size_t
+strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToTitle_default_BI, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
+size_t
+strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
+			 pg_locale_t locale)
+{
+	int32_t		len_uchar;
+	int32_t		len_conv;
+	UChar	   *buff_uchar;
+	UChar	   *buff_conv;
+	size_t		result_len;
+
+	len_uchar = icu_to_uchar(&buff_uchar, src, srclen);
+	len_conv = icu_convert_case(u_strToUpper, locale,
+								&buff_conv, buff_uchar, len_uchar);
+	result_len = icu_from_uchar(dest, destsize, buff_conv, len_conv);
+	pfree(buff_uchar);
+	pfree(buff_conv);
+
+	return result_len;
+}
+
 /*
  * strncoll_icu
  *
@@ -475,7 +557,7 @@ get_collation_actual_version_icu(const char *collcollate)
  * The result string is nul-terminated, though most callers rely on the
  * result length instead.
  */
-int32_t
+static int32_t
 icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
 {
 	int32_t		len_uchar;
@@ -502,8 +584,8 @@ icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes)
  *
  * The result string is nul-terminated.
  */
-int32_t
-icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
+static size_t
+icu_from_uchar(char *dest, size_t destsize, const UChar *buff_uchar, int32_t len_uchar)
 {
 	UErrorCode	status;
 	int32_t		len_result;
@@ -518,10 +600,11 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
 
-	*result = palloc(len_result + 1);
+	if (len_result + 1 > destsize)
+		return len_result;
 
 	status = U_ZERO_ERROR;
-	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
+	len_result = ucnv_fromUChars(icu_converter, dest, len_result + 1,
 								 buff_uchar, len_uchar, &status);
 	if (U_FAILURE(status) ||
 		status == U_STRING_NOT_TERMINATED_WARNING)
@@ -532,6 +615,43 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	return len_result;
 }
 
+static int32_t
+icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
+				 UChar **buff_dest, UChar *buff_source, int32_t len_source)
+{
+	UErrorCode	status;
+	int32_t		len_dest;
+
+	len_dest = len_source;		/* try first with same length */
+	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+	status = U_ZERO_ERROR;
+	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+					mylocale->info.icu.locale, &status);
+	if (status == U_BUFFER_OVERFLOW_ERROR)
+	{
+		/* try again with adjusted length */
+		pfree(*buff_dest);
+		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
+		status = U_ZERO_ERROR;
+		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
+						mylocale->info.icu.locale, &status);
+	}
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("case conversion failed: %s", u_errorName(status))));
+	return len_dest;
+}
+
+static int32_t
+u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
+						const UChar *src, int32_t srcLength,
+						const char *locale,
+						UErrorCode *pErrorCode)
+{
+	return u_strToTitle(dest, destCapacity, src, srcLength,
+						NULL, locale, pErrorCode);
+}
+
 /*
  * strncoll_icu_no_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index fdf5f784551..a46c6326854 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -11,6 +11,9 @@
 
 #include "postgres.h"
 
+#include <limits.h>
+#include <wctype.h>
+
 #include "access/htup_details.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_collation.h"
@@ -40,6 +43,13 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
+extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
+							ssize_t srclen, pg_locale_t locale);
+extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
+							ssize_t srclen, pg_locale_t locale);
+extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
+							ssize_t srclen, pg_locale_t locale);
+
 extern int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -57,6 +67,323 @@ static int	strncoll_libc_win32_utf8(const char *arg1, ssize_t len1,
 									 pg_locale_t locale);
 #endif
 
+static size_t strlower_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strlower_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strtitle_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_sb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+static size_t strupper_libc_mb(char *dest, size_t destsize,
+							   const char *src, ssize_t srclen,
+							   pg_locale_t locale);
+
+size_t
+strlower_libc(char *dst, size_t dstsize, const char *src,
+			  ssize_t srclen, pg_locale_t locale)
+{
+	if (pg_database_encoding_max_length() > 1)
+		return strlower_libc_mb(dst, dstsize, src, srclen, locale);
+	else
+		return strlower_libc_sb(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+strtitle_libc(char *dst, size_t dstsize, const char *src,
+			  ssize_t srclen, pg_locale_t locale)
+{
+	if (pg_database_encoding_max_length() > 1)
+		return strtitle_libc_mb(dst, dstsize, src, srclen, locale);
+	else
+		return strtitle_libc_sb(dst, dstsize, src, srclen, locale);
+}
+
+size_t
+strupper_libc(char *dst, size_t dstsize, const char *src,
+			  ssize_t srclen, pg_locale_t locale)
+{
+	if (pg_database_encoding_max_length() > 1)
+		return strupper_libc_mb(dst, dstsize, src, srclen, locale);
+	else
+		return strupper_libc_sb(dst, dstsize, src, srclen, locale);
+}
+
+static size_t
+strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		if (srclen + 1 > destsize)
+			return srclen;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that tolower_l() will not be so broken as to need
+		 * an isupper_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (locale->is_default)
+				*p = pg_tolower((unsigned char) *p);
+			else
+				*p = tolower_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		int			wasalnum = false;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l()/tolower_l() will not be so broken
+		 * as to need guard tests.  When using the default collation, we apply
+		 * the traditional Postgres behavior that forces ASCII-style treatment
+		 * of I/i, but in non-default collations you get exactly what the
+		 * collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (locale->is_default)
+			{
+				if (wasalnum)
+					*p = pg_tolower((unsigned char) *p);
+				else
+					*p = pg_toupper((unsigned char) *p);
+			}
+			else
+			{
+				if (wasalnum)
+					*p = tolower_l((unsigned char) *p, loc);
+				else
+					*p = toupper_l((unsigned char) *p, loc);
+			}
+			wasalnum = isalnum_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	int			wasalnum = false;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+	{
+		if (wasalnum)
+			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		else
+			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		wasalnum = iswalnum_l(workspace[curr_char], loc);
+	}
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
+static size_t
+strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	if (srclen + 1 <= destsize)
+	{
+		locale_t	loc = locale->info.lt;
+		char	   *p;
+
+		memcpy(dest, src, srclen);
+		dest[srclen] = '\0';
+
+		/*
+		 * Note: we assume that toupper_l() will not be so broken as to need
+		 * an islower_l() guard test.  When using the default collation, we
+		 * apply the traditional Postgres behavior that forces ASCII-style
+		 * treatment of I/i, but in non-default collations you get exactly
+		 * what the collation says.
+		 */
+		for (p = dest; *p; p++)
+		{
+			if (locale->is_default)
+				*p = pg_toupper((unsigned char) *p);
+			else
+				*p = toupper_l((unsigned char) *p, loc);
+		}
+	}
+
+	return srclen;
+}
+
+static size_t
+strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
+				 pg_locale_t locale)
+{
+	locale_t	loc = locale->info.lt;
+	size_t		result_size;
+	wchar_t    *workspace;
+	char	   *result;
+	size_t		curr_char;
+	size_t		max_size;
+
+	if (srclen < 0)
+		srclen = strlen(src);
+
+	/* Overflow paranoia */
+	if ((srclen + 1) > (INT_MAX / sizeof(wchar_t)))
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+
+	/* Output workspace cannot have more codes than input bytes */
+	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
+
+	char2wchar(workspace, srclen + 1, src, srclen, locale);
+
+	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
+		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+
+	/*
+	 * Make result large enough; case change might change number of bytes
+	 */
+	max_size = curr_char * pg_database_encoding_max_length();
+	result = palloc(max_size + 1);
+
+	result_size = wchar2char(result, workspace, max_size + 1, locale);
+
+	if (result_size + 1 <= destsize)
+	{
+		memcpy(dest, result, result_size);
+		dest[result_size] = '\0';
+	}
+
+	pfree(workspace);
+	pfree(result);
+
+	return result_size;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 977bcf9fe29..14a17e4869d 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -94,6 +94,15 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 extern bool is_encoding_supported_by_collprovider(char collprovider, int encoding);
+extern size_t pg_strlower(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strtitle(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
+extern size_t pg_strupper(char *dest, size_t destsize,
+						  const char *src, ssize_t srclen,
+						  pg_locale_t locale);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
@@ -113,11 +122,6 @@ extern const char *builtin_validate_locale(int encoding, const char *locale);
 extern void icu_validate_locale(int encoding, const char *loc_str);
 extern char *icu_language_tag(const char *loc_str, int elevel);
 
-#ifdef USE_ICU
-extern int32_t icu_to_uchar(UChar **buff_uchar, const char *buff, size_t nbytes);
-extern int32_t icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar);
-#endif
-
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
 						 pg_locale_t locale);
-- 
2.34.1
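
The case-mapping methods in the patch above share one buffer convention: the function returns the length the result needs (excluding the terminating NUL) and writes to dest only when that length plus one fits in destsize. A minimal sketch of that contract follows, assuming the caller retries once with the reported size. demo_strupper() and demo_upper_alloc() are hypothetical names used only for illustration, and malloc() stands in for palloc(); this is not code from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a method such as strupper_libc(): always return
 * the length the result needs (not counting the NUL), and write the
 * NUL-terminated result only if it fits in destsize.
 */
static size_t
demo_strupper(char *dest, size_t destsize, const char *src)
{
	size_t		len = strlen(src);
	size_t		i;

	if (len + 1 > destsize)
		return len;				/* too small: report the size, write nothing */

	for (i = 0; i < len; i++)
		dest[i] = (char) toupper((unsigned char) src[i]);
	dest[len] = '\0';

	return len;
}

/*
 * Caller side (assumed pattern): start with a guess, and if the reported
 * length does not fit, allocate exactly enough and convert again.
 */
static char *
demo_upper_alloc(const char *src)
{
	size_t		bufsize = 8;	/* deliberately small first guess */
	char	   *buf = malloc(bufsize);
	size_t		needed;

	if (buf == NULL)
		return NULL;

	needed = demo_strupper(buf, bufsize, src);
	if (needed + 1 > bufsize)
	{
		free(buf);
		bufsize = needed + 1;
		buf = malloc(bufsize);
		if (buf == NULL)
			return NULL;
		needed = demo_strupper(buf, bufsize, src);
		assert(needed + 1 <= bufsize);	/* second attempt must fit */
	}

	return buf;
}
```

Calling demo_upper_alloc("hello, method tables") takes the retry path, because the 20-byte input does not fit in the 8-byte first guess. The result must be released with free().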

v10-0006-Control-collation-behavior-with-a-method-table.patch (text/x-patch; charset=UTF-8)

From 7efe275f13bec841534d1959cd3288176c7b392a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 26 Sep 2024 11:27:29 -0700
Subject: [PATCH v10 06/11] Control collation behavior with a method table.

Previously, collation behavior (pg_strcoll(), pg_strnxfrm(), and
related functions) branched based on the provider.

A method table is less error-prone and easier to hook.
---
 src/backend/utils/adt/pg_locale.c      | 123 +++------------------
 src/backend/utils/adt/pg_locale_icu.c  | 147 +++++++++++++++----------
 src/backend/utils/adt/pg_locale_libc.c |  40 +++++--
 src/include/utils/pg_locale.h          |  33 ++++++
 4 files changed, 167 insertions(+), 176 deletions(-)
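
As a rough illustration of the idea (not the patch's actual definitions), the dispatch can be sketched with a cut-down method table. All demo_* names are hypothetical and the signatures are simplified (no length arguments); only the field names strncoll, strnxfrm_prefix and strxfrm_is_safe mirror the patch. An optional method is represented by a NULL pointer, which is how the patch reports whether prefix transformation is supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

/* Hypothetical, simplified analogue of struct collate_methods. */
struct demo_collate_methods
{
	/* required */
	int			(*strncoll) (const char *a, const char *b,
							 const struct demo_locale *locale);
	/* optional: NULL means prefix transformation is unsupported */
	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
									const char *src,
									const struct demo_locale *locale);
	/* capability flag instead of a per-provider branch */
	bool		strxfrm_is_safe;
};

struct demo_locale
{
	const struct demo_collate_methods *collate;
};

/* One possible "provider": plain bytewise comparison. */
static int
demo_strncoll_bytewise(const char *a, const char *b,
					   const struct demo_locale *locale)
{
	(void) locale;				/* unused in this trivial provider */
	return strcmp(a, b);
}

static const struct demo_collate_methods demo_methods_bytewise = {
	.strncoll = demo_strncoll_bytewise,
	.strnxfrm_prefix = NULL,
	.strxfrm_is_safe = false,
};

/* The entry point no longer branches on a provider code. */
static int
demo_strcoll(const char *a, const char *b, const struct demo_locale *locale)
{
	assert(locale->collate != NULL);
	return locale->collate->strncoll(a, b, locale);
}

/* Optional capability: derived from whether the method is present. */
static bool
demo_strxfrm_prefix_enabled(const struct demo_locale *locale)
{
	return locale->collate->strnxfrm_prefix != NULL;
}
```

In this sketch, supporting another provider means supplying another table and pointing the locale at it. The entry points themselves stay unchanged, which is what makes the table easy to hook.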

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 06484f01403..735335b556a 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -92,27 +92,12 @@ extern char *get_collation_actual_version_builtin(const char *collcollate);
 /* pg_locale_icu.c */
 #ifdef USE_ICU
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
-						 const char *arg2, ssize_t len2,
-						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
-						   const char *src, ssize_t srclen,
-						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
-								  const char *src, ssize_t srclen,
-								  pg_locale_t locale);
 extern char *get_collation_actual_version_icu(const char *collcollate);
 #endif
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 /* pg_locale_libc.c */
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
-						  const char *arg2, ssize_t len2,
-						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
@@ -1244,6 +1229,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 
 	result->is_default = false;
 
+	Assert((result->collate_is_c && result->collate == NULL) ||
+		   (!result->collate_is_c && result->collate != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1479,19 +1467,7 @@ pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 int
 pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, -1, arg2, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, -1, arg2, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, -1, arg2, -1, locale);
 }
 
 /*
@@ -1512,51 +1488,25 @@ int
 pg_strncoll(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 			pg_locale_t locale)
 {
-	int			result;
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strncoll_libc(arg1, len1, arg2, len2, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strncoll_icu(arg1, len1, arg2, len2, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
 }
 
 /*
  * Return true if the collation provider supports pg_strxfrm() and
  * pg_strnxfrm(); otherwise false.
  *
- * Unfortunately, it seems that strxfrm() for non-C collations is broken on
- * many common platforms; testing of multiple versions of glibc reveals that,
- * for many locales, strcoll() and strxfrm() do not return consistent
- * results. While no other libc other than Cygwin has so far been shown to
- * have a problem, we take the conservative course of action for right now and
- * disable this categorically.  (Users who are certain this isn't a problem on
- * their system can define TRUST_STRXFRM.)
  *
  * No similar problem is known for the ICU provider.
  */
 bool
 pg_strxfrm_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-#ifdef TRUST_STRXFRM
-		return true;
-#else
-		return false;
-#endif
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	/*
+	 * locale->collate->strnxfrm is still a required method, even if it may
+	 * have the wrong behavior, because the planner uses it for estimates in
+	 * some cases.
+	 */
+	return locale->collate->strxfrm_is_safe;
 }
 
 /*
@@ -1567,19 +1517,7 @@ pg_strxfrm_enabled(pg_locale_t locale)
 size_t
 pg_strxfrm(char *dest, const char *src, size_t destsize, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, -1, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, -1, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1605,19 +1543,7 @@ size_t
 pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-	if (locale->provider == COLLPROVIDER_LIBC)
-		result = strnxfrm_libc(dest, destsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_icu(dest, destsize, src, srclen, locale);
-#endif
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm(dest, destsize, src, srclen, locale);
 }
 
 /*
@@ -1627,15 +1553,7 @@ pg_strnxfrm(char *dest, size_t destsize, const char *src, ssize_t srclen,
 bool
 pg_strxfrm_prefix_enabled(pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_LIBC)
-		return false;
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return true;
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return false;				/* keep compiler quiet */
+	return (locale->collate->strnxfrm_prefix != NULL);
 }
 
 /*
@@ -1647,7 +1565,7 @@ size_t
 pg_strxfrm_prefix(char *dest, const char *src, size_t destsize,
 				  pg_locale_t locale)
 {
-	return pg_strnxfrm_prefix(dest, destsize, src, -1, locale);
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, -1, locale);
 }
 
 /*
@@ -1672,16 +1590,7 @@ size_t
 pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
-	size_t		result = 0;		/* keep compiler quiet */
-
-#ifdef USE_ICU
-	if (locale->provider == COLLPROVIDER_ICU)
-		result = strnxfrm_prefix_icu(dest, destsize, src, -1, locale);
-	else
-#endif
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return result;
+	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 6d83705ee42..0a032d9a923 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -58,13 +58,14 @@ extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
-extern int	strncoll_icu(const char *arg1, ssize_t len1,
+
+static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
-extern size_t strnxfrm_icu(char *dest, size_t destsize,
+static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
-extern size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
 								  const char *src, ssize_t srclen,
 								  pg_locale_t locale);
 extern char *get_collation_actual_version_icu(const char *collcollate);
@@ -83,12 +84,20 @@ static UConverter *icu_converter = NULL;
 
 static UCollator *make_icu_collator(const char *iculocstr,
 									const char *icurules);
-static int	strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-								 const char *arg2, ssize_t len2,
-								 pg_locale_t locale);
-static size_t strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-										  const char *src, ssize_t srclen,
-										  pg_locale_t locale);
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
+static size_t strnxfrm_prefix_icu(char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
+#ifdef HAVE_UCOL_STRCOLLUTF8
+static int	strncoll_icu_utf8(const char *arg1, ssize_t len1,
+							  const char *arg2, ssize_t len2,
+							  pg_locale_t locale);
+#endif
+static size_t strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+									   const char *src, ssize_t srclen,
+									   pg_locale_t locale);
 static void init_icu_converter(void);
 static size_t uchar_length(UConverter *converter,
 						   const char *str, int32_t len);
@@ -108,6 +117,25 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const UChar *src, int32_t srcLength,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
+
+static const struct collate_methods collate_methods_icu = {
+	.strncoll = strncoll_icu,
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu,
+	.strxfrm_is_safe = true,
+};
+
+static const struct collate_methods collate_methods_icu_utf8 = {
+#ifdef HAVE_UCOL_STRCOLLUTF8
+	.strncoll = strncoll_icu_utf8,
+#else
+	.strncoll = strncoll_icu,
+#endif
+	.strnxfrm = strnxfrm_icu,
+	.strnxfrm_prefix = strnxfrm_prefix_icu_utf8,
+	.strxfrm_is_safe = true,
+};
+
 #endif
 
 pg_locale_t
@@ -174,6 +202,10 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
+	if (GetDatabaseEncoding() == PG_UTF8)
+		result->collate = &collate_methods_icu_utf8;
+	else
+		result->collate = &collate_methods_icu;
 
 	return result;
 #else
@@ -408,42 +440,36 @@ strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 }
 
 /*
- * strncoll_icu
+ * strncoll_icu_utf8
  *
  * Call ucol_strcollUTF8() or ucol_strcoll() as appropriate for the given
  * database encoding. An argument length of -1 means the string is
  * NUL-terminated.
  */
+#ifdef HAVE_UCOL_STRCOLLUTF8
 int
-strncoll_icu(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
-			 pg_locale_t locale)
+strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
+				  pg_locale_t locale)
 {
 	int			result;
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-#ifdef HAVE_UCOL_STRCOLLUTF8
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		status = U_ZERO_ERROR;
-		result = ucol_strcollUTF8(locale->info.icu.ucol,
-								  arg1, len1,
-								  arg2, len2,
-								  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("collation failed: %s", u_errorName(status))));
-	}
-	else
-#endif
-	{
-		result = strncoll_icu_no_utf8(arg1, len1, arg2, len2, locale);
-	}
+	status = U_ZERO_ERROR;
+	result = ucol_strcollUTF8(locale->info.icu.ucol,
+							  arg1, len1,
+							  arg2, len2,
+							  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("collation failed: %s", u_errorName(status))));
 
 	return result;
 }
+#endif
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
@@ -494,37 +520,32 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 size_t
-strnxfrm_prefix_icu(char *dest, size_t destsize,
-					const char *src, ssize_t srclen,
-					pg_locale_t locale)
+strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
+						 const char *src, ssize_t srclen,
+						 pg_locale_t locale)
 {
 	size_t		result;
+	UCharIterator iter;
+	uint32_t	state[2];
+	UErrorCode	status;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
 
-	if (GetDatabaseEncoding() == PG_UTF8)
-	{
-		UCharIterator iter;
-		uint32_t	state[2];
-		UErrorCode	status;
+	Assert(GetDatabaseEncoding() == PG_UTF8);
 
-		uiter_setUTF8(&iter, src, srclen);
-		state[0] = state[1] = 0;	/* won't need that again */
-		status = U_ZERO_ERROR;
-		result = ucol_nextSortKeyPart(locale->info.icu.ucol,
-									  &iter,
-									  state,
-									  (uint8_t *) dest,
-									  destsize,
-									  &status);
-		if (U_FAILURE(status))
-			ereport(ERROR,
-					(errmsg("sort key generation failed: %s",
-							u_errorName(status))));
-	}
-	else
-		result = strnxfrm_prefix_icu_no_utf8(dest, destsize, src, srclen,
-											 locale);
+	uiter_setUTF8(&iter, src, srclen);
+	state[0] = state[1] = 0;	/* won't need that again */
+	status = U_ZERO_ERROR;
+	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+								  &iter,
+								  state,
+								  (uint8_t *) dest,
+								  destsize,
+								  &status);
+	if (U_FAILURE(status))
+		ereport(ERROR,
+				(errmsg("sort key generation failed: %s",
+						u_errorName(status))));
 
 	return result;
 }
@@ -653,7 +674,7 @@ u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 }
 
 /*
- * strncoll_icu_no_utf8
+ * strncoll_icu
  *
  * Convert the arguments from the database encoding to UChar strings, then
  * call ucol_strcoll(). An argument length of -1 means that the string is
@@ -663,8 +684,8 @@ u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
  * caller should call that instead.
  */
 static int
-strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
-					 const char *arg2, ssize_t len2, pg_locale_t locale)
+strncoll_icu(const char *arg1, ssize_t len1,
+			 const char *arg2, ssize_t len2, pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -677,6 +698,8 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 	int			result;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 #endif
@@ -710,9 +733,9 @@ strncoll_icu_no_utf8(const char *arg1, ssize_t len1,
 
 /* 'srclen' of -1 means the strings are NUL-terminated */
 static size_t
-strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
-							const char *src, ssize_t srclen,
-							pg_locale_t locale)
+strnxfrm_prefix_icu(char *dest, size_t destsize,
+					const char *src, ssize_t srclen,
+					pg_locale_t locale)
 {
 	char		sbuf[TEXTBUFLEN];
 	char	   *buf = sbuf;
@@ -725,6 +748,8 @@ strnxfrm_prefix_icu_no_utf8(char *dest, size_t destsize,
 	Size		result_bsize;
 
 	Assert(locale->provider == COLLPROVIDER_ICU);
+
+	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	init_icu_converter();
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index a46c6326854..2a97dcaf2e2 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -50,10 +50,10 @@ extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
 extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
 							ssize_t srclen, pg_locale_t locale);
 
-extern int	strncoll_libc(const char *arg1, ssize_t len1,
+static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
-extern size_t strnxfrm_libc(char *dest, size_t destsize,
+static size_t strnxfrm_libc(char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
 extern char *get_collation_actual_version_libc(const char *collcollate);
@@ -86,6 +86,27 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static const struct collate_methods collate_methods_libc = {
+	.strncoll = strncoll_libc,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+
+	/*
+	 * Unfortunately, it seems that strxfrm() for non-C collations is broken
+	 * on many common platforms; testing of multiple versions of glibc reveals
+	 * that, for many locales, strcoll() and strxfrm() do not return
+	 * consistent results. While no other libc other than Cygwin has so far
+	 * been shown to have a problem, we take the conservative course of action
+	 * for right now and disable this categorically.  (Users who are certain
+	 * this isn't a problem on their system can define TRUST_STRXFRM.)
+	 */
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+
 size_t
 strlower_libc(char *dst, size_t dstsize, const char *src,
 			  ssize_t srclen, pg_locale_t locale)
@@ -439,6 +460,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
 	result->info.lt = loc;
+	if (!result->collate_is_c)
+	{
+#ifdef WIN32
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->collate = &collate_methods_libc_win32_utf8;
+		else
+#endif
+			result->collate = &collate_methods_libc;
+	}
 
 	return result;
 }
@@ -536,12 +566,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 
 	Assert(locale->provider == COLLPROVIDER_LIBC);
 
-#ifdef WIN32
-	/* check for this case before doing the work for nul-termination */
-	if (GetDatabaseEncoding() == PG_UTF8)
-		return strncoll_libc_win32_utf8(arg1, len1, arg2, len2, locale);
-#endif							/* WIN32 */
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 14a17e4869d..2b257a9cd31 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -47,6 +47,36 @@ extern struct lconv *PGLC_localeconv(void);
 extern void cache_locale_time(void);
 
 
+struct pg_locale_struct;
+typedef struct pg_locale_struct *pg_locale_t;
+
+/* methods that define collation behavior */
+struct collate_methods
+{
+	/* required */
+	int			(*strncoll) (const char *arg1, ssize_t len1,
+							 const char *arg2, ssize_t len2,
+							 pg_locale_t locale);
+
+	/* required */
+	size_t		(*strnxfrm) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* optional */
+	size_t		(*strnxfrm_prefix) (char *dest, size_t destsize,
+									const char *src, ssize_t srclen,
+									pg_locale_t locale);
+
+	/*
+	 * If the strnxfrm method is not trusted to return the correct results,
+	 * set strxfrm_is_safe to false. If set to false, the method will not be
+	 * used in most cases, but the planner still expects it to be there for
+	 * estimation purposes (where incorrect results are acceptable).
+	 */
+	bool		strxfrm_is_safe;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -70,6 +100,9 @@ struct pg_locale_struct
 	bool		collate_is_c;
 	bool		ctype_is_c;
 	bool		is_default;
+
+	const struct collate_methods *collate;	/* NULL if collate_is_c */
+
 	union
 	{
 		struct
-- 
2.34.1
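
To make the collate method table in the patch above concrete, here is a minimal standalone sketch of the dispatch pattern. It is not code from the patch: demo_locale, demo_collate_methods, demo_strcoll_ci and demo_compare are hypothetical stand-ins for pg_locale_t, struct collate_methods and a caller-side wrapper. The only things borrowed from the patch are the const table of function pointers and the invariant that the table pointer is NULL exactly when collate_is_c is true.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for pg_locale_t and struct collate_methods. */
struct demo_locale;

struct demo_collate_methods
{
	/* required: three-way comparison of two NUL-terminated strings */
	int			(*strcoll) (const char *a, const char *b,
							const struct demo_locale *loc);
};

struct demo_locale
{
	bool		collate_is_c;
	const struct demo_collate_methods *collate; /* NULL if collate_is_c */
};

/* A toy "provider": ASCII case-insensitive comparison. */
static int
demo_strcoll_ci(const char *a, const char *b, const struct demo_locale *loc)
{
	(void) loc;					/* unused in this toy provider */

	for (;; a++, b++)
	{
		int			ca = tolower((unsigned char) *a);
		int			cb = tolower((unsigned char) *b);

		if (ca != cb)
			return (ca < cb) ? -1 : 1;
		if (ca == '\0')
			return 0;
	}
}

static const struct demo_collate_methods demo_methods_ci = {
	.strcoll = demo_strcoll_ci,
};

/* Caller side: no branching on a provider, only on the C-locale fast path. */
static int
demo_compare(const char *a, const char *b, const struct demo_locale *loc)
{
	/* mirrors the invariant asserted in create_pg_locale() */
	assert((loc->collate_is_c && loc->collate == NULL) ||
		   (!loc->collate_is_c && loc->collate != NULL));

	if (loc->collate_is_c)
		return strcmp(a, b);

	return loc->collate->strcoll(a, b, loc);
}
```

A caller would build a demo_locale with collate_is_c = false and collate = &demo_methods_ci. Pointing collate at a different table changes the behavior without touching demo_compare(), which is the property that makes the real tables easy to hook.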

v10-0007-Control-ctype-behavior-internally-with-a-method-.patch (text/x-patch; charset=UTF-8)

From 8dcc4be2904463692589be0caf26e70a91da738d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 29 Nov 2024 09:37:43 -0800
Subject: [PATCH v10 07/11] Control ctype behavior internally with a method
 table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 388 +++++-----------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         | 113 +++----
 src/backend/utils/adt/pg_locale_builtin.c |  69 +++-
 src/backend/utils/adt/pg_locale_icu.c     |  72 +++-
 src/backend/utils/adt/pg_locale_libc.c    | 216 ++++++++++--
 src/include/utils/pg_locale.h             |  57 ++++
 src/tools/pgindent/typedefs.list          |   1 -
 9 files changed, 510 insertions(+), 435 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index b75784b6ce5..e898634fdf6 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,33 +63,18 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 static Oid	pg_regex_collation;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
-#define PG_ISDIGIT	0x01
-#define PG_ISALPHA	0x02
-#define PG_ISALNUM	(PG_ISDIGIT | PG_ISALPHA)
-#define PG_ISUPPER	0x04
-#define PG_ISLOWER	0x08
-#define PG_ISGRAPH	0x10
-#define PG_ISPRINT	0x20
-#define PG_ISPUNCT	0x40
-#define PG_ISSPACE	0x80
-
-static const unsigned char pg_char_properties[128] = {
+static const unsigned char char_properties_tbl[128] = {
 	 /* NUL */ 0,
 	 /* ^A */ 0,
 	 /* ^B */ 0,
@@ -232,7 +217,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -253,8 +237,8 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
 		collation = C_COLLATION_OID;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -271,32 +255,11 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
+			locale = &dummy_c_locale;
 			collation = C_COLLATION_OID;
 		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
-		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 	pg_regex_collation = collation;
 }
@@ -304,82 +267,31 @@ pg_set_regex_collation(Oid collation)
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISDIGIT));
+	else
+		return char_properties(c, PG_ISDIGIT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALPHA));
+	else
+		return char_properties(c, PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISALNUM));
+	else
+		return char_properties(c, PG_ISDIGIT | PG_ISALPHA, pg_regex_locale) != 0;
 }
 
 static int
@@ -394,219 +306,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISUPPER));
+	else
+		return char_properties(c, PG_ISUPPER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISLOWER));
+	else
+		return char_properties(c, PG_ISLOWER, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISGRAPH));
+	else
+		return char_properties(c, PG_ISGRAPH, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPRINT));
+	else
+		return char_properties(c, PG_ISPRINT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISPUNCT));
+	else
+		return char_properties(c, PG_ISPUNCT, pg_regex_locale) != 0;
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(char_properties_tbl[c] & PG_ISSPACE));
+	else
+		return char_properties(c, PG_ISSPACE, pg_regex_locale) != 0;
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -732,37 +512,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 7df50b50d15..b39bb78c327 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -98,7 +98,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	else if (locale->is_default)
 		return pg_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -209,7 +209,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -224,14 +234,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index ee71ca89ffd..c172f7e55fc 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1495,13 +1495,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 735335b556a..7be8326c2c7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -100,27 +100,6 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1232,6 +1211,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1406,57 +1388,21 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strlower_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strlower_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strtitle_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strtitle_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strtitle_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strupper_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strupper_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strupper_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1593,6 +1539,53 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_properties()
+ *
+ * Out of the properties specified in the given mask, return a new mask of the
+ * properties true for the given character.
+ */
+int
+char_properties(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	return locale->ctype->char_properties(wc, mask, locale);
+}
+
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index c060b89940d..50efcb5e3d3 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -25,13 +25,6 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-
 
 struct WordBoundaryState
 {
@@ -74,14 +67,14 @@ initcap_wbnext(void *state)
 	return wbstate->len;
 }
 
-size_t
+static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
 	return unicode_strlower(dest, destsize, src, srclen);
 }
 
-size_t
+static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -97,13 +90,67 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							initcap_wbnext, &wbstate);
 }
 
-size_t
+static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
 	return unicode_strupper(dest, destsize, src, srclen);
 }
 
+static int
+char_properties_builtin(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && pg_u_isdigit(wc, true))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && pg_u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && pg_u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && pg_u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && pg_u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && pg_u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && pg_u_ispunct(wc, true))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && pg_u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.char_properties = char_properties_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -146,6 +193,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 0a032d9a923..eb9e72eae1a 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -48,17 +48,17 @@
 #define		TEXTBUFLEN			1024
 
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
 
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
 
+static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strupper_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -118,6 +118,50 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
 
+static int
+char_properties_icu(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	if ((mask & PG_ISDIGIT) && u_isdigit(wc))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && u_isalpha(wc))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && u_isupper(wc))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && u_islower(wc))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && u_isgraph(wc))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && u_isprint(wc))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && u_ispunct(wc))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && u_isspace(wc))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -136,6 +180,15 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.char_properties = char_properties_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -206,6 +259,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -379,7 +433,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
-size_t
+static size_t
 strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -399,7 +453,7 @@ strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -419,7 +473,7 @@ strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 2a97dcaf2e2..a135e4b21b9 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -43,13 +43,6 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -86,6 +79,15 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static int	char_properties_libc_sb(pg_wchar wc, int mask,
+									pg_locale_t locale);
+static int	char_properties_libc_mb(pg_wchar wc, int mask,
+									pg_locale_t locale);
+static pg_wchar toupper_libc_sb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar toupper_libc_mb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_sb(pg_wchar wc, pg_locale_t locale);
+static pg_wchar tolower_libc_mb(pg_wchar wc, pg_locale_t locale);
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -107,36 +109,76 @@ static const struct collate_methods collate_methods_libc = {
 #endif
 };
 
-size_t
-strlower_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strlower_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strlower_libc_sb(dst, dstsize, src, srclen, locale);
-}
+#ifdef WIN32
+static const struct collate_methods collate_methods_libc_win32_utf8 = {
+	.strncoll = strncoll_libc_win32_utf8,
+	.strnxfrm = strnxfrm_libc,
+	.strnxfrm_prefix = NULL,
+#ifdef TRUST_STRXFRM
+	.strxfrm_is_safe = true,
+#else
+	.strxfrm_is_safe = false,
+#endif
+};
+#endif
 
-size_t
-strtitle_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
 {
-	if (pg_database_encoding_max_length() > 1)
-		return strtitle_libc_mb(dst, dstsize, src, srclen, locale);
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
 	else
-		return strtitle_libc_sb(dst, dstsize, src, srclen, locale);
+		return isalpha_l((unsigned char) ch, locale->info.lt);
 }
 
-size_t
-strupper_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
-	if (pg_database_encoding_max_length() > 1)
-		return strupper_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strupper_libc_sb(dst, dstsize, src, srclen, locale);
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
 }
 
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.char_properties = char_properties_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.char_properties = char_properties_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.char_properties = char_properties_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -469,6 +511,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
@@ -817,6 +868,113 @@ report_newlocale_failure(const char *localename)
 						localename) : 0)));
 }
 
+static int
+char_properties_libc_sb(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc > (pg_wchar) UCHAR_MAX)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static int
+char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
+{
+	int			result = 0;
+
+	Assert(!locale->ctype_is_c);
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* if wchar_t cannot represent the value, just return 0 */
+	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
+		return 0;
+
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISDIGIT;
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISALPHA;
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISUPPER;
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISLOWER;
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISGRAPH;
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPRINT;
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISPUNCT;
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+		result |= PG_ISSPACE;
+
+	return result;
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
 /*
  * POSIX doesn't define _l-variants of these functions, but several systems
  * have them.  We provide our own replacements here.
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 2b257a9cd31..bd7479cac5d 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,10 +12,25 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
 
+/*
+ * Character properties for regular expressions.
+ */
+#define PG_ISDIGIT     0x01
+#define PG_ISALPHA     0x02
+#define PG_ISALNUM     (PG_ISDIGIT | PG_ISALPHA)
+#define PG_ISUPPER     0x04
+#define PG_ISLOWER     0x08
+#define PG_ISGRAPH     0x10
+#define PG_ISPRINT     0x20
+#define PG_ISPUNCT     0x40
+#define PG_ISSPACE     0x80
+
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
@@ -77,6 +92,43 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* required */
+	int			(*char_properties) (pg_wchar wc, int mask, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/* required */
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -102,6 +154,7 @@ struct pg_locale_struct
 	bool		is_default;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -127,6 +180,10 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 extern bool is_encoding_supported_by_collprovider(char collprovider, int encoding);
+extern int	char_properties(pg_wchar wc, int mask, pg_locale_t locale);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2d4c870423a..94b041ec9e9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1823,7 +1823,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.34.1
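
The ctype method table added by the patch above can be illustrated with a small, self-contained sketch. Every name in it (demo_locale, demo_ctype_methods, the DEMO_IS* bits, demo_char_properties) is a hypothetical stand-in rather than a PostgreSQL API, and plain <ctype.h> is assumed in place of the libc _l variants or ICU. It only shows the "mask in, matching bits out" convention of char_properties and the dispatch through a function pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical property bits, mirroring the PG_IS* layout in the patch. */
#define DEMO_ISDIGIT	0x01
#define DEMO_ISALPHA	0x02
#define DEMO_ISALNUM	(DEMO_ISDIGIT | DEMO_ISALPHA)
#define DEMO_ISUPPER	0x04
#define DEMO_ISLOWER	0x08

struct demo_locale;

/* Hypothetical, reduced method table: one entry per ctype operation. */
struct demo_ctype_methods
{
	int			(*char_properties) (unsigned int wc, int mask,
									const struct demo_locale *locale);
	unsigned int (*wc_toupper) (unsigned int wc,
								const struct demo_locale *locale);
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype;
};

/* Assumption: plain <ctype.h> stands in for a real provider. */
static int
char_properties_ascii(unsigned int wc, int mask,
					  const struct demo_locale *locale)
{
	int			result = 0;

	(void) locale;
	if (wc > 127)
		return 0;				/* outside the range this table handles */

	/* Only test the properties the caller asked about. */
	if ((mask & DEMO_ISDIGIT) && isdigit((int) wc))
		result |= DEMO_ISDIGIT;
	if ((mask & DEMO_ISALPHA) && isalpha((int) wc))
		result |= DEMO_ISALPHA;
	if ((mask & DEMO_ISUPPER) && isupper((int) wc))
		result |= DEMO_ISUPPER;
	if ((mask & DEMO_ISLOWER) && islower((int) wc))
		result |= DEMO_ISLOWER;

	return result;
}

static unsigned int
toupper_ascii(unsigned int wc, const struct demo_locale *locale)
{
	(void) locale;
	return (wc <= 127) ? (unsigned int) toupper((int) wc) : wc;
}

static const struct demo_ctype_methods demo_ctype_methods_ascii = {
	.char_properties = char_properties_ascii,
	.wc_toupper = toupper_ascii,
};

const struct demo_locale demo_locale_ascii = {
	.ctype = &demo_ctype_methods_ascii,
};

/* Caller-side wrapper: no branching on a provider, only on the table. */
int
demo_char_properties(unsigned int wc, int mask,
					 const struct demo_locale *locale)
{
	assert(locale->ctype != NULL);
	return locale->ctype->char_properties(wc, mask, locale);
}
```

A caller that wants a single property passes only that bit and checks for a nonzero result; a caller that wants several (for example an "alnum" test) passes the combined mask and inspects which bits came back.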

v10-0008-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From c6e78ab4c22281441eb1062681cb3e022536f10f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v10 08/11] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 50efcb5e3d3..630adac1bcb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -189,7 +189,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index eb9e72eae1a..1b8847e6dc9 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -251,7 +251,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -508,8 +507,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -537,8 +534,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -583,8 +578,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -751,8 +744,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -801,8 +792,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index a135e4b21b9..7bf83c98952 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -495,7 +495,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -615,8 +614,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -671,8 +668,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -781,7 +776,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index bd7479cac5d..0b1c01d73cb 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -147,7 +147,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.34.1
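
As a rough illustration of why the provider field becomes redundant: once every operation goes through a method table, two locales with different backends differ only in which table they point at. The sketch below is hypothetical throughout (sketch_locale, sketch_collate_methods, and both comparison functions are invented for illustration and are not PostgreSQL code); it assumes two toy backends, a bytewise one and an ASCII case-insensitive one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct sketch_locale;

/* Hypothetical, reduced collation method table. */
struct sketch_collate_methods
{
	int			(*strncoll) (const char *arg1, size_t len1,
							 const char *arg2, size_t len2,
							 const struct sketch_locale *locale);
};

/* Note: no "provider" member; behavior comes from the table alone. */
struct sketch_locale
{
	const struct sketch_collate_methods *collate;
};

/* Backend A: bytewise comparison; on a tie the shorter string sorts first. */
static int
strncoll_bytewise(const char *arg1, size_t len1,
				  const char *arg2, size_t len2,
				  const struct sketch_locale *locale)
{
	size_t		minlen = (len1 < len2) ? len1 : len2;
	int			cmp = memcmp(arg1, arg2, minlen);

	(void) locale;
	if (cmp != 0)
		return (cmp < 0) ? -1 : 1;
	return (len1 > len2) - (len1 < len2);
}

/* Backend B: ASCII case-insensitive comparison, purely for contrast. */
static int
strncoll_nocase(const char *arg1, size_t len1,
				const char *arg2, size_t len2,
				const struct sketch_locale *locale)
{
	size_t		minlen = (len1 < len2) ? len1 : len2;
	size_t		i;

	(void) locale;
	for (i = 0; i < minlen; i++)
	{
		int			c1 = tolower((unsigned char) arg1[i]);
		int			c2 = tolower((unsigned char) arg2[i]);

		if (c1 != c2)
			return (c1 < c2) ? -1 : 1;
	}
	return (len1 > len2) - (len1 < len2);
}

static const struct sketch_collate_methods sketch_methods_bytewise = {
	.strncoll = strncoll_bytewise,
};

static const struct sketch_collate_methods sketch_methods_nocase = {
	.strncoll = strncoll_nocase,
};

const struct sketch_locale sketch_locale_bytewise = {
	.collate = &sketch_methods_bytewise,
};

const struct sketch_locale sketch_locale_nocase = {
	.collate = &sketch_methods_nocase,
};

/* The caller neither inspects nor asserts on a provider tag. */
int
sketch_strncoll(const char *arg1, size_t len1,
				const char *arg2, size_t len2,
				const struct sketch_locale *locale)
{
	assert(locale->collate != NULL);
	return locale->collate->strncoll(arg1, len1, arg2, len2, locale);
}
```

The same call site yields different orderings depending only on the table attached to the locale, which is why assertions on a provider tag add nothing once the methods are in place.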

v10-0009-Make-provider-data-in-pg_locale_t-an-opaque-poin.patch (text/x-patch; charset=UTF-8)

From cb89467d976c2de0db002685ef3a283f8624efc7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v10 09/11] Make provider data in pg_locale_t an opaque
 pointer.

---
 src/backend/utils/adt/pg_locale_builtin.c |  11 +-
 src/backend/utils/adt/pg_locale_icu.c     |  40 +++++--
 src/backend/utils/adt/pg_locale_libc.c    | 131 ++++++++++++++--------
 src/include/utils/pg_locale.h             |  16 +--
 4 files changed, 127 insertions(+), 71 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 630adac1bcb..7dbc6faf430 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -26,6 +26,11 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct builtin_provider
+{
+	const char *locale;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -155,6 +160,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -188,7 +194,10 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 1b8847e6dc9..6b7ebf95b6f 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
@@ -198,6 +204,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -249,8 +256,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -506,11 +517,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -534,6 +546,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -547,7 +561,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -578,12 +592,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -690,11 +706,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -702,7 +720,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -744,6 +762,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -766,7 +786,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -792,6 +812,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -811,7 +833,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 7bf83c98952..725e0efd390 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -41,6 +42,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -127,17 +133,21 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static const struct ctype_methods ctype_methods_libc_sb = {
@@ -188,7 +198,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -209,7 +219,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_tolower((unsigned char) *p);
 			else
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -220,7 +230,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -242,7 +253,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -273,7 +284,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -299,11 +310,11 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			else
 			{
 				if (wasalnum)
-					*p = tolower_l((unsigned char) *p, loc);
+					*p = tolower_l((unsigned char) *p, libc->lt);
 				else
-					*p = toupper_l((unsigned char) *p, loc);
+					*p = toupper_l((unsigned char) *p, libc->lt);
 			}
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -314,7 +325,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -339,10 +351,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -374,7 +386,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -392,7 +404,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_toupper((unsigned char) *p);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -403,7 +415,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -425,7 +438,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -453,6 +466,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -491,16 +505,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -614,6 +631,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -644,7 +663,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -668,8 +687,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -678,7 +699,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -776,6 +797,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -820,7 +843,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -867,27 +890,29 @@ char_properties_libc_sb(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc > (pg_wchar) UCHAR_MAX)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && isdigit_l((unsigned char) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && isalpha_l((unsigned char) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && isupper_l((unsigned char) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && islower_l((unsigned char) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && isgraph_l((unsigned char) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && isprint_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && ispunct_l((unsigned char) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && isspace_l((unsigned char) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -898,6 +923,8 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 {
 	int			result = 0;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(!locale->ctype_is_c);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
@@ -905,21 +932,21 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 	if (sizeof(wchar_t) < 4 && wc > (pg_wchar) 0xFFFF)
 		return 0;
 
-	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISDIGIT) && iswdigit_l((wint_t) wc, libc->lt))
 		result |= PG_ISDIGIT;
-	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISALPHA) && iswalpha_l((wint_t) wc, libc->lt))
 		result |= PG_ISALPHA;
-	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISUPPER) && iswupper_l((wint_t) wc, libc->lt))
 		result |= PG_ISUPPER;
-	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISLOWER) && iswlower_l((wint_t) wc, libc->lt))
 		result |= PG_ISLOWER;
-	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISGRAPH) && iswgraph_l((wint_t) wc, libc->lt))
 		result |= PG_ISGRAPH;
-	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPRINT) && iswprint_l((wint_t) wc, libc->lt))
 		result |= PG_ISPRINT;
-	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISPUNCT) && iswpunct_l((wint_t) wc, libc->lt))
 		result |= PG_ISPUNCT;
-	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, locale->info.lt))
+	if ((mask & PG_ISSPACE) && iswspace_l((wint_t) wc, libc->lt))
 		result |= PG_ISSPACE;
 
 	return result;
@@ -928,10 +955,12 @@ char_properties_libc_mb(pg_wchar wc, int mask, pg_locale_t locale)
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -939,10 +968,12 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -950,10 +981,12 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -961,10 +994,12 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -1056,8 +1091,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1116,8 +1153,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 0b1c01d73cb..b08efbae912 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -155,21 +155,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 typedef struct pg_locale_struct *pg_locale_t;
-- 
2.34.1
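
The diff above replaces the "info" union in pg_locale_struct with an opaque provider_data pointer that each provider casts back to its own struct. A minimal standalone sketch of that pattern follows. It is not PostgreSQL code: demo_locale, demo_libc_provider, demo_create_locale, demo_toupper and demo_free_locale are hypothetical names, calloc stands in for MemoryContextAllocZero, and an integer offset stands in for the locale_t handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for pg_locale_struct after the patch. */
struct demo_locale
{
	int			is_default;
	void	   *provider_data;	/* opaque to everyone except the provider */
};

/*
 * Hypothetical stand-in for struct libc_provider. "upper_delta" plays the
 * role of the locale_t handle: private state that only the provider reads.
 */
struct demo_libc_provider
{
	int			upper_delta;
};

/* Provider-side constructor: allocate private state and attach it. */
static struct demo_locale *
demo_create_locale(int upper_delta)
{
	struct demo_locale *result = calloc(1, sizeof(struct demo_locale));
	struct demo_libc_provider *libc = calloc(1, sizeof(struct demo_libc_provider));

	assert(result != NULL && libc != NULL);
	libc->upper_delta = upper_delta;
	result->provider_data = (void *) libc;
	return result;
}

/* Provider-side method: recover the private state with a cast. */
static int
demo_toupper(int c, const struct demo_locale *locale)
{
	const struct demo_libc_provider *libc =
		(const struct demo_libc_provider *) locale->provider_data;

	if (c >= 'a' && c <= 'z')
		return c + libc->upper_delta;
	return c;
}

/* Release both the private state and the locale itself. */
static void
demo_free_locale(struct demo_locale *locale)
{
	free(locale->provider_data);
	free(locale);
}
```

Because callers only ever see a void pointer, the shared header needs no knowledge of locale_t or UCollator, which is what allows the follow-up patch to drop the ICU includes from pg_locale.h.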

v10-0010-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From bb5d2f739be775ab438d335edfdb3e2c43fce3a9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v10 10/11] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  | 4 ++++
 src/backend/utils/adt/formatting.c    | 4 ----
 src/backend/utils/adt/pg_locale.c     | 4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 1 +
 src/backend/utils/adt/varlena.c       | 4 ++++
 src/include/utils/pg_locale.h         | 4 ----
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8001f5ed082..bfec533dbd0 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 6a0571f93e6..387009a4a9e 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/unicode_case.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7be8326c2c7..11f30017d53 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 6b7ebf95b6f..ec5886b1780 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 533bebc1c7b..37b3506f06c 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index b08efbae912..c8ac0996f97 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /*
  * Character properties for regular expressions.
  */
-- 
2.34.1

v10-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 1314f55f4be43eb2b71d1988ea6c0c7f477f0d98 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 25 Sep 2024 16:10:28 -0700
Subject: [PATCH v10 11/11] Introduce hooks for creating custom pg_locale_t.

Now that collation, case mapping, and ctype behavior is controlled
with a method table, we can hook the behavior.

The hooks can provide their own arbitrary method table, which may be
based on a different version of ICU than what Postgres was built with,
or entirely unrelated to ICU/libc.
---
 src/backend/utils/adt/pg_locale.c | 68 +++++++++++++++++++++----------
 src/include/utils/pg_locale.h     | 24 +++++++++++
 src/tools/pgindent/typedefs.list  |  2 +
 3 files changed, 72 insertions(+), 22 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 11f30017d53..a896337e4fd 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -104,6 +104,9 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
+create_pg_locale_hook_type create_pg_locale_hook = NULL;
+collation_version_hook_type collation_version_hook = NULL;
+
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1191,7 +1194,7 @@ create_pg_locale(Oid collid, MemoryContext context)
 {
 	HeapTuple	tp;
 	Form_pg_collation collform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 	Datum		datum;
 	bool		isnull;
 
@@ -1200,15 +1203,21 @@ create_pg_locale(Oid collid, MemoryContext context)
 		elog(ERROR, "cache lookup failed for collation %u", collid);
 	collform = (Form_pg_collation) GETSTRUCT(tp);
 
-	if (collform->collprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(collid, context);
-	else if (collform->collprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(collid, context);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(collid, context);
+
+	if (result == NULL)
+	{
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(collid, context);
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(collid, context);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(collform->collprovider);
+	}
 
 	result->is_default = false;
 
@@ -1273,7 +1282,7 @@ init_database_collation(void)
 {
 	HeapTuple	tup;
 	Form_pg_database dbform;
-	pg_locale_t result;
+	pg_locale_t result = NULL;
 
 	Assert(default_locale == NULL);
 
@@ -1283,18 +1292,25 @@ init_database_collation(void)
 		elog(ERROR, "cache lookup failed for database %u", MyDatabaseId);
 	dbform = (Form_pg_database) GETSTRUCT(tup);
 
-	if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
-		result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
-										  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_ICU)
-		result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
-									  TopMemoryContext);
-	else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
-		result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+	if (create_pg_locale_hook != NULL)
+		result = create_pg_locale_hook(DEFAULT_COLLATION_OID,
 									   TopMemoryContext);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+
+	if (result == NULL)
+	{
+		if (dbform->datlocprovider == COLLPROVIDER_BUILTIN)
+			result = create_pg_locale_builtin(DEFAULT_COLLATION_OID,
+											  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			result = create_pg_locale_icu(DEFAULT_COLLATION_OID,
+										  TopMemoryContext);
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			result = create_pg_locale_libc(DEFAULT_COLLATION_OID,
+										   TopMemoryContext);
+		else
+			/* shouldn't happen */
+			PGLOCALE_SUPPORT_ERROR(dbform->datlocprovider);
+	}
 
 	result->is_default = true;
 	ReleaseSysCache(tup);
@@ -1364,6 +1380,14 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collation_version_hook != NULL)
+	{
+		char	   *version;
+
+		if (collation_version_hook(collprovider, collcollate, &version))
+			return version;
+	}
+
 	if (collprovider == COLLPROVIDER_BUILTIN)
 		collversion = get_collation_actual_version_builtin(collcollate);
 #ifdef USE_ICU
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index c8ac0996f97..2ab69d2372e 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -156,6 +156,30 @@ struct pg_locale_struct
 
 typedef struct pg_locale_struct *pg_locale_t;
 
+/*
+ * Hooks to enable custom locale providers.
+ */
+
+/*
+ * Hook create_pg_locale(). Return result (allocated in the given context) to
+ * override; or return NULL to return control to create_pg_locale(). When
+ * creating the default database collation, collid is DEFAULT_COLLATION_OID.
+ */
+typedef pg_locale_t (*create_pg_locale_hook_type) (Oid collid,
+												   MemoryContext context);
+
+/*
+ * Hook get_collation_actual_version(). Set *version out parameter and return
+ * true to override; or return false to return control to
+ * get_collation_actual_version().
+ */
+typedef bool (*collation_version_hook_type) (char collprovider,
+											 const char *collcollate,
+											 char **version);
+
+extern PGDLLIMPORT create_pg_locale_hook_type create_pg_locale_hook;
+extern PGDLLIMPORT collation_version_hook_type collation_version_hook;
+
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 94b041ec9e9..e39fe343d21 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3376,6 +3376,7 @@ cmpEntriesArg
 codes_t
 collation_cache_entry
 collation_cache_hash
+collation_version_hook_type
 color
 colormaprange
 compare_context
@@ -3392,6 +3393,7 @@ core_yyscan_t
 corrupt_items
 cost_qual_eval_context
 cp_hash_func
+create_pg_locale_hook_type
 create_upper_paths_hook_type
 createdb_failure_params
 crosstab_HashEnt
-- 
2.34.1
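
The control flow of the two hooks in the patch above can be sketched outside the server as follows. This is an illustration under stated assumptions, not the patch itself: all demo_* and my_* names are hypothetical, a plain struct replaces pg_locale_t, the MemoryContext argument is omitted, and the version strings are made up. What it keeps from the patch is the contract: the create hook returns NULL to hand control back to core, and the version hook returns false to do the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_locale_t. */
struct demo_locale
{
	const char *origin;
};

typedef struct demo_locale *(*demo_create_hook_type) (unsigned int collid);
typedef bool (*demo_version_hook_type) (char collprovider,
										 const char *collcollate,
										 const char **version);

/* Hook variables start out NULL, as in the patch. */
static demo_create_hook_type demo_create_hook = NULL;
static demo_version_hook_type demo_version_hook = NULL;

static struct demo_locale core_locale = {"core"};
static struct demo_locale extension_locale = {"extension"};

/* Mirrors create_pg_locale(): try the hook, fall back if it returns NULL. */
static struct demo_locale *
demo_create_locale(unsigned int collid)
{
	struct demo_locale *result = NULL;

	if (demo_create_hook != NULL)
		result = demo_create_hook(collid);

	if (result == NULL)
		result = &core_locale;	/* core provider dispatch would go here */

	return result;
}

/* Mirrors get_collation_actual_version(): hook may override the version. */
static const char *
demo_actual_version(char collprovider, const char *collcollate)
{
	if (demo_version_hook != NULL)
	{
		const char *version;

		if (demo_version_hook(collprovider, collcollate, &version))
			return version;
	}
	return "core-1";			/* made-up fallback version */
}

/* Example extension hooks: claim only collation 42 and provider 'x'. */
static struct demo_locale *
my_create_hook(unsigned int collid)
{
	return (collid == 42) ? &extension_locale : NULL;
}

static bool
my_version_hook(char collprovider, const char *collcollate,
				const char **version)
{
	(void) collcollate;			/* unused in this sketch */
	if (collprovider != 'x')
		return false;
	*version = "ext-7";
	return true;
}
```

In a real extension the assignments to the hook variables would normally happen in _PG_init(), and by the usual PostgreSQL convention the extension would save any previously installed hook and call it for collations it does not handle.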

#12

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Andreas Karlsson (#10)

Re: Collation & ctype method table, and extension hooks

On Mon, 2024-12-02 at 16:39 +0100, Andreas Karlsson wrote:

> My patches:
>
> = v9-0002-Move-check-for-ucol_strcollUTF8-to-pg_locale_icu..patch

Committed.

> = v9-0003-Move-code-for-collation-version-into-provider-spe.patch
>
> Moves some code from pg_collate.c into provider specific files.

I agree with the general idea, but it seems we are accumulating a lot
of provider-specific functions. Should we define a provider struct with
its own methods?

That would be a good step toward making the provider catalog-driven.
Even if we don't support CREATE LOCALE PROVIDER, having space in the
catalog would be a good place to track the provider version.

> = v9-0004-Move-ICU-database-encoding-check-into-validation-.patch

This seems to be causing a test failure in 020_createdb.pl.

> = v9-0005-Move-provider-specific-code-when-looking-up-local.patch
>
> I did not like how namespace.c had knowledge of ICU.

See comments above about v9-0003.

Regards,
Jeff Davis
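
One possible shape for the "provider struct with its own methods" raised under v9-0003 is sketched below. Nothing in the thread specifies this layout: struct demo_provider, its fields, the lookup helpers and the version strings are all hypothetical. Only the one-character provider codes ('b' for builtin, 'c' for libc) follow the existing collprovider values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-provider method table. */
struct demo_provider
{
	char		code;			/* matches collprovider, e.g. 'b' or 'c' */
	const char *name;
	const char *(*actual_version) (const char *collcollate);
};

static const char *
demo_builtin_version(const char *collcollate)
{
	(void) collcollate;			/* one version for all builtin locales */
	return "1";
}

static const char *
demo_libc_version(const char *collcollate)
{
	/* In this sketch, C and POSIX have no version to track. */
	if (strcmp(collcollate, "C") == 0 || strcmp(collcollate, "POSIX") == 0)
		return NULL;
	return "demo-libc-1";		/* made-up version string */
}

/* All known providers in one table instead of scattered if/else chains. */
static const struct demo_provider demo_providers[] = {
	{'b', "builtin", demo_builtin_version},
	{'c', "libc", demo_libc_version},
};

/* Find a provider by its code; NULL if the code is unknown. */
static const struct demo_provider *
demo_lookup_provider(char code)
{
	for (size_t i = 0; i < sizeof(demo_providers) / sizeof(demo_providers[0]); i++)
	{
		if (demo_providers[i].code == code)
			return &demo_providers[i];
	}
	return NULL;
}

/* Single dispatch point replacing the per-provider branches. */
static const char *
demo_actual_version(char code, const char *collcollate)
{
	const struct demo_provider *provider = demo_lookup_provider(code);

	assert(collcollate != NULL);
	return provider ? provider->actual_version(collcollate) : NULL;
}
```

With a table like this, adding a provider means adding one entry rather than another branch in every caller, and a catalog-driven design could populate the same table from a handler function.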

#13

Andreas Karlsson

andreas@proxel.se

about 1 year ago

In reply to: Jeff Davis (#12)

1 attachment(s)

Re: Collation & ctype method table, and extension hooks

On 12/5/24 1:21 AM, Jeff Davis wrote:

>> = v9-0003-Move-code-for-collation-version-into-provider-spe.patch
>>
>> Moves some code from pg_collate.c into provider specific files.
>
> I agree with the general idea, but it seems we are accumulating a lot
> of provider-specific functions. Should we define a provider struct with
> its own methods?
>
> That would be a good step toward making the provider catalog-driven.
> Even if we don't support CREATE LOCALE PROVIDER, having space in the
> catalog would be a good place to track the provider version.

Yeah, that was my idea too but I just have not gotten around to it yet.

>> = v9-0004-Move-ICU-database-encoding-check-into-validation-.patch
>
> This seems to be causing a test failure in 020_createdb.pl.

Thanks, I have attached a fixup commit for this.

Andreas

Attachments:

0001-fixup-Move-ICU-database-encoding-check-into-validati.patch (text/x-patch; charset=UTF-8)

From ccaaf785a2aa14460d8360709d6f0ea4746f0157 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andreas@proxel.se>
Date: Fri, 20 Dec 2024 06:47:33 +0100
Subject: [PATCH] fixup! Move ICU database encoding check into validation
 function

---
 src/bin/scripts/t/020_createdb.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 4a0e2c883a1..36d285d4777 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -59,7 +59,7 @@ if ($ENV{with_icu} eq 'yes')
 	$node->command_fails_like(
 		[
 			'createdb', '-T',
-			'template0', '--locale-provider=icu',
+			'template0', '--locale-provider=icu', '--icu-locale=en',
 			'--encoding=SQL_ASCII', 'foobarX'
 		],
 		qr/ERROR:  encoding "SQL_ASCII" is not supported with ICU provider/,
-- 
2.45.2

#14

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Jeff Davis (#11)

4 attachment(s)

Re: Collation & ctype method table, and extension hooks

On Mon, 2024-12-02 at 23:58 -0800, Jeff Davis wrote:

> On Mon, 2024-12-02 at 16:39 +0100, Andreas Karlsson wrote:
>> I feel your first patch in the series is something you can just
>> commit.
>
> Done.
>
> I combined your patches and mine into the attached v10 series.

Here's v12 after committing a few of the earlier patches.

I changed the ctype method table to have separate methods for isdigit,
isalpha, etc., instead of the combined char_properties method. That's
more consistent with how things are currently done.
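
The difference between the two ctype designs can be sketched as follows. This is an ASCII-only illustration, not the patch: the demo_* names and DEMO_* flags are hypothetical, the locale argument that the real methods take is omitted, and <ctype.h> in the default C locale stands in for the provider. The wc_isdigit and wc_isalpha member names match the calls visible in regc_pg_locale.c in the v12 diff.

```c
#include <assert.h>
#include <ctype.h>

#define DEMO_ISDIGIT 0x01
#define DEMO_ISALPHA 0x02

typedef unsigned int demo_wchar;

/*
 * Combined style (the earlier char_properties method): one call, with the
 * caller passing a mask of the properties it cares about.
 */
static int
demo_char_properties(demo_wchar wc, int mask)
{
	int			result = 0;

	if (wc > 127)
		return 0;
	if ((mask & DEMO_ISDIGIT) && isdigit((int) wc))
		result |= DEMO_ISDIGIT;
	if ((mask & DEMO_ISALPHA) && isalpha((int) wc))
		result |= DEMO_ISALPHA;
	return result;
}

/* Separate style (v12): one method per property in the table. */
struct demo_ctype_methods
{
	int			(*wc_isdigit) (demo_wchar wc);
	int			(*wc_isalpha) (demo_wchar wc);
};

static int
demo_wc_isdigit(demo_wchar wc)
{
	return wc <= 127 && isdigit((int) wc) != 0;
}

static int
demo_wc_isalpha(demo_wchar wc)
{
	return wc <= 127 && isalpha((int) wc) != 0;
}

static const struct demo_ctype_methods demo_methods = {
	.wc_isdigit = demo_wc_isdigit,
	.wc_isalpha = demo_wc_isalpha,
};
```

The separate style costs more table entries but lets each caller, such as pg_wc_isdigit(), make exactly one indirect call with no mask test, which mirrors how the per-property functions were already called before the refactoring.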

I may still be seeing a tiny perf regression using the same test as
[1]. Let me know if you think that's a problem.

I committed your change to move the version reporting into the
provider-specific files.

Your other change to lookup_collation() in namespace.c should also
account for the code in DefineCollation() -- I don't think it makes
sense to refactor one without the other.

Regards,
Jeff Davis

[1]: /messages/by-id/78a1b434ff40510dc5aaabe986299a09f4da90cf.camel@j-davis.com

Attachments:

v12-0001-Control-ctype-behavior-internally-with-a-method-.patch (text/x-patch; charset=UTF-8)

From 129b35a2ecc7243def519e50525b0476220e17e6 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 29 Nov 2024 09:37:43 -0800
Subject: [PATCH v12 1/4] Control ctype behavior internally with a method
 table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 377 +++++-----------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         | 101 +++---
 src/backend/utils/adt/pg_locale_builtin.c | 106 +++++-
 src/backend/utils/adt/pg_locale_icu.c     | 109 ++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 279 +++++++++++++---
 src/include/utils/pg_locale.h             |  49 +++
 src/tools/pgindent/typedefs.list          |   1 -
 9 files changed, 618 insertions(+), 433 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 2360d08efae..31b8f4a9478 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,18 +63,13 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
@@ -231,7 +226,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -252,8 +246,7 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
-		locale = 0;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -270,113 +263,41 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
-		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
+			locale = &dummy_c_locale;
 		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 }
 
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISDIGIT));
+	else
+		return pg_regex_locale->ctype->wc_isdigit(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALPHA));
+	else
+		return pg_regex_locale->ctype->wc_isalpha(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALNUM));
+	else
+		return pg_regex_locale->ctype->wc_isalnum(c, pg_regex_locale);
 }
 
 static int
@@ -391,219 +312,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISUPPER));
+	else
+		return pg_regex_locale->ctype->wc_isupper(c, pg_regex_locale);
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISLOWER));
+	else
+		return pg_regex_locale->ctype->wc_islower(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISGRAPH));
+	else
+		return pg_regex_locale->ctype->wc_isgraph(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPRINT));
+	else
+		return pg_regex_locale->ctype->wc_isprint(c, pg_regex_locale);
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, true);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPUNCT));
+	else
+		return pg_regex_locale->ctype->wc_ispunct(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISSPACE));
+	else
+		return pg_regex_locale->ctype->wc_isspace(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -729,37 +518,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 7f4cf614585..4216ac17f43 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -98,7 +98,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	else if (locale->is_default)
 		return pg_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -209,7 +209,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -224,14 +234,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8fdc677371f..999f23f86d5 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1495,13 +1495,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 875cca6efc8..cdb4950ac47 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -100,27 +100,6 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1232,6 +1211,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1394,57 +1376,21 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strlower_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strlower_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strtitle_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strtitle_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strtitle_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strupper_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strupper_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strupper_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1581,6 +1527,41 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 5161915e6b1..aa7d0e3d6cb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -25,13 +25,6 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-
 
 struct WordBoundaryState
 {
@@ -74,14 +67,14 @@ initcap_wbnext(void *state)
 	return wbstate->len;
 }
 
-size_t
+static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
 	return unicode_strlower(dest, destsize, src, srclen);
 }
 
-size_t
+static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -97,13 +90,104 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							initcap_wbnext, &wbstate);
 }
 
-size_t
+static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
 	return unicode_strupper(dest, destsize, src, srclen);
 }
 
+static bool
+wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isdigit(wc, true);
+}
+
+static bool
+wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalnum(wc, true);
+}
+
+static bool
+wc_isupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isupper(wc);
+}
+
+static bool
+wc_islower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_islower(wc);
+}
+
+static bool
+wc_isgraph_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isgraph(wc);
+}
+
+static bool
+wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isprint(wc);
+}
+
+static bool
+wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_ispunct(wc, true);
+}
+
+static bool
+wc_isspace_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isspace(wc);
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.wc_isdigit = wc_isdigit_builtin,
+	.wc_isalpha = wc_isalpha_builtin,
+	.wc_isalnum = wc_isalnum_builtin,
+	.wc_isupper = wc_isupper_builtin,
+	.wc_islower = wc_islower_builtin,
+	.wc_isgraph = wc_isgraph_builtin,
+	.wc_isprint = wc_isprint_builtin,
+	.wc_ispunct = wc_ispunct_builtin,
+	.wc_isspace = wc_isspace_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -146,6 +230,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 5185b0f7289..3e9a2e0cfaa 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -48,17 +48,17 @@
 #define		TEXTBUFLEN			1024
 
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
 
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
 
+static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strupper_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -118,6 +118,25 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
 
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -136,6 +155,77 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static bool
+wc_isdigit_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isdigit(wc);
+}
+
+static bool
+wc_isalpha_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalnum(wc);
+}
+
+static bool
+wc_isupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isupper(wc);
+}
+
+static bool
+wc_islower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_islower(wc);
+}
+
+static bool
+wc_isgraph_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isgraph(wc);
+}
+
+static bool
+wc_isprint_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isprint(wc);
+}
+
+static bool
+wc_ispunct_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_ispunct(wc);
+}
+
+static bool
+wc_isspace_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isspace(wc);
+}
+
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.wc_isdigit = wc_isdigit_icu,
+	.wc_isalpha = wc_isalpha_icu,
+	.wc_isalnum = wc_isalnum_icu,
+	.wc_isupper = wc_isupper_icu,
+	.wc_islower = wc_islower_icu,
+	.wc_isgraph = wc_isgraph_icu,
+	.wc_isprint = wc_isprint_icu,
+	.wc_ispunct = wc_ispunct_icu,
+	.wc_isspace = wc_isspace_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -206,6 +296,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -379,7 +470,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
-size_t
+static size_t
 strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -399,7 +490,7 @@ strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -419,7 +510,7 @@ strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 8f9a8637897..1144c6ff304 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -43,13 +43,6 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -86,6 +79,239 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static bool
+wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isdigit_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalpha_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalnum_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return islower_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isgraph_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isprint_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return ispunct_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isspace_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswdigit_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalpha_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalnum_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswupper_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswlower_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswgraph_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswprint_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswpunct_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswspace_l((wint_t) wc, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_mb,
+	.wc_isalpha = wc_isalpha_libc_mb,
+	.wc_isalnum = wc_isalnum_libc_mb,
+	.wc_isupper = wc_isupper_libc_mb,
+	.wc_islower = wc_islower_libc_mb,
+	.wc_isgraph = wc_isgraph_libc_mb,
+	.wc_isprint = wc_isprint_libc_mb,
+	.wc_ispunct = wc_ispunct_libc_mb,
+	.wc_isspace = wc_isspace_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -120,36 +346,6 @@ static const struct collate_methods collate_methods_libc_win32_utf8 = {
 };
 #endif
 
-size_t
-strlower_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strlower_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strlower_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strtitle_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strtitle_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strtitle_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strupper_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strupper_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strupper_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -482,6 +678,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index ec42ca3da4c..b64135ab389 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,6 +12,8 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
@@ -77,6 +79,49 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* required */
+	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalpha) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalnum) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isupper) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_islower) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isgraph) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isprint) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -102,6 +147,7 @@ struct pg_locale_struct
 	bool		is_default;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -124,6 +170,9 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f83ecf181f..a869d6b7283 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1826,7 +1826,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.34.1

v12-0002-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From c7f7159cdd31cc1f10ac35e43afc39b2074ecefe Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v12 2/4] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index aa7d0e3d6cb..4db21882ac3 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -226,7 +226,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 3e9a2e0cfaa..e4f0398c217 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -288,7 +288,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -545,8 +544,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -574,8 +571,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -620,8 +615,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -788,8 +781,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -838,8 +829,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 1144c6ff304..1582f8cdd2a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -662,7 +662,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -782,8 +781,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -838,8 +835,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -948,7 +943,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index b64135ab389..d9650cec5cc 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -140,7 +140,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.34.1

v12-0003-Make-provider-data-in-pg_locale_t-an-opaque-poin.patch (text/x-patch; charset=UTF-8)

From ad4371e6f641479275dbb21cda7d615393831271 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v12 3/4] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale_builtin.c |  11 +-
 src/backend/utils/adt/pg_locale_icu.c     |  40 ++++--
 src/backend/utils/adt/pg_locale_libc.c    | 149 +++++++++++++++-------
 src/include/utils/pg_locale.h             |  16 +--
 4 files changed, 143 insertions(+), 73 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 4db21882ac3..77768735149 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -26,6 +26,11 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct builtin_provider
+{
+	const char *locale;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -192,6 +197,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -225,7 +231,10 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e4f0398c217..7bd58f26c44 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
@@ -235,6 +241,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -286,8 +293,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -543,11 +554,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -571,6 +583,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -584,7 +598,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -615,12 +629,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -727,11 +743,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -739,7 +757,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -781,6 +799,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -803,7 +823,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -829,6 +849,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -848,7 +870,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 1582f8cdd2a..1d990a612b4 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -41,6 +42,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -82,116 +88,136 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 static bool
 wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isdigit_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isdigit_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalpha_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isalpha_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalnum_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isalnum_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isupper_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isupper_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return islower_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return islower_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isgraph_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isgraph_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isprint_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isprint_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return ispunct_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return ispunct_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isspace_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return isspace_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswdigit_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswdigit_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalpha_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswalpha_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalnum_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswalnum_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswupper_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswupper_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswlower_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswlower_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswgraph_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswgraph_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswprint_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswprint_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswpunct_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswpunct_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswspace_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+	return iswspace_l((wint_t) wc, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static bool
@@ -199,19 +225,23 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -219,10 +249,12 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -230,10 +262,12 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -241,10 +275,12 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -355,7 +391,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -376,7 +412,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_tolower((unsigned char) *p);
 			else
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -387,7 +423,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -409,7 +446,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -440,7 +477,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -466,11 +503,11 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			else
 			{
 				if (wasalnum)
-					*p = tolower_l((unsigned char) *p, loc);
+					*p = tolower_l((unsigned char) *p, libc->lt);
 				else
-					*p = toupper_l((unsigned char) *p, loc);
+					*p = toupper_l((unsigned char) *p, libc->lt);
 			}
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -481,7 +518,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -506,10 +544,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -541,7 +579,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -559,7 +597,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_toupper((unsigned char) *p);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -570,7 +608,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -592,7 +631,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -620,6 +659,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -658,16 +698,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -781,6 +824,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -811,7 +856,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -835,8 +880,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -845,7 +892,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -943,6 +990,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -987,7 +1036,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -1116,8 +1165,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1176,8 +1227,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index d9650cec5cc..74dd8435a6b 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -148,21 +148,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 extern void init_database_collation(void);
-- 
2.34.1
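
The opaque-pointer layout used in the patch above can be sketched outside of PostgreSQL roughly as follows. This is only an illustration under stated assumptions: every demo_* name is hypothetical, and calloc/malloc stand in for MemoryContextAllocZero and MemoryContextStrdup. Generic code sees only a void pointer, and each provider casts it back to its own private struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pg_locale_struct: generic code sees only this. */
struct demo_locale
{
	int			deterministic;
	void	   *provider_data;	/* opaque; owned by the provider */
};

/* Private to the "builtin" provider, like struct builtin_provider. */
struct demo_builtin_provider
{
	char	   *locale;
};

/* Free a locale created by demo_create_builtin(); NULL is accepted. */
static void
demo_free_builtin(struct demo_locale *locale)
{
	if (locale == NULL)
		return;
	if (locale->provider_data != NULL)
	{
		struct demo_builtin_provider *builtin = locale->provider_data;

		free(builtin->locale);
		free(builtin);
	}
	free(locale);
}

/* Build a locale and attach the provider's private data to it. */
static struct demo_locale *
demo_create_builtin(const char *locstr)
{
	size_t		len = strlen(locstr) + 1;
	struct demo_locale *result = calloc(1, sizeof(*result));
	struct demo_builtin_provider *builtin = calloc(1, sizeof(*builtin));

	if (result == NULL || builtin == NULL)
	{
		free(builtin);
		free(result);
		return NULL;
	}
	result->provider_data = builtin;

	builtin->locale = malloc(len);
	if (builtin->locale == NULL)
	{
		demo_free_builtin(result);
		return NULL;
	}
	memcpy(builtin->locale, locstr, len);

	result->deterministic = 1;
	return result;
}

/* Provider method: cast the opaque pointer back to the private struct. */
static const char *
demo_builtin_locale_name(const struct demo_locale *locale)
{
	const struct demo_builtin_provider *builtin = locale->provider_data;

	assert(builtin != NULL);
	return builtin->locale;
}
```

With this shape the shared header no longer needs to know about locale_t or UCollator, which seems to be what lets the following patch drop the ICU includes from pg_locale.h.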

v12-0004-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From 95c70ad9d8a2f90967a5d62b276d96756dfae172 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v12 4/4] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  | 4 ++++
 src/backend/utils/adt/formatting.c    | 4 ----
 src/backend/utils/adt/pg_locale.c     | 4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 1 +
 src/backend/utils/adt/varlena.c       | 4 ++++
 src/include/utils/pg_locale.h         | 4 ----
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8acbfbbeda0..a57fe93c387 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 3960235e14e..2ba4ca7f0f2 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/int.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index cdb4950ac47..e3ddec2d57d 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 7bd58f26c44..0469c52b669 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 34796f2e27c..c57262e1888 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 74dd8435a6b..acb4890a78a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.34.1

#15

Jeff Davis

pgsql@j-davis.com

12 months ago

In reply to: Jeff Davis (#14)

Re: Collation & ctype method table, and extension hooks

On Thu, 2025-01-09 at 16:19 -0800, Jeff Davis wrote:

On Mon, 2024-12-02 at 23:58 -0800, Jeff Davis wrote:

On Mon, 2024-12-02 at 16:39 +0100, Andreas Karlsson wrote:

I feel your first patch in the series is something you can just
commit.

Done.

I combined your patches and mine into the attached v10 series.

Here's v12 after committing a few of the earlier patches.

I collected some performance numbers for a worst case on UTF8. This is
where each row is a million characters wide and each character is
greater than MAX_SIMPLE_CHR (U+07FF):

create table wide (t text);
insert into wide
select repeat('カ', 1048576)
from generate_series(1,1000) g;

select 1 from wide where t ~ '([[:punct:]]|[[:lower:]])'
collate "the_collation";

results:

                 master   patched
  C                3736      3589
  pg_c_utf8       19500     23404
  en_US           10251     12396
  en-US-x-icu     10264     11963

And a separate test for ILIKE on en_US.iso885915 where each character
is beyond the ASCII range and needs to be lowercased using the
optimization for single-byte encodings in Generic_Text_IC_like:

create table sb (t text);
insert into sb
select repeat('É', 1048576)
from generate_series(1, 3000) g;

select 1 from sb where t ilike '%á%';

results:

                 master   patched
  C                2900      2812
  en_US            2203      3702
  en-US-x-icu     17483     18123

Both tests show a slowdown for every locale except C. The worst one is probably
tolower() for libc in LATIN9, which appears to be heavily optimized,
and the extra indirection for a method call slows things down quite a
bit.
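
To make the indirection concrete, here is a minimal standalone sketch. It
is not code from the patch: every demo_* name is hypothetical, and the
standard tolower() stands in for tolower_l(). It contrasts a call site that
calls libc directly with one that goes through a method table:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the PostgreSQL definitions. */
struct demo_locale;

struct demo_ctype_methods
{
	/* NULL would mean the provider has no single-byte lowercase routine */
	unsigned char (*char_tolower) (unsigned char ch,
								   const struct demo_locale *loc);
};

struct demo_locale
{
	int			ctype_is_c;
	const struct demo_ctype_methods *ctype;
};

/* Provider-side implementation; tolower() is a stand-in for tolower_l() */
static unsigned char
demo_char_tolower_libc(unsigned char ch, const struct demo_locale *loc)
{
	(void) loc;
	return (unsigned char) tolower(ch);
}

static const struct demo_ctype_methods demo_methods_libc = {
	.char_tolower = demo_char_tolower_libc,
};

/* "Before": the call site calls libc directly */
static unsigned char
demo_lower_direct(unsigned char ch)
{
	return (unsigned char) tolower(ch);
}

/* "After": the call site dispatches through the method table */
static unsigned char
demo_lower_via_table(unsigned char ch, const struct demo_locale *loc)
{
	/* the C locale stays a special case at the call site */
	if (loc->ctype_is_c)
		return (ch >= 'A' && ch <= 'Z') ?
			(unsigned char) (ch + ('a' - 'A')) : ch;

	assert(loc->ctype != NULL && loc->ctype->char_tolower != NULL);
	return loc->ctype->char_tolower(ch, loc);
}
```

Both functions return the same result; the table version adds a pointer
load and an indirect call per character, which is presumably where the
extra per-character cost in the ILIKE test comes from.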

This is a bit unfortunate because the method table feels like the right
code organization. Having special cases at the call sites (aside from
ctype_is_c) is not great. Are the above numbers bad enough that we need
to give up on this method-ization approach? Or should we say that the
above cases don't represent reality, and a moderate regression there is
OK?

Or perhaps someone has an idea how to mitigate the regression? I could
imagine another cache of character properties, like an extensible
pg_char_properties. I'm not sure if the extra complexity is worth it,
though.
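
As a rough illustration of the cache idea, here is a standalone sketch
under stated assumptions. The demo_* names and property bits are
hypothetical, the cache covers only single-byte values, and the standard
isdigit()/isalpha() stand in for provider methods. The table is filled once
through the slow path, after which lookups are a plain array index:

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical property bits, in the spirit of pg_char_properties */
#define DEMO_ISDIGIT 0x01
#define DEMO_ISALPHA 0x02

struct demo_prop_cache
{
	bool		filled;
	unsigned char props[UCHAR_MAX + 1];
};

/* Slow path: stand-ins for calls through a provider's method table */
static bool
demo_slow_isdigit(unsigned int c)
{
	return isdigit((int) c) != 0;
}

static bool
demo_slow_isalpha(unsigned int c)
{
	return isalpha((int) c) != 0;
}

/* Probe every single-byte value once and remember the answers */
static void
demo_fill_cache(struct demo_prop_cache *cache)
{
	for (unsigned int c = 0; c <= UCHAR_MAX; c++)
	{
		unsigned char bits = 0;

		if (demo_slow_isdigit(c))
			bits |= DEMO_ISDIGIT;
		if (demo_slow_isalpha(c))
			bits |= DEMO_ISALPHA;
		cache->props[c] = bits;
	}
	cache->filled = true;
}

/* Fast path: one array lookup for values inside the cached range */
static bool
demo_cached_test(struct demo_prop_cache *cache, unsigned int c,
				 unsigned char bit)
{
	if (c > UCHAR_MAX)
		return false;			/* real code would call the method here */
	if (!cache->filled)
		demo_fill_cache(cache);
	return (cache->props[c] & bit) != 0;
}

static bool
demo_cached_isdigit(struct demo_prop_cache *cache, unsigned int c)
{
	return demo_cached_test(cache, c, DEMO_ISDIGIT);
}

static bool
demo_cached_isalpha(struct demo_prop_cache *cache, unsigned int c)
{
	return demo_cached_test(cache, c, DEMO_ISALPHA);
}
```

A table bounded like this could only help the single-byte ILIKE case. The
regex test above uses codepoints beyond any small cached range, so it would
still take the slow path.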

Regards,
Jeff Davis

#16

Jeff Davis

pgsql@j-davis.com

12 months ago

In reply to: Jeff Davis (#15)

4 attachment(s)

Re: Collation & ctype method table, and extension hooks

On Wed, 2025-01-15 at 12:42 -0800, Jeff Davis wrote:

Here's v12 after committing a few of the earlier patches.

And here's v14, just a rebase.

I collected some performance numbers for a worst case on UTF8.

I'm still inclined to think the method table is a good thing to do:

(a) The performance cases I tried seem implausibly bad -- running
character classification patterns over large fields consisting only of
codepoints over U+07FF.

(b) The method tables seem like a better code organization that
separates the responsibilities of the provider from the calling code.
It's also a requirement (or nearly so) if we want to provide some
pluggability or support multiple library versions.
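
To illustrate point (b), here is a minimal standalone sketch of how a
method table makes the provider pluggable. It is only a sketch: the demo_*
names, the hook signature, and the "custom" provider string are all
hypothetical and are not the interface proposed in this series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

struct demo_ctype_methods
{
	bool		(*wc_isdigit) (unsigned int wc, const struct demo_locale *loc);
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype;
};

/* Hypothetical hook: may return a method table for a provider, or NULL */
typedef const struct demo_ctype_methods *(*demo_methods_hook_type) (const char *provider);

static demo_methods_hook_type demo_methods_hook = NULL;

/* Default methods: ASCII digits only */
static bool
demo_isdigit_ascii(unsigned int wc, const struct demo_locale *loc)
{
	(void) loc;
	return wc >= '0' && wc <= '9';
}

static const struct demo_ctype_methods demo_methods_default = {
	.wc_isdigit = demo_isdigit_ascii,
};

/* "Extension" methods: also accept Arabic-Indic digits U+0660..U+0669 */
static bool
demo_isdigit_ext(unsigned int wc, const struct demo_locale *loc)
{
	return demo_isdigit_ascii(wc, loc) || (wc >= 0x0660 && wc <= 0x0669);
}

static const struct demo_ctype_methods demo_methods_ext = {
	.wc_isdigit = demo_isdigit_ext,
};

/* What an extension might install as the hook */
static const struct demo_ctype_methods *
demo_extension_hook(const char *provider)
{
	return strcmp(provider, "custom") == 0 ? &demo_methods_ext : NULL;
}

/* Locale creation consults the hook first, then falls back to the default */
static struct demo_locale
demo_create_locale(const char *provider)
{
	struct demo_locale loc = {.ctype = NULL};

	if (demo_methods_hook != NULL)
		loc.ctype = demo_methods_hook(provider);
	if (loc.ctype == NULL)
		loc.ctype = &demo_methods_default;
	return loc;
}

/* The call site never branches on the provider */
static bool
demo_isdigit(unsigned int wc, const struct demo_locale *loc)
{
	return loc->ctype->wc_isdigit(wc, loc);
}
```

Once demo_methods_hook is set to demo_extension_hook, locales created for
"custom" use the extension's table and all others keep the default, with no
change at the call site.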

It would be good to hear from others on these points, though.

Regards,
Jeff Davis

Attachments:

v14-0001-Control-ctype-behavior-internally-with-a-method-.patch (text/x-patch; charset=UTF-8)

From e30915172c98616d0aec56f190dff48836760ccc Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 29 Nov 2024 09:37:43 -0800
Subject: [PATCH v14 1/4] Control ctype behavior internally with a method
 table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 377 +++++-----------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         | 101 +++---
 src/backend/utils/adt/pg_locale_builtin.c | 106 +++++-
 src/backend/utils/adt/pg_locale_icu.c     | 109 ++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 279 +++++++++++++---
 src/include/utils/pg_locale.h             |  49 +++
 src/tools/pgindent/typedefs.list          |   1 -
 9 files changed, 618 insertions(+), 433 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index ed7411df83d..31b8f4a9478 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,18 +63,13 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
@@ -231,7 +226,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -252,8 +246,7 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
-		locale = 0;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -270,113 +263,41 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
-		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
+			locale = &dummy_c_locale;
 		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 }
 
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISDIGIT));
+	else
+		return pg_regex_locale->ctype->wc_isdigit(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALPHA));
+	else
+		return pg_regex_locale->ctype->wc_isalpha(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALNUM));
+	else
+		return pg_regex_locale->ctype->wc_isalnum(c, pg_regex_locale);
 }
 
 static int
@@ -391,219 +312,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISUPPER));
+	else
+		return pg_regex_locale->ctype->wc_isupper(c, pg_regex_locale);
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISLOWER));
+	else
+		return pg_regex_locale->ctype->wc_islower(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISGRAPH));
+	else
+		return pg_regex_locale->ctype->wc_isgraph(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPRINT));
+	else
+		return pg_regex_locale->ctype->wc_isprint(c, pg_regex_locale);
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPUNCT));
+	else
+		return pg_regex_locale->ctype->wc_ispunct(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISSPACE));
+	else
+		return pg_regex_locale->ctype->wc_isspace(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -729,37 +518,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 7f4cf614585..4216ac17f43 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -98,7 +98,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	else if (locale->is_default)
 		return pg_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -209,7 +209,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -224,14 +234,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8fdc677371f..999f23f86d5 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1495,13 +1495,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 94444acd2c5..5b78237f72e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -100,27 +100,6 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1232,6 +1211,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1394,57 +1376,21 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strlower_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strlower_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strtitle_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strtitle_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strtitle_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strupper_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strupper_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strupper_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1581,6 +1527,41 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 436e32c0ca0..5f43658ab5b 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -25,13 +25,6 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-
 
 struct WordBoundaryState
 {
@@ -74,7 +67,7 @@ initcap_wbnext(void *state)
 	return wbstate->len;
 }
 
-size_t
+static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -82,7 +75,7 @@ strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							locale->info.builtin.casemap_full);
 }
 
-size_t
+static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -99,7 +92,7 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							initcap_wbnext, &wbstate);
 }
 
-size_t
+static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -107,6 +100,97 @@ strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							locale->info.builtin.casemap_full);
 }
 
+static bool
+wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isdigit(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalnum(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isupper(wc);
+}
+
+static bool
+wc_islower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_islower(wc);
+}
+
+static bool
+wc_isgraph_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isgraph(wc);
+}
+
+static bool
+wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isprint(wc);
+}
+
+static bool
+wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_ispunct(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isspace_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isspace(wc);
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.wc_isdigit = wc_isdigit_builtin,
+	.wc_isalpha = wc_isalpha_builtin,
+	.wc_isalnum = wc_isalnum_builtin,
+	.wc_isupper = wc_isupper_builtin,
+	.wc_islower = wc_islower_builtin,
+	.wc_isgraph = wc_isgraph_builtin,
+	.wc_isprint = wc_isprint_builtin,
+	.wc_ispunct = wc_ispunct_builtin,
+	.wc_isspace = wc_isspace_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -150,6 +234,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 5185b0f7289..3e9a2e0cfaa 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -48,17 +48,17 @@
 #define		TEXTBUFLEN			1024
 
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
 
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
 
+static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strupper_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -118,6 +118,25 @@ static int32_t u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
 									   const char *locale,
 									   UErrorCode *pErrorCode);
 
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -136,6 +155,77 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static bool
+wc_isdigit_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isdigit(wc);
+}
+
+static bool
+wc_isalpha_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalnum(wc);
+}
+
+static bool
+wc_isupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isupper(wc);
+}
+
+static bool
+wc_islower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_islower(wc);
+}
+
+static bool
+wc_isgraph_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isgraph(wc);
+}
+
+static bool
+wc_isprint_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isprint(wc);
+}
+
+static bool
+wc_ispunct_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_ispunct(wc);
+}
+
+static bool
+wc_isspace_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isspace(wc);
+}
+
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.wc_isdigit = wc_isdigit_icu,
+	.wc_isalpha = wc_isalpha_icu,
+	.wc_isalnum = wc_isalnum_icu,
+	.wc_isupper = wc_isupper_icu,
+	.wc_islower = wc_islower_icu,
+	.wc_isgraph = wc_isgraph_icu,
+	.wc_isprint = wc_isprint_icu,
+	.wc_ispunct = wc_ispunct_icu,
+	.wc_isspace = wc_isspace_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -206,6 +296,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -379,7 +470,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
-size_t
+static size_t
 strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -399,7 +490,7 @@ strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -419,7 +510,7 @@ strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 8f9a8637897..1144c6ff304 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -43,13 +43,6 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -86,6 +79,239 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static bool
+wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isdigit_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalpha_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalnum_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return islower_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isgraph_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isprint_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return ispunct_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isspace_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswdigit_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalpha_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalnum_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswupper_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswlower_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswgraph_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswprint_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswpunct_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswspace_l((wint_t) wc, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_mb,
+	.wc_isalpha = wc_isalpha_libc_mb,
+	.wc_isalnum = wc_isalnum_libc_mb,
+	.wc_isupper = wc_isupper_libc_mb,
+	.wc_islower = wc_islower_libc_mb,
+	.wc_isgraph = wc_isgraph_libc_mb,
+	.wc_isprint = wc_isprint_libc_mb,
+	.wc_ispunct = wc_ispunct_libc_mb,
+	.wc_isspace = wc_isspace_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -120,36 +346,6 @@ static const struct collate_methods collate_methods_libc_win32_utf8 = {
 };
 #endif
 
-size_t
-strlower_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strlower_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strlower_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strtitle_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strtitle_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strtitle_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strupper_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strupper_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strupper_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -482,6 +678,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 2bc3a7df2d9..cac05c69d34 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,6 +12,8 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
@@ -77,6 +79,49 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+
+	/* required: wide-character classification and simple case mapping */
+	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalpha) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalnum) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isupper) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_islower) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isgraph) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isprint) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/* required: whether a single byte may belong to a cased character */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -102,6 +147,7 @@ struct pg_locale_struct
 	bool		is_default;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -125,6 +171,9 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 668bddbfcd7..9aa19f88b5b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1829,7 +1829,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.34.1
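
The patch above picks a ctype method table once, when the pg_locale_t is created, and callers then go through the table instead of branching on the provider. A minimal standalone sketch of that pattern follows. All demo_* names and the toy "wide" mapping are hypothetical stand-ins made up for illustration; none of this is PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int demo_wchar;	/* hypothetical stand-in for pg_wchar */

struct demo_locale;

/* Hypothetical, cut-down analogue of struct ctype_methods. */
struct demo_ctype_methods
{
	bool		(*wc_isdigit) (demo_wchar wc, const struct demo_locale *loc);
	demo_wchar	(*wc_toupper) (demo_wchar wc, const struct demo_locale *loc);
	demo_wchar	max_chr;		/* 0 means the provider sets no limit */
};

struct demo_locale
{
	const struct demo_ctype_methods *ctype; /* NULL if ctype is "C" */
};

static bool
isdigit_ascii(demo_wchar wc, const struct demo_locale *loc)
{
	(void) loc;
	return wc >= '0' && wc <= '9';
}

static demo_wchar
toupper_ascii(demo_wchar wc, const struct demo_locale *loc)
{
	(void) loc;
	return (wc >= 'a' && wc <= 'z') ? wc - ('a' - 'A') : wc;
}

/* Toy "wide" variant: also maps U+00E0..U+00FE, except U+00F7. */
static demo_wchar
toupper_wide(demo_wchar wc, const struct demo_locale *loc)
{
	if (wc >= 0xE0 && wc <= 0xFE && wc != 0xF7)
		return wc - 0x20;
	return toupper_ascii(wc, loc);
}

static const struct demo_ctype_methods demo_ctype_sb = {
	.wc_isdigit = isdigit_ascii,
	.wc_toupper = toupper_ascii,
	.max_chr = 0xFF,
};

static const struct demo_ctype_methods demo_ctype_wide = {
	.wc_isdigit = isdigit_ascii,
	.wc_toupper = toupper_wide,
};

/* The only place that branches: table selection at creation time. */
static void
demo_locale_init(struct demo_locale *loc, bool ctype_is_c, bool wide)
{
	if (ctype_is_c)
		loc->ctype = NULL;
	else if (wide)
		loc->ctype = &demo_ctype_wide;
	else
		loc->ctype = &demo_ctype_sb;
}

/* Caller-side wrapper: no provider switch, just the table (or C fallback). */
static demo_wchar
demo_toupper(demo_wchar wc, const struct demo_locale *loc)
{
	assert(loc != NULL);
	if (loc->ctype == NULL)
		return toupper_ascii(wc, loc);
	return loc->ctype->wc_toupper(wc, loc);
}
```

In this sketch, a caller would run demo_locale_init() once and then call demo_toupper() without knowing which table was chosen. Adding another provider means adding another table, not another branch in every caller.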

v14-0002-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 6f434248cdabd9e9ada75b6eabe62e77f5a22e6a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v14 2/4] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 5f43658ab5b..9ea5a461e84 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -230,7 +230,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
 	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 3e9a2e0cfaa..e4f0398c217 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -288,7 +288,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -545,8 +544,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -574,8 +571,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -620,8 +615,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -788,8 +781,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -838,8 +829,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 1144c6ff304..1582f8cdd2a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -662,7 +662,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -782,8 +781,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -838,8 +835,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -948,7 +943,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cac05c69d34..11e1810eeb8 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -140,7 +140,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.34.1
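
The commit message above argues that once behavior is fully specified by methods, a provider tag is redundant. One way to see why: a method is only reachable through the table that its own provider installed, so it already knows how to interpret any private state. The sketch below illustrates that idea with made-up types (demo_locale, demo_collate_methods, demo_fold_state are all hypothetical, and the case-insensitive comparison is a toy, not a real collation).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

/* Hypothetical, cut-down analogue of a collation method table. */
struct demo_collate_methods
{
	int			(*compare) (const char *a, const char *b,
							const struct demo_locale *loc);
};

/* Note: no "provider" tag; only methods plus private state. */
struct demo_locale
{
	const struct demo_collate_methods *collate;
	void	   *provider_data;	/* layout known only to the methods */
};

/* Provider A: plain byte order, needs no private state. */
static int
compare_bytes(const char *a, const char *b, const struct demo_locale *loc)
{
	(void) loc;
	return strcmp(a, b);
}

/* Provider B: keeps its own state behind provider_data. */
struct demo_fold_state
{
	int			ignore_case;
};

static int
compare_fold(const char *a, const char *b, const struct demo_locale *loc)
{
	/* Safe without a tag check: only provider B installs this method. */
	const struct demo_fold_state *st = loc->provider_data;

	if (!st->ignore_case)
		return strcmp(a, b);

	for (;; a++, b++)
	{
		int			ca = tolower((unsigned char) *a);
		int			cb = tolower((unsigned char) *b);

		if (ca != cb)
			return (ca < cb) ? -1 : 1;
		if (ca == '\0')
			return 0;
	}
}

static const struct demo_collate_methods demo_methods_bytes = {
	.compare = compare_bytes,
};

static const struct demo_collate_methods demo_methods_fold = {
	.compare = compare_fold,
};

/* Callers never ask which provider they are talking to. */
static int
demo_compare(const struct demo_locale *loc, const char *a, const char *b)
{
	assert(loc != NULL && loc->collate != NULL);
	return loc->collate->compare(a, b, loc);
}
```

The cost of this design is that nothing checks the cast at run time: whoever fills in the method table must also fill in provider_data with the matching structure.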

v14-0003-Make-provider-data-in-pg_locale_t-an-opaque-poin.patch (text/x-patch; charset=UTF-8)

From edf86ee0af1a36ef379118b84c3cef65b71ff9c5 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v14 3/4] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale_builtin.c |  49 +++++--
 src/backend/utils/adt/pg_locale_icu.c     |  40 ++++--
 src/backend/utils/adt/pg_locale_libc.c    | 167 +++++++++++++++-------
 src/include/utils/pg_locale.h             |  17 +--
 4 files changed, 192 insertions(+), 81 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 9ea5a461e84..de328e05a78 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -26,6 +26,12 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct builtin_provider
+{
+	const char *locale;
+	bool		casemap_full;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -71,14 +77,19 @@ static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strlower(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full);
+							builtin->casemap_full);
 }
 
 static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
 	struct WordBoundaryState wbstate = {
 		.str = src,
 		.len = srclen,
@@ -87,8 +98,10 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		.prev_alnum = false,
 	};
 
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strtitle(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full,
+							builtin->casemap_full,
 							initcap_wbnext, &wbstate);
 }
 
@@ -96,14 +109,22 @@ static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strupper(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full);
+							builtin->casemap_full);
 }
 
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isdigit(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_isdigit(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -115,7 +136,11 @@ wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
 static bool
 wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isalnum(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_isalnum(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -145,7 +170,11 @@ wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
 static bool
 wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_ispunct(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_ispunct(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -195,6 +224,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -228,8 +258,11 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	builtin->casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e4f0398c217..7bd58f26c44 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
@@ -235,6 +241,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -286,8 +293,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -543,11 +554,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -571,6 +583,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -584,7 +598,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -615,12 +629,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -727,11 +743,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -739,7 +757,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -781,6 +799,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -803,7 +823,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -829,6 +849,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -848,7 +870,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 1582f8cdd2a..d357962ebdf 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -41,6 +42,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -82,116 +88,154 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 static bool
 wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isdigit_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isdigit_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalpha_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isalpha_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalnum_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isalnum_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isupper_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isupper_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return islower_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return islower_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isgraph_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isgraph_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isprint_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isprint_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return ispunct_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return ispunct_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isspace_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isspace_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswdigit_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswdigit_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalpha_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswalpha_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalnum_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswalnum_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswupper_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswupper_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswlower_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswlower_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswgraph_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswgraph_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswprint_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswprint_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswpunct_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswpunct_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswspace_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswspace_l((wint_t) wc, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static bool
@@ -199,19 +243,23 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -219,10 +267,12 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -230,10 +280,12 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -241,10 +293,12 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -355,7 +409,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -376,7 +430,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_tolower((unsigned char) *p);
 			else
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -387,7 +441,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -409,7 +464,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -440,7 +495,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -466,11 +521,11 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			else
 			{
 				if (wasalnum)
-					*p = tolower_l((unsigned char) *p, loc);
+					*p = tolower_l((unsigned char) *p, libc->lt);
 				else
-					*p = toupper_l((unsigned char) *p, loc);
+					*p = toupper_l((unsigned char) *p, libc->lt);
 			}
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -481,7 +536,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -506,10 +562,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -541,7 +597,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -559,7 +615,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_toupper((unsigned char) *p);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -570,7 +626,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -592,7 +649,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -620,6 +677,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -658,16 +716,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -781,6 +842,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -811,7 +874,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -835,8 +898,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -845,7 +910,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -943,6 +1008,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -987,7 +1054,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -1116,8 +1183,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1176,8 +1245,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 11e1810eeb8..74dd8435a6b 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -148,22 +148,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-			bool		casemap_full;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 extern void init_database_collation(void);
-- 
2.34.1

v14-0004-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From c7abdaf4198e6e8d8e812523f82663fb3bede1e7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v14 4/4] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  | 4 ++++
 src/backend/utils/adt/formatting.c    | 4 ----
 src/backend/utils/adt/pg_locale.c     | 4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 1 +
 src/backend/utils/adt/varlena.c       | 4 ++++
 src/include/utils/pg_locale.h         | 4 ----
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8acbfbbeda0..a57fe93c387 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 3960235e14e..2ba4ca7f0f2 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -71,10 +71,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/int.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5b78237f72e..f73888de68c 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 7bd58f26c44..0469c52b669 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 34796f2e27c..c57262e1888 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 74dd8435a6b..acb4890a78a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.34.1

#17

Jeff Davis

pgsql@j-davis.com

11 months ago

In reply to: Jeff Davis (#16)

4 attachment(s)

Re: Collation & ctype method table, and extension hooks

I'm still inclined to think the method table is a good thing to do:

(a) The performance cases I tried seem implausibly bad -- running
character classification patterns over large fields consisting only
of codepoints over U+07FF.

(b) The method tables seem like a better code organization that
separates the responsibilities of the provider from the calling code.
It's also a requirement (or nearly so) if we want to provide some
pluggability or support multiple library versions.
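
To illustrate point (b), here is a minimal standalone sketch of the
dispatch pattern, not the patch's actual code. Every demo_* and caller_*
name is hypothetical; only the shape (a ctype_is_c fast path, a ctype
method table, and an opaque provider_data pointer) mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_wchar;	/* hypothetical stand-in for pg_wchar */

struct demo_locale;

/* Per-provider ctype methods; each provider supplies one static table. */
struct demo_ctype_methods
{
	bool		(*wc_isdigit) (demo_wchar wc, const struct demo_locale *locale);
	demo_wchar	(*wc_toupper) (demo_wchar wc, const struct demo_locale *locale);
};

struct demo_locale
{
	bool		ctype_is_c;
	const struct demo_ctype_methods *ctype; /* NULL if ctype_is_c */
	void	   *provider_data;	/* opaque to the calling code */
};

/* Private state of a made-up provider; callers never look inside. */
struct demo_provider
{
	demo_wchar	extra_digit;	/* one extra code point treated as a digit */
};

static bool
demo_isdigit(demo_wchar wc, const struct demo_locale *locale)
{
	const struct demo_provider *p = locale->provider_data;

	return (wc >= (demo_wchar) '0' && wc <= (demo_wchar) '9') ||
		wc == p->extra_digit;
}

static demo_wchar
demo_toupper(demo_wchar wc, const struct demo_locale *locale)
{
	(void) locale;				/* this toy mapping needs no provider state */
	if (wc >= (demo_wchar) 'a' && wc <= (demo_wchar) 'z')
		return wc - ('a' - 'A');
	if (wc == 0xE9)				/* U+00E9 maps to U+00C9 */
		return 0xC9;
	return wc;
}

static const struct demo_ctype_methods demo_methods = {
	.wc_isdigit = demo_isdigit,
	.wc_toupper = demo_toupper,
};

static struct demo_provider demo_state = {.extra_digit = 0x0661};

/* C locale: no method table at all, callers take the hard-wired path. */
static const struct demo_locale c_locale = {.ctype_is_c = true};

static const struct demo_locale demo_loc = {
	.ctype = &demo_methods,
	.provider_data = &demo_state,
};

/* Calling code: no switch on provider, only "C or not C". */
static bool
caller_isdigit(demo_wchar wc, const struct demo_locale *locale)
{
	if (locale->ctype_is_c)
		return wc >= (demo_wchar) '0' && wc <= (demo_wchar) '9';
	assert(locale->ctype != NULL);
	return locale->ctype->wc_isdigit(wc, locale);
}

static demo_wchar
caller_toupper(demo_wchar wc, const struct demo_locale *locale)
{
	if (locale->ctype_is_c)
	{
		if (wc >= (demo_wchar) 'a' && wc <= (demo_wchar) 'z')
			return wc - ('a' - 'A');
		return wc;
	}
	assert(locale->ctype != NULL);
	return locale->ctype->wc_toupper(wc, locale);
}
```

In this sketch, adding another provider means supplying another static
table and its private struct; caller_isdigit() and caller_toupper() do
not change.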

It would be good to hear from others on these points, though.

Attached v15. Just a rebase.

I'd still like some input here. We could either:

* commit this on the grounds that it's a desirable code improvement and
the worst-case regression isn't a major concern; or

* wait until v19, when we might have a more compelling use for the
method table (e.g. pluggable provider or multilib).

Regards,
Jeff Davis

Attachments:

v15-0001-Control-ctype-behavior-internally-with-a-method-.patch (text/x-patch; charset=UTF-8)

From 7abb1f6c9f8c845c20d6baf648e4bba075f4a3cb Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 29 Nov 2024 09:37:43 -0800
Subject: [PATCH v15 1/4] Control ctype behavior internally with a method
 table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 377 +++++-----------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         | 121 +++----
 src/backend/utils/adt/pg_locale_builtin.c | 111 ++++++-
 src/backend/utils/adt/pg_locale_icu.c     | 116 ++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 279 +++++++++++++---
 src/include/utils/pg_locale.h             |  52 +++
 src/tools/pgindent/typedefs.list          |   1 -
 9 files changed, 630 insertions(+), 456 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index ed7411df83d..31b8f4a9478 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -63,18 +63,13 @@
  * NB: the coding here assumes pg_wchar is an unsigned type.
  */
 
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
@@ -231,7 +226,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -252,8 +246,7 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
-		locale = 0;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -270,113 +263,41 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
-		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
+			locale = &dummy_c_locale;
 		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 }
 
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISDIGIT));
+	else
+		return pg_regex_locale->ctype->wc_isdigit(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALPHA));
+	else
+		return pg_regex_locale->ctype->wc_isalpha(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALNUM));
+	else
+		return pg_regex_locale->ctype->wc_isalnum(c, pg_regex_locale);
 }
 
 static int
@@ -391,219 +312,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISUPPER));
+	else
+		return pg_regex_locale->ctype->wc_isupper(c, pg_regex_locale);
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISLOWER));
+	else
+		return pg_regex_locale->ctype->wc_islower(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISGRAPH));
+	else
+		return pg_regex_locale->ctype->wc_isgraph(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPRINT));
+	else
+		return pg_regex_locale->ctype->wc_isprint(c, pg_regex_locale);
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPUNCT));
+	else
+		return pg_regex_locale->ctype->wc_ispunct(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISSPACE));
+	else
+		return pg_regex_locale->ctype->wc_isspace(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -729,37 +518,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 7f4cf614585..4216ac17f43 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -98,7 +98,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	else if (locale->is_default)
 		return pg_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -209,7 +209,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -224,14 +234,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8fdc677371f..999f23f86d5 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1495,13 +1495,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7d92f580a57..f26ce0e6cc7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -100,31 +100,6 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_builtin(char *dst, size_t dstsize, const char *src,
-							  ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_icu(char *dst, size_t dstsize, const char *src,
-						  ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1236,6 +1211,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1398,77 +1376,31 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strlower_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strlower_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strtitle_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strtitle_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strtitle_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strupper_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strupper_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strupper_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strfold_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strfold_icu(dst, dstsize, src, srclen, locale);
-#endif
-	/* for libc, just use strlower */
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
+	if (locale->ctype->strfold)
+		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1605,6 +1537,41 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 33ad20bbf07..23504be383a 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -25,15 +25,6 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_builtin(char *dst, size_t dstsize, const char *src,
-							  ssize_t srclen, pg_locale_t locale);
-
 
 struct WordBoundaryState
 {
@@ -76,7 +67,7 @@ initcap_wbnext(void *state)
 	return wbstate->len;
 }
 
-size_t
+static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -84,7 +75,7 @@ strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							locale->info.builtin.casemap_full);
 }
 
-size_t
+static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -101,7 +92,7 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							initcap_wbnext, &wbstate);
 }
 
-size_t
+static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -109,7 +100,7 @@ strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							locale->info.builtin.casemap_full);
 }
 
-size_t
+static size_t
 strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				pg_locale_t locale)
 {
@@ -117,6 +108,98 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 						   locale->info.builtin.casemap_full);
 }
 
+static bool
+wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isdigit(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalnum(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isupper(wc);
+}
+
+static bool
+wc_islower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_islower(wc);
+}
+
+static bool
+wc_isgraph_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isgraph(wc);
+}
+
+static bool
+wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isprint(wc);
+}
+
+static bool
+wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_ispunct(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isspace_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isspace(wc);
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.strfold = strfold_builtin,
+	.wc_isdigit = wc_isdigit_builtin,
+	.wc_isalpha = wc_isalpha_builtin,
+	.wc_isalnum = wc_isalnum_builtin,
+	.wc_isupper = wc_isupper_builtin,
+	.wc_islower = wc_islower_builtin,
+	.wc_isgraph = wc_isgraph_builtin,
+	.wc_isprint = wc_isprint_builtin,
+	.wc_ispunct = wc_ispunct_builtin,
+	.wc_isspace = wc_isspace_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -160,6 +243,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index b0c73f2e43d..1cb24fadea2 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -48,19 +48,19 @@
 #define		TEXTBUFLEN			1024
 
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_icu(char *dst, size_t dstsize, const char *src,
-						  ssize_t srclen, pg_locale_t locale);
 
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
 
+static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strupper_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strfold_icu(char *dst, size_t dstsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -124,6 +124,25 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -142,6 +161,78 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static bool
+wc_isdigit_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isdigit(wc);
+}
+
+static bool
+wc_isalpha_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalnum(wc);
+}
+
+static bool
+wc_isupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isupper(wc);
+}
+
+static bool
+wc_islower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_islower(wc);
+}
+
+static bool
+wc_isgraph_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isgraph(wc);
+}
+
+static bool
+wc_isprint_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isprint(wc);
+}
+
+static bool
+wc_ispunct_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_ispunct(wc);
+}
+
+static bool
+wc_isspace_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isspace(wc);
+}
+
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.strfold = strfold_icu,
+	.wc_isdigit = wc_isdigit_icu,
+	.wc_isalpha = wc_isalpha_icu,
+	.wc_isalnum = wc_isalnum_icu,
+	.wc_isupper = wc_isupper_icu,
+	.wc_islower = wc_islower_icu,
+	.wc_isgraph = wc_isgraph_icu,
+	.wc_isprint = wc_isprint_icu,
+	.wc_ispunct = wc_ispunct_icu,
+	.wc_isspace = wc_isspace_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -212,6 +303,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -385,7 +477,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
-size_t
+static size_t
 strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -405,7 +497,7 @@ strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -425,7 +517,7 @@ strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -445,7 +537,7 @@ strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 8f9a8637897..1144c6ff304 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -43,13 +43,6 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -86,6 +79,239 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static bool
+wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isdigit_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalpha_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalnum_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return islower_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isgraph_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isprint_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return ispunct_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isspace_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswdigit_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalpha_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalnum_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswupper_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswlower_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswgraph_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswprint_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswpunct_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswspace_l((wint_t) wc, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_mb,
+	.wc_isalpha = wc_isalpha_libc_mb,
+	.wc_isalnum = wc_isalnum_libc_mb,
+	.wc_isupper = wc_isupper_libc_mb,
+	.wc_islower = wc_islower_libc_mb,
+	.wc_isgraph = wc_isgraph_libc_mb,
+	.wc_isprint = wc_isprint_libc_mb,
+	.wc_ispunct = wc_ispunct_libc_mb,
+	.wc_isspace = wc_isspace_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -120,36 +346,6 @@ static const struct collate_methods collate_methods_libc_win32_utf8 = {
 };
 #endif
 
-size_t
-strlower_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strlower_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strlower_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strtitle_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strtitle_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strtitle_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strupper_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strupper_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strupper_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -482,6 +678,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 0d5f0513ceb..cd2c812ae26 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,6 +12,8 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
@@ -77,6 +79,52 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strfold) (char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
+	/* required */
+	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalpha) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalnum) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isupper) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_islower) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isgraph) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isprint) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -102,6 +150,7 @@ struct pg_locale_struct
 	bool		is_default;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -125,6 +174,9 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dest, size_t destsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a3bee93dec..594c98509c0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1829,7 +1829,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGcancel
-- 
2.34.1
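
For readers skimming the patch above, here is a minimal standalone sketch of the optional-method pattern that pg_strfold() and char_tolower_enabled() rely on: a NULL slot in the method table means "not provided", and the caller falls back to a required method. Everything named demo_* is hypothetical, and the signatures are simplified (no srclen or pg_locale_t argument); it is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ctype_methods. */
struct demo_ctype_methods
{
	/* required */
	size_t		(*strlower) (char *dst, size_t dstsize, const char *src);
	/* optional: NULL means "fall back to strlower" */
	size_t		(*strfold) (char *dst, size_t dstsize, const char *src);
	/* optional: NULL means "no single-byte fast path" */
	char		(*char_tolower) (unsigned char ch);
};

/*
 * ASCII-only lowercasing. Returns the length the full result needs and
 * always NUL-terminates when dstsize > 0.
 */
static size_t
demo_strlower_ascii(char *dst, size_t dstsize, const char *src)
{
	size_t		len = strlen(src);

	for (size_t i = 0; i < len && i + 1 < dstsize; i++)
	{
		char		c = src[i];

		dst[i] = (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : c;
	}
	if (dstsize > 0)
		dst[len < dstsize ? len : dstsize - 1] = '\0';
	return len;
}

/* Counts calls so the dispatch path can be observed. */
static int	demo_fold_calls = 0;

static size_t
demo_strfold_counting(char *dst, size_t dstsize, const char *src)
{
	demo_fold_calls++;
	return demo_strlower_ascii(dst, dstsize, src);
}

/* A table that leaves the optional slots unset (like the libc tables). */
static const struct demo_ctype_methods demo_methods_no_fold = {
	.strlower = demo_strlower_ascii,
};

/* A table that supplies its own fold method (like builtin and ICU). */
static const struct demo_ctype_methods demo_methods_with_fold = {
	.strlower = demo_strlower_ascii,
	.strfold = demo_strfold_counting,
};

/* Dispatch: use strfold if present, otherwise the required strlower. */
static size_t
demo_strfold(const struct demo_ctype_methods *m,
			 char *dst, size_t dstsize, const char *src)
{
	assert(m->strlower != NULL);
	if (m->strfold)
		return m->strfold(dst, dstsize, src);
	return m->strlower(dst, dstsize, src);
}

/* Capability probe in the style of char_tolower_enabled(). */
static bool
demo_char_tolower_enabled(const struct demo_ctype_methods *m)
{
	return m->char_tolower != NULL;
}
```

The point of the NULL check is that a table only has to fill in the slots it actually implements, and the fallback decision stays in one place in the caller.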

v15-0002-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From 25e77e96f99412a4bf1a6d363bb5dbe8f914fbbf Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v15 2/4] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 23504be383a..bba98284e25 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -239,7 +239,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
 	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 1cb24fadea2..f0068749471 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -295,7 +295,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -572,8 +571,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -601,8 +598,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -647,8 +642,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -847,8 +840,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -897,8 +888,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 1144c6ff304..1582f8cdd2a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -662,7 +662,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -782,8 +781,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -838,8 +835,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -948,7 +943,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cd2c812ae26..f081091d200 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -143,7 +143,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.34.1
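
Editorial sketch (not part of the patch): with the `provider` field removed, nothing at execution time can branch on it any more; behavior is selected through the method-table pointers that the struct already carries (`collate` and `ctype`, NULL for the C locale). The standalone C below illustrates that dispatch pattern under stated assumptions. Every name in it (`sketch_locale`, `sketch_collate_methods`, `sketch_strncoll`, `sketch_nocase_methods`) is hypothetical and invented for illustration, and the ASCII case-insensitive comparator merely stands in for a real provider.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct sketch_locale;

/* Hypothetical method table: one function pointer per collation operation. */
struct sketch_collate_methods
{
    int (*strncoll)(const char *a, size_t alen,
                    const char *b, size_t blen,
                    const struct sketch_locale *loc);
};

/* Hypothetical locale object; note there is no provider tag to branch on. */
struct sketch_locale
{
    int collate_is_c;
    const struct sketch_collate_methods *collate;   /* NULL if collate_is_c */
};

/* Order two lengths once the common prefix has compared equal. */
static int
cmp_len(size_t alen, size_t blen)
{
    if (alen == blen)
        return 0;
    return (alen < blen) ? -1 : 1;
}

/* Stand-in "provider" method: ASCII case-insensitive comparison. */
int
sketch_nocase_strncoll(const char *a, size_t alen,
                       const char *b, size_t blen,
                       const struct sketch_locale *loc)
{
    size_t minlen = (alen < blen) ? alen : blen;
    size_t i;

    (void) loc;                 /* this toy method keeps no per-locale state */
    for (i = 0; i < minlen; i++)
    {
        int ca = tolower((unsigned char) a[i]);
        int cb = tolower((unsigned char) b[i]);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
    }
    return cmp_len(alen, blen);
}

const struct sketch_collate_methods sketch_nocase_methods = {
    sketch_nocase_strncoll
};

/* Caller-facing entry point: C-locale fast path, else the method table. */
int
sketch_strncoll(const char *a, size_t alen,
                const char *b, size_t blen,
                const struct sketch_locale *loc)
{
    if (loc->collate_is_c)
    {
        size_t minlen = (alen < blen) ? alen : blen;
        int r = memcmp(a, b, minlen);

        if (r != 0)
            return (r < 0) ? -1 : 1;
        return cmp_len(alen, blen);
    }

    assert(loc->collate != NULL);
    return loc->collate->strncoll(a, alen, b, blen, loc);
}
```

A caller only ever invokes `sketch_strncoll`; which implementation runs is decided once, when the locale object is built, which is the property the removed `Assert(locale->provider == ...)` checks used to guard by hand.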

v15-0003-Make-provider-data-in-pg_locale_t-an-opaque-poin.patch (text/x-patch; charset=UTF-8)

From babe04624ec0143794329950547ff9a4dda91990 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v15 3/4] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale_builtin.c |  55 +++++--
 src/backend/utils/adt/pg_locale_icu.c     |  40 ++++--
 src/backend/utils/adt/pg_locale_libc.c    | 167 +++++++++++++++-------
 src/include/utils/pg_locale.h             |  17 +--
 4 files changed, 197 insertions(+), 82 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index bba98284e25..f0462aa96aa 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -26,6 +26,12 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct builtin_provider
+{
+	const char *locale;
+	bool		casemap_full;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -71,14 +77,19 @@ static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strlower(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full);
+							builtin->casemap_full);
 }
 
 static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
 	struct WordBoundaryState wbstate = {
 		.str = src,
 		.len = srclen,
@@ -87,8 +98,10 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		.prev_alnum = false,
 	};
 
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strtitle(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full,
+							builtin->casemap_full,
 							initcap_wbnext, &wbstate);
 }
 
@@ -96,22 +109,34 @@ static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strupper(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full);
+							builtin->casemap_full);
 }
 
 static size_t
 strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strfold(dest, destsize, src, srclen,
-						   locale->info.builtin.casemap_full);
+						   builtin->casemap_full);
 }
 
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isdigit(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_isdigit(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -123,7 +148,11 @@ wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
 static bool
 wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isalnum(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_isalnum(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -153,7 +182,11 @@ wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
 static bool
 wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_ispunct(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_ispunct(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -204,6 +237,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -237,8 +271,11 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	builtin->casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f0068749471..19c51504f85 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static size_t strlower_icu(char *dst, size_t dstsize, const char *src,
@@ -242,6 +248,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -293,8 +300,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -570,11 +581,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -598,6 +610,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -611,7 +625,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -642,12 +656,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -754,11 +770,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -766,7 +784,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -840,6 +858,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -862,7 +882,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -888,6 +908,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -907,7 +929,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 1582f8cdd2a..d357962ebdf 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -41,6 +42,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -82,116 +88,154 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 static bool
 wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isdigit_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isdigit_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalpha_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isalpha_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalnum_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isalnum_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isupper_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isupper_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return islower_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return islower_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isgraph_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isgraph_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isprint_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isprint_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return ispunct_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return ispunct_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isspace_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isspace_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswdigit_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswdigit_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalpha_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswalpha_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalnum_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswalnum_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswupper_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswupper_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswlower_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswlower_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswgraph_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswgraph_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswprint_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswprint_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswpunct_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswpunct_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswspace_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswspace_l((wint_t) wc, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static bool
@@ -199,19 +243,23 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -219,10 +267,12 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -230,10 +280,12 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -241,10 +293,12 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -355,7 +409,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -376,7 +430,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_tolower((unsigned char) *p);
 			else
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -387,7 +441,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -409,7 +464,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -440,7 +495,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -466,11 +521,11 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			else
 			{
 				if (wasalnum)
-					*p = tolower_l((unsigned char) *p, loc);
+					*p = tolower_l((unsigned char) *p, libc->lt);
 				else
-					*p = toupper_l((unsigned char) *p, loc);
+					*p = toupper_l((unsigned char) *p, libc->lt);
 			}
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -481,7 +536,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -506,10 +562,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -541,7 +597,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -559,7 +615,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_toupper((unsigned char) *p);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -570,7 +626,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -592,7 +649,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -620,6 +677,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -658,16 +716,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -781,6 +842,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -811,7 +874,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -835,8 +898,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -845,7 +910,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -943,6 +1008,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -987,7 +1054,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -1116,8 +1183,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1176,8 +1245,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index f081091d200..ccf10cd617b 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -151,22 +151,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-			bool		casemap_full;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 extern void init_database_collation(void);
-- 
2.34.1
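
Editorial sketch (not part of the patch): the change above replaces the `info` union with a single `void *provider_data`, so each provider defines its own private struct and casts the pointer back inside its own methods. The standalone C below shows that pattern in miniature. It is only an illustration under assumptions: `demo_locale`, `demo_provider`, `demo_locale_create`, `demo_fold_char` and `demo_locale_free` are hypothetical names, and plain `calloc`/`free` stand in for the memory-context allocation used in the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Generic locale object: core code only ever sees an untyped pointer. */
struct demo_locale
{
    bool deterministic;
    void *provider_data;        /* interpreted by the owning provider only */
};

/* Private to the hypothetical "demo" provider. */
struct demo_provider
{
    char *name;
    bool fold_upper;
};

/* Build the generic object and attach the provider's private state. */
struct demo_locale *
demo_locale_create(const char *name, bool fold_upper)
{
    struct demo_locale *result = calloc(1, sizeof(struct demo_locale));
    struct demo_provider *priv = calloc(1, sizeof(struct demo_provider));
    size_t namelen = strlen(name) + 1;

    if (result == NULL || priv == NULL)
    {
        free(result);
        free(priv);
        return NULL;
    }

    priv->name = malloc(namelen);
    if (priv->name == NULL)
    {
        free(result);
        free(priv);
        return NULL;
    }
    memcpy(priv->name, name, namelen);
    priv->fold_upper = fold_upper;

    result->deterministic = true;
    result->provider_data = priv;
    return result;
}

/* Provider method: recover the private struct before using it. */
int
demo_fold_char(int ch, const struct demo_locale *locale)
{
    const struct demo_provider *priv =
        (const struct demo_provider *) locale->provider_data;

    assert(priv != NULL);
    return priv->fold_upper ? toupper((unsigned char) ch)
                            : tolower((unsigned char) ch);
}

/* Release both the private state and the generic object. */
void
demo_locale_free(struct demo_locale *locale)
{
    struct demo_provider *priv;

    if (locale == NULL)
        return;
    priv = (struct demo_provider *) locale->provider_data;
    if (priv != NULL)
    {
        free(priv->name);
        free(priv);
    }
    free(locale);
}
```

The cast is unchecked, so the pattern is only safe when the code that sets `provider_data` also installs the methods that read it, as each `create_pg_locale_*` function in the patch does for its own provider.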

v15-0004-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From e32a88cc236a84cbc80f00c1713f2e851be1d109 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v15 4/4] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  | 4 ++++
 src/backend/utils/adt/formatting.c    | 4 ----
 src/backend/utils/adt/pg_locale.c     | 4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 1 +
 src/backend/utils/adt/varlena.c       | 4 ++++
 src/include/utils/pg_locale.h         | 4 ----
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8acbfbbeda0..a57fe93c387 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 5bd1e01f7e4..b3d5e0436ee 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -70,10 +70,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/int.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f26ce0e6cc7..894f141dbb1 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -54,6 +54,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 19c51504f85..2ff7b960abc 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 34796f2e27c..c57262e1888 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index ccf10cd617b..4c941d02d76 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.34.1
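
Editorial sketch (not part of the patch): once the shared struct no longer names `UCollator`, the header has no reason to include `<unicode/ucol.h>`, and only the files that call ICU directly include it themselves, which is what the hunks above do. The single-file C below imitates that split with comments marking what would sit in the header and what would sit in the implementation file. All names (`hdr_locale`, `hdr_private`, `hdr_locale_open`, `hdr_locale_compare`, `hdr_locale_close`) are hypothetical, and `<string.h>` with `strcmp` is only a stand-in for a library header and library handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- would live in the public header: no library header required ---- */
struct hdr_locale
{
    void *provider_data;        /* opaque; no library type appears here */
};

struct hdr_locale *hdr_locale_open(void);
int hdr_locale_compare(const struct hdr_locale *loc,
                       const char *a, const char *b);
void hdr_locale_close(struct hdr_locale *loc);

/* ---- would live in the one .c file that talks to the library ---- */
#include <string.h>             /* stand-in for the library's own header */

struct hdr_private
{
    int (*cmp)(const char *, const char *);     /* stand-in library handle */
};

struct hdr_locale *
hdr_locale_open(void)
{
    struct hdr_locale *loc = calloc(1, sizeof(struct hdr_locale));
    struct hdr_private *priv = calloc(1, sizeof(struct hdr_private));

    if (loc == NULL || priv == NULL)
    {
        free(loc);
        free(priv);
        return NULL;
    }
    priv->cmp = strcmp;
    loc->provider_data = priv;
    return loc;
}

int
hdr_locale_compare(const struct hdr_locale *loc, const char *a, const char *b)
{
    const struct hdr_private *priv =
        (const struct hdr_private *) loc->provider_data;
    int r;

    assert(priv != NULL);
    r = priv->cmp(a, b);
    return (r > 0) - (r < 0);   /* normalize to -1, 0 or 1 */
}

void
hdr_locale_close(struct hdr_locale *loc)
{
    if (loc == NULL)
        return;
    free(loc->provider_data);
    free(loc);
}
```

Code that includes only the "header" part can hold and pass a `struct hdr_locale` without the library's headers being installed or visible to it, at the cost that such code cannot reach into the private state.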

#18

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Jeff Davis (#17)

4 attachment(s)

Re: Collation & ctype method table, and extension hooks

On Fri, 2025-02-07 at 11:19 -0800, Jeff Davis wrote:

> Attached v15. Just a rebase.

Attached v16.

> * commit this on the grounds that it's a desirable code improvement and
> the worst-case regression isn't a major concern; or

I plan to commit this soon after branching. There's a general consensus
that enabling multi-lib provider support is a good idea, and turning
the provider behavior into method tables is a prerequisite for that. I
doubt the performance issue will be a serious concern and I don't see a
good way to avoid it.
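
For readers following the thread, here is a minimal sketch of the dispatch pattern the patch moves to: a hard-wired fast path for C-locale ctype behavior, a per-locale method table for everything else, and an optional method that falls back to another one when the provider leaves it NULL (as pg_strfold() does with strlower in the patch). Everything prefixed demo_ or toy_ is hypothetical and much simpler than the real pg_locale_struct and ctype_methods; the "toy" provider only knows ASCII and Latin-1 letters and is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_locale;

/* Hypothetical, cut-down analogue of the patch's ctype method table. */
struct demo_ctype_methods
{
    bool         (*wc_isdigit) (unsigned int wc, const struct demo_locale *loc);
    unsigned int (*wc_tolower) (unsigned int wc, const struct demo_locale *loc);
    /* optional: callers fall back to wc_tolower when this is NULL */
    unsigned int (*wc_fold) (unsigned int wc, const struct demo_locale *loc);
};

struct demo_locale
{
    bool        ctype_is_c;
    const struct demo_ctype_methods *ctype; /* NULL exactly when ctype_is_c */
};

/* Toy provider: ASCII digits, ASCII and Latin-1 lowercasing. */
static bool
toy_isdigit(unsigned int wc, const struct demo_locale *loc)
{
    (void) loc;
    return wc >= '0' && wc <= '9';
}

static unsigned int
toy_tolower(unsigned int wc, const struct demo_locale *loc)
{
    (void) loc;
    if (wc >= 'A' && wc <= 'Z')
        return wc + 0x20;
    if (wc >= 0xC0 && wc <= 0xDE && wc != 0xD7)
        return wc + 0x20;
    return wc;
}

static const struct demo_ctype_methods toy_methods = {
    .wc_isdigit = toy_isdigit,
    .wc_tolower = toy_tolower,
    .wc_fold = NULL,            /* deliberately omitted */
};

const struct demo_locale c_locale = {.ctype_is_c = true, .ctype = NULL};
const struct demo_locale toy_locale = {.ctype_is_c = false, .ctype = &toy_methods};

/* Callers no longer branch on a provider; they branch only on ctype_is_c. */
bool
demo_wc_isdigit(unsigned int wc, const struct demo_locale *loc)
{
    if (loc->ctype_is_c)
        return wc >= '0' && wc <= '9';
    assert(loc->ctype != NULL);
    return loc->ctype->wc_isdigit(wc, loc);
}

unsigned int
demo_wc_fold(unsigned int wc, const struct demo_locale *loc)
{
    if (loc->ctype_is_c)
        return (wc >= 'A' && wc <= 'Z') ? wc + ('a' - 'A') : wc;
    assert(loc->ctype != NULL);
    if (loc->ctype->wc_fold)
        return loc->ctype->wc_fold(wc, loc);
    return loc->ctype->wc_tolower(wc, loc);
}
```

With this shape, supporting another library or library version means supplying another table, not touching every call site; the cost is one indirect call per character on the non-C path, which is the overhead discussed above.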

Regards,
Jeff Davis

Attachments:

v16-0001-Control-ctype-behavior-internally-with-a-method-.patch (text/x-patch; charset=UTF-8)

From c9a464483e3341ccb828ca0a6a5fd1b698d56cee Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 29 Nov 2024 09:37:43 -0800
Subject: [PATCH v16 1/4] Control ctype behavior internally with a method
 table.

Previously, pattern matching and case mapping behavior branched based
on the provider.

Refactor to use a method table, which is less error-prone and easier
to hook.
---
 src/backend/regex/regc_pg_locale.c        | 429 ++++------------------
 src/backend/utils/adt/like.c              |  22 +-
 src/backend/utils/adt/like_support.c      |   7 +-
 src/backend/utils/adt/pg_locale.c         | 121 +++---
 src/backend/utils/adt/pg_locale_builtin.c | 111 +++++-
 src/backend/utils/adt/pg_locale_icu.c     | 119 +++++-
 src/backend/utils/adt/pg_locale_libc.c    | 331 +++++++++++++++--
 src/include/utils/pg_locale.h             |  53 +++
 src/tools/pgindent/typedefs.list          |   1 -
 9 files changed, 686 insertions(+), 508 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 78193cfb964..d9eab5357bc 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -20,58 +20,13 @@
 #include "common/unicode_category.h"
 #include "utils/pg_locale.h"
 
-/*
- * For the libc provider, to provide as much functionality as possible on a
- * variety of platforms without going so far as to implement everything from
- * scratch, we use several implementation strategies depending on the
- * situation:
- *
- * 1. In C/POSIX collations, we use hard-wired code.  We can't depend on
- * the <ctype.h> functions since those will obey LC_CTYPE.  Note that these
- * collations don't give a fig about multibyte characters.
- *
- * 2. When working in UTF8 encoding, we use the <wctype.h> functions.
- * This assumes that every platform uses Unicode codepoints directly
- * as the wchar_t representation of Unicode.  (XXX: ICU makes this assumption
- * even for non-UTF8 encodings, which may be a problem.)  On some platforms
- * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
- *
- * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar
- * values up to 255, and punt for values above that.  This is 100% correct
- * only in single-byte encodings such as LATINn.  However, non-Unicode
- * multibyte encodings are mostly Far Eastern character sets for which the
- * properties being tested here aren't very relevant for higher code values
- * anyway.  The difficulty with using the <wctype.h> functions with
- * non-Unicode multibyte encodings is that we can have no certainty that
- * the platform's wchar_t representation matches what we do in pg_wchar
- * conversions.
- *
- * As a special case, in the "default" collation, (2) and (3) force ASCII
- * letters to follow ASCII upcase/downcase rules, while in a non-default
- * collation we just let the library functions do what they will.  The case
- * where this matters is treatment of I/i in Turkish, and the behavior is
- * meant to match the upper()/lower() SQL functions.
- *
- * We store the active collation setting in static variables.  In principle
- * it could be passed down to here via the regex library's "struct vars" data
- * structure; but that would require somewhat invasive changes in the regex
- * library, and right now there's no real benefit to be gained from that.
- *
- * NB: the coding here assumes pg_wchar is an unsigned type.
- */
-
-typedef enum
-{
-	PG_REGEX_STRATEGY_C,		/* C locale (encoding independent) */
-	PG_REGEX_STRATEGY_BUILTIN,	/* built-in Unicode semantics */
-	PG_REGEX_STRATEGY_LIBC_WIDE,	/* Use locale_t <wctype.h> functions */
-	PG_REGEX_STRATEGY_LIBC_1BYTE,	/* Use locale_t <ctype.h> functions */
-	PG_REGEX_STRATEGY_ICU,		/* Use ICU uchar.h functions */
-} PG_Locale_Strategy;
-
-static PG_Locale_Strategy pg_regex_strategy;
 static pg_locale_t pg_regex_locale;
 
+static struct pg_locale_struct dummy_c_locale = {
+	.collate_is_c = true,
+	.ctype_is_c = true,
+};
+
 /*
  * Hard-wired character properties for C locale
  */
@@ -228,7 +183,6 @@ void
 pg_set_regex_collation(Oid collation)
 {
 	pg_locale_t locale = 0;
-	PG_Locale_Strategy strategy;
 
 	if (!OidIsValid(collation))
 	{
@@ -249,8 +203,7 @@ pg_set_regex_collation(Oid collation)
 		 * catalog access is available, so we can't call
 		 * pg_newlocale_from_collation().
 		 */
-		strategy = PG_REGEX_STRATEGY_C;
-		locale = 0;
+		locale = &dummy_c_locale;
 	}
 	else
 	{
@@ -267,113 +220,41 @@ pg_set_regex_collation(Oid collation)
 			 * C/POSIX collations use this path regardless of database
 			 * encoding
 			 */
-			strategy = PG_REGEX_STRATEGY_C;
-			locale = 0;
-		}
-		else if (locale->provider == COLLPROVIDER_BUILTIN)
-		{
-			Assert(GetDatabaseEncoding() == PG_UTF8);
-			strategy = PG_REGEX_STRATEGY_BUILTIN;
-		}
-#ifdef USE_ICU
-		else if (locale->provider == COLLPROVIDER_ICU)
-		{
-			strategy = PG_REGEX_STRATEGY_ICU;
-		}
-#endif
-		else
-		{
-			Assert(locale->provider == COLLPROVIDER_LIBC);
-			if (GetDatabaseEncoding() == PG_UTF8)
-				strategy = PG_REGEX_STRATEGY_LIBC_WIDE;
-			else
-				strategy = PG_REGEX_STRATEGY_LIBC_1BYTE;
+			locale = &dummy_c_locale;
 		}
 	}
 
-	pg_regex_strategy = strategy;
 	pg_regex_locale = locale;
 }
 
 static int
 pg_wc_isdigit(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISDIGIT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isdigit(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isdigit(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISDIGIT));
+	else
+		return pg_regex_locale->ctype->wc_isdigit(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalpha(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALPHA));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalpha(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalpha(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALPHA));
+	else
+		return pg_regex_locale->ctype->wc_isalpha(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isalnum(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISALNUM));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isalnum(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isalnum(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISALNUM));
+	else
+		return pg_regex_locale->ctype->wc_isalnum(c, pg_regex_locale);
 }
 
 static int
@@ -388,231 +269,87 @@ pg_wc_isword(pg_wchar c)
 static int
 pg_wc_isupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISUPPER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isupper(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isupper(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISUPPER));
+	else
+		return pg_regex_locale->ctype->wc_isupper(c, pg_regex_locale);
 }
 
 static int
 pg_wc_islower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISLOWER));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_islower(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					islower_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_islower(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISLOWER));
+	else
+		return pg_regex_locale->ctype->wc_islower(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isgraph(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISGRAPH));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isgraph(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isgraph(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISGRAPH));
+	else
+		return pg_regex_locale->ctype->wc_isgraph(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isprint(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPRINT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isprint(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isprint(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPRINT));
+	else
+		return pg_regex_locale->ctype->wc_isprint(c, pg_regex_locale);
 }
 
 static int
 pg_wc_ispunct(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISPUNCT));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_ispunct(c, !pg_regex_locale->info.builtin.casemap_full);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_ispunct(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISPUNCT));
+	else
+		return pg_regex_locale->ctype->wc_ispunct(c, pg_regex_locale);
 }
 
 static int
 pg_wc_isspace(pg_wchar c)
 {
-	switch (pg_regex_strategy)
-	{
-		case PG_REGEX_STRATEGY_C:
-			return (c <= (pg_wchar) 127 &&
-					(pg_char_properties[c] & PG_ISSPACE));
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return pg_u_isspace(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			return (c <= (pg_wchar) UCHAR_MAX &&
-					isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-			break;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_isspace(c);
-#endif
-			break;
-	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	if (pg_regex_locale->ctype_is_c)
+		return (c <= (pg_wchar) 127 &&
+				(pg_char_properties[c] & PG_ISSPACE));
+	else
+		return pg_regex_locale->ctype->wc_isspace(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_toupper(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_uppercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			/* force C behavior for ASCII characters, per comments above */
-			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			/* force C behavior for ASCII characters, per comments above */
-			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
-				return pg_ascii_toupper((unsigned char) c);
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_toupper(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_toupper((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_toupper(c, pg_regex_locale);
 }
 
 static pg_wchar
 pg_wc_tolower(pg_wchar c)
 {
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
-			if (c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			return c;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			return unicode_lowercase_simple(c);
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			/* force C behavior for ASCII characters, per comments above */
-			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
-				return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-			/* FALL THRU */
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-			/* force C behavior for ASCII characters, per comments above */
-			if (pg_regex_locale->is_default && c <= (pg_wchar) 127)
-				return pg_ascii_tolower((unsigned char) c);
-			if (c <= (pg_wchar) UCHAR_MAX)
-				return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-			return c;
-		case PG_REGEX_STRATEGY_ICU:
-#ifdef USE_ICU
-			return u_tolower(c);
-#endif
-			break;
+		if (c <= (pg_wchar) 127)
+			return pg_ascii_tolower((unsigned char) c);
+		return c;
 	}
-	return 0;					/* can't get here, but keep compiler quiet */
+	else
+		return pg_regex_locale->ctype->wc_tolower(c, pg_regex_locale);
 }
 
 
@@ -738,37 +475,25 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
 	 * useful to allow it to be small for testing purposes.)
 	 */
-	switch (pg_regex_strategy)
+	if (pg_regex_locale->ctype_is_c)
 	{
-		case PG_REGEX_STRATEGY_C:
 #if MAX_SIMPLE_CHR >= 127
-			max_chr = (pg_wchar) 127;
-			pcc->cv.cclasscode = -1;
+		max_chr = (pg_wchar) 127;
+		pcc->cv.cclasscode = -1;
 #else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
-			break;
-		case PG_REGEX_STRATEGY_BUILTIN:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_WIDE:
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		case PG_REGEX_STRATEGY_LIBC_1BYTE:
-#if MAX_SIMPLE_CHR >= UCHAR_MAX
-			max_chr = (pg_wchar) UCHAR_MAX;
+	}
+	else
+	{
+		if (pg_regex_locale->ctype->max_chr != 0 &&
+			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
+		{
+			max_chr = pg_regex_locale->ctype->max_chr;
 			pcc->cv.cclasscode = -1;
-#else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-#endif
-			break;
-		case PG_REGEX_STRATEGY_ICU:
+		}
+		else
 			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
-			break;
-		default:
-			Assert(false);
-			max_chr = 0;		/* can't get here, but keep compiler quiet */
-			break;
 	}
 
 	/*
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 7f4cf614585..4216ac17f43 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -98,7 +98,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	else if (locale->is_default)
 		return pg_tolower(c);
 	else
-		return tolower_l(c, locale->info.lt);
+		return char_tolower(c, locale);
 }
 
 
@@ -209,7 +209,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (pg_database_encoding_max_length() > 1 || (locale->provider == COLLPROVIDER_ICU))
+	if (locale->ctype_is_c ||
+		(char_tolower_enabled(locale) &&
+		 pg_database_encoding_max_length() == 1))
+	{
+		p = VARDATA_ANY(pat);
+		plen = VARSIZE_ANY_EXHDR(pat);
+		s = VARDATA_ANY(str);
+		slen = VARSIZE_ANY_EXHDR(str);
+		return SB_IMatchText(s, slen, p, plen, locale);
+	}
+	else
 	{
 		pat = DatumGetTextPP(DirectFunctionCall1Coll(lower, collation,
 													 PointerGetDatum(pat)));
@@ -224,14 +234,6 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 		else
 			return MB_MatchText(s, slen, p, plen, 0);
 	}
-	else
-	{
-		p = VARDATA_ANY(pat);
-		plen = VARSIZE_ANY_EXHDR(pat);
-		s = VARDATA_ANY(str);
-		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
-	}
 }
 
 /*
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8fdc677371f..999f23f86d5 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1495,13 +1495,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
 {
 	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (is_multibyte && IS_HIGHBIT_SET(c))
-		return true;
-	else if (locale->provider != COLLPROVIDER_LIBC)
-		return IS_HIGHBIT_SET(c) ||
-			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else
-		return isalpha_l((unsigned char) c, locale->info.lt);
+		return char_is_cased(c, locale);
 }
 
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f5e31c433a0..451ac4e2d9b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -80,31 +80,6 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
-extern size_t strlower_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dst, size_t dstsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_builtin(char *dst, size_t dstsize, const char *src,
-							  ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dst, size_t dstsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_icu(char *dst, size_t dstsize, const char *src,
-						  ssize_t srclen, pg_locale_t locale);
-
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 /* GUC settings */
 char	   *locale_messages;
 char	   *locale_monetary;
@@ -1093,6 +1068,9 @@ create_pg_locale(Oid collid, MemoryContext context)
 	Assert((result->collate_is_c && result->collate == NULL) ||
 		   (!result->collate_is_c && result->collate != NULL));
 
+	Assert((result->ctype_is_c && result->ctype == NULL) ||
+		   (!result->ctype_is_c && result->ctype != NULL));
+
 	datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
 							&isnull);
 	if (!isnull)
@@ -1257,77 +1235,31 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strlower_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strlower_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strtitle_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strtitle_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strtitle_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strupper_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strupper_icu(dst, dstsize, src, srclen, locale);
-#endif
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strupper_libc(dst, dstsize, src, srclen, locale);
-	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->provider == COLLPROVIDER_BUILTIN)
-		return strfold_builtin(dst, dstsize, src, srclen, locale);
-#ifdef USE_ICU
-	else if (locale->provider == COLLPROVIDER_ICU)
-		return strfold_icu(dst, dstsize, src, srclen, locale);
-#endif
-	/* for libc, just use strlower */
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return strlower_libc(dst, dstsize, src, srclen, locale);
+	if (locale->ctype->strfold)
+		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 	else
-		/* shouldn't happen */
-		PGLOCALE_SUPPORT_ERROR(locale->provider);
-
-	return 0;					/* keep compiler quiet */
+		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1464,6 +1396,41 @@ pg_strnxfrm_prefix(char *dest, size_t destsize, const char *src,
 	return locale->collate->strnxfrm_prefix(dest, destsize, src, srclen, locale);
 }
 
+/*
+ * char_is_cased()
+ *
+ * Fuzzy test of whether the given char is case-varying or not. The argument
+ * is a single byte, so in a multibyte encoding, just assume any non-ASCII
+ * char is case-varying.
+ */
+bool
+char_is_cased(char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_is_cased(ch, locale);
+}
+
+/*
+ * char_tolower_enabled()
+ *
+ * Does the provider support char_tolower()?
+ */
+bool
+char_tolower_enabled(pg_locale_t locale)
+{
+	return (locale->ctype->char_tolower != NULL);
+}
+
+/*
+ * char_tolower()
+ *
+ * Convert char (single-byte encoding) to lowercase.
+ */
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
+{
+	return locale->ctype->char_tolower(ch, locale);
+}
+
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index f51768830cd..6fe090708ff 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -25,15 +25,6 @@
 extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
-extern size_t strlower_builtin(char *dest, size_t destsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_builtin(char *dest, size_t destsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_builtin(char *dest, size_t destsize, const char *src,
-							   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_builtin(char *dest, size_t destsize, const char *src,
-							  ssize_t srclen, pg_locale_t locale);
-
 
 struct WordBoundaryState
 {
@@ -77,7 +68,7 @@ initcap_wbnext(void *state)
 	return wbstate->len;
 }
 
-size_t
+static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -85,7 +76,7 @@ strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							locale->info.builtin.casemap_full);
 }
 
-size_t
+static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -103,7 +94,7 @@ strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							initcap_wbnext, &wbstate);
 }
 
-size_t
+static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
@@ -111,7 +102,7 @@ strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 							locale->info.builtin.casemap_full);
 }
 
-size_t
+static size_t
 strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				pg_locale_t locale)
 {
@@ -119,6 +110,98 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 						   locale->info.builtin.casemap_full);
 }
 
+static bool
+wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isdigit(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isalnum(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isupper(wc);
+}
+
+static bool
+wc_islower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_islower(wc);
+}
+
+static bool
+wc_isgraph_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isgraph(wc);
+}
+
+static bool
+wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isprint(wc);
+}
+
+static bool
+wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_ispunct(wc, !locale->info.builtin.casemap_full);
+}
+
+static bool
+wc_isspace_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_isspace(wc);
+}
+
+static bool
+char_is_cased_builtin(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_uppercase_simple(wc);
+}
+
+static pg_wchar
+wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return unicode_lowercase_simple(wc);
+}
+
+static const struct ctype_methods ctype_methods_builtin = {
+	.strlower = strlower_builtin,
+	.strtitle = strtitle_builtin,
+	.strupper = strupper_builtin,
+	.strfold = strfold_builtin,
+	.wc_isdigit = wc_isdigit_builtin,
+	.wc_isalpha = wc_isalpha_builtin,
+	.wc_isalnum = wc_isalnum_builtin,
+	.wc_isupper = wc_isupper_builtin,
+	.wc_islower = wc_islower_builtin,
+	.wc_isgraph = wc_isgraph_builtin,
+	.wc_isprint = wc_isprint_builtin,
+	.wc_ispunct = wc_ispunct_builtin,
+	.wc_isspace = wc_isspace_builtin,
+	.char_is_cased = char_is_cased_builtin,
+	.wc_tolower = wc_tolower_builtin,
+	.wc_toupper = wc_toupper_builtin,
+};
+
 pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
@@ -162,6 +245,8 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
+	if (!result->ctype_is_c)
+		result->ctype = &ctype_methods_builtin;
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index a32c32a0744..1f4ee2d1990 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -48,19 +48,22 @@
 #define		TEXTBUFLEN			1024
 
 extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
-extern size_t strlower_icu(char *dest, size_t destsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_icu(char *dest, size_t destsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_icu(char *dest, size_t destsize, const char *src,
-						   ssize_t srclen, pg_locale_t locale);
-extern size_t strfold_icu(char *dest, size_t destsize, const char *src,
-						  ssize_t srclen, pg_locale_t locale);
 
 #ifdef USE_ICU
 
 extern UCollator *pg_ucol_open(const char *loc_str);
 
+static size_t strlower_icu(char *dest, size_t destsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strtitle_icu(char *dest, size_t destsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strupper_icu(char *dest, size_t destsize, const char *src,
+						   ssize_t srclen, pg_locale_t locale);
+static size_t strfold_icu(char *dest, size_t destsize, const char *src,
+						  ssize_t srclen, pg_locale_t locale);
+static int	strncoll_icu(const char *arg1, ssize_t len1,
+						 const char *arg2, ssize_t len2,
+						 pg_locale_t locale);
 static size_t strnxfrm_icu(char *dest, size_t destsize,
 						   const char *src, ssize_t srclen,
 						   pg_locale_t locale);
@@ -118,6 +121,25 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
+static bool
+char_is_cased_icu(char ch, pg_locale_t locale)
+{
+	return IS_HIGHBIT_SET(ch) ||
+		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+}
+
+static pg_wchar
+toupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_toupper(wc);
+}
+
+static pg_wchar
+tolower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_tolower(wc);
+}
+
 static const struct collate_methods collate_methods_icu = {
 	.strncoll = strncoll_icu,
 	.strnxfrm = strnxfrm_icu,
@@ -136,6 +158,78 @@ static const struct collate_methods collate_methods_icu_utf8 = {
 	.strxfrm_is_safe = true,
 };
 
+static bool
+wc_isdigit_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isdigit(wc);
+}
+
+static bool
+wc_isalpha_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalpha(wc);
+}
+
+static bool
+wc_isalnum_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isalnum(wc);
+}
+
+static bool
+wc_isupper_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isupper(wc);
+}
+
+static bool
+wc_islower_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_islower(wc);
+}
+
+static bool
+wc_isgraph_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isgraph(wc);
+}
+
+static bool
+wc_isprint_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isprint(wc);
+}
+
+static bool
+wc_ispunct_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_ispunct(wc);
+}
+
+static bool
+wc_isspace_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_isspace(wc);
+}
+
+static const struct ctype_methods ctype_methods_icu = {
+	.strlower = strlower_icu,
+	.strtitle = strtitle_icu,
+	.strupper = strupper_icu,
+	.strfold = strfold_icu,
+	.wc_isdigit = wc_isdigit_icu,
+	.wc_isalpha = wc_isalpha_icu,
+	.wc_isalnum = wc_isalnum_icu,
+	.wc_isupper = wc_isupper_icu,
+	.wc_islower = wc_islower_icu,
+	.wc_isgraph = wc_isgraph_icu,
+	.wc_isprint = wc_isprint_icu,
+	.wc_ispunct = wc_ispunct_icu,
+	.wc_isspace = wc_isspace_icu,
+	.char_is_cased = char_is_cased_icu,
+	.wc_toupper = toupper_icu,
+	.wc_tolower = tolower_icu,
+};
 #endif
 
 pg_locale_t
@@ -206,6 +300,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		result->collate = &collate_methods_icu_utf8;
 	else
 		result->collate = &collate_methods_icu;
+	result->ctype = &ctype_methods_icu;
 
 	return result;
 #else
@@ -379,7 +474,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 	}
 }
 
-size_t
+static size_t
 strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -399,7 +494,7 @@ strlower_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -419,7 +514,7 @@ strtitle_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			 pg_locale_t locale)
 {
@@ -439,7 +534,7 @@ strupper_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
-size_t
+static size_t
 strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 199857e22db..be714db5283 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -33,6 +33,46 @@
 #include <shlwapi.h>
 #endif
 
+/*
+ * For the libc provider, to provide as much functionality as possible on a
+ * variety of platforms without going so far as to implement everything from
+ * scratch, we use several implementation strategies depending on the
+ * situation:
+ *
+ * 1. In C/POSIX collations, we use hard-wired code.  We can't depend on
+ * the <ctype.h> functions since those will obey LC_CTYPE.  Note that these
+ * collations don't give a fig about multibyte characters.
+ *
+ * 2. When working in UTF8 encoding, we use the <wctype.h> functions.
+ * This assumes that every platform uses Unicode codepoints directly
+ * as the wchar_t representation of Unicode.  (XXX: ICU makes this assumption
+ * even for non-UTF8 encodings, which may be a problem.)  On some platforms
+ * wchar_t is only 16 bits wide, so we have to punt for codepoints > 0xFFFF.
+ *
+ * 3. In all other encodings, we use the <ctype.h> functions for pg_wchar
+ * values up to 255, and punt for values above that.  This is 100% correct
+ * only in single-byte encodings such as LATINn.  However, non-Unicode
+ * multibyte encodings are mostly Far Eastern character sets for which the
+ * properties being tested here aren't very relevant for higher code values
+ * anyway.  The difficulty with using the <wctype.h> functions with
+ * non-Unicode multibyte encodings is that we can have no certainty that
+ * the platform's wchar_t representation matches what we do in pg_wchar
+ * conversions.
+ *
+ * As a special case, in the "default" collation, (2) and (3) force ASCII
+ * letters to follow ASCII upcase/downcase rules, while in a non-default
+ * collation we just let the library functions do what they will.  The case
+ * where this matters is treatment of I/i in Turkish, and the behavior is
+ * meant to match the upper()/lower() SQL functions.
+ *
+ * We store the active collation setting in static variables.  In principle
+ * it could be passed down to here via the regex library's "struct vars" data
+ * structure; but that would require somewhat invasive changes in the regex
+ * library, and right now there's no real benefit to be gained from that.
+ *
+ * NB: the coding here assumes pg_wchar is an unsigned type.
+ */
+
 /*
  * Size of stack buffer to use for string transformations, used to avoid heap
  * allocations in typical cases. This should be large enough that most strings
@@ -43,13 +83,6 @@
 
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
-extern size_t strlower_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strtitle_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-extern size_t strupper_libc(char *dst, size_t dstsize, const char *src,
-							ssize_t srclen, pg_locale_t locale);
-
 static int	strncoll_libc(const char *arg1, ssize_t len1,
 						  const char *arg2, ssize_t len2,
 						  pg_locale_t locale);
@@ -85,6 +118,251 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+static bool
+wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isdigit_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalpha_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isalnum_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return islower_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isgraph_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isprint_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return ispunct_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isspace_l((unsigned char) wc, locale->info.lt);
+}
+
+static bool
+wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswdigit_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalpha_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswalnum_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswupper_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswlower_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswgraph_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswprint_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswpunct_l((wint_t) wc, locale->info.lt);
+}
+
+static bool
+wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswspace_l((wint_t) wc, locale->info.lt);
+}
+
+static char
+char_tolower_libc(unsigned char ch, pg_locale_t locale)
+{
+	Assert(pg_database_encoding_max_length() == 1);
+	return tolower_l(ch, locale->info.lt);
+}
+
+static bool
+char_is_cased_libc(char ch, pg_locale_t locale)
+{
+	bool		is_multibyte = pg_database_encoding_max_length() > 1;
+
+	if (is_multibyte && IS_HIGHBIT_SET(ch))
+		return true;
+	else
+		return isalpha_l((unsigned char) ch, locale->info.lt);
+}
+
+static pg_wchar
+toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	/* force C behavior for ASCII characters, per comments above */
+	if (locale->is_default && wc <= (pg_wchar) 127)
+		return pg_ascii_toupper((unsigned char) wc);
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return toupper_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* force C behavior for ASCII characters, per comments above */
+	if (locale->is_default && wc <= (pg_wchar) 127)
+		return pg_ascii_toupper((unsigned char) wc);
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towupper_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() != PG_UTF8);
+
+	/* force C behavior for ASCII characters, per comments above */
+	if (locale->is_default && wc <= (pg_wchar) 127)
+		return pg_ascii_tolower((unsigned char) wc);
+	if (wc <= (pg_wchar) UCHAR_MAX)
+		return tolower_l((unsigned char) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static pg_wchar
+tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	/* force C behavior for ASCII characters, per comments above */
+	if (locale->is_default && wc <= (pg_wchar) 127)
+		return pg_ascii_tolower((unsigned char) wc);
+	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
+		return towlower_l((wint_t) wc, locale->info.lt);
+	else
+		return wc;
+}
+
+static const struct ctype_methods ctype_methods_libc_sb = {
+	.strlower = strlower_libc_sb,
+	.strtitle = strtitle_libc_sb,
+	.strupper = strupper_libc_sb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+/*
+ * Non-UTF8 multibyte encodings use multibyte semantics for case mapping, but
+ * single-byte semantics for pattern matching.
+ */
+static const struct ctype_methods ctype_methods_libc_other_mb = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_sb,
+	.wc_isalpha = wc_isalpha_libc_sb,
+	.wc_isalnum = wc_isalnum_libc_sb,
+	.wc_isupper = wc_isupper_libc_sb,
+	.wc_islower = wc_islower_libc_sb,
+	.wc_isgraph = wc_isgraph_libc_sb,
+	.wc_isprint = wc_isprint_libc_sb,
+	.wc_ispunct = wc_ispunct_libc_sb,
+	.wc_isspace = wc_isspace_libc_sb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_sb,
+	.wc_tolower = tolower_libc_sb,
+	.max_chr = UCHAR_MAX,
+};
+
+static const struct ctype_methods ctype_methods_libc_utf8 = {
+	.strlower = strlower_libc_mb,
+	.strtitle = strtitle_libc_mb,
+	.strupper = strupper_libc_mb,
+	.wc_isdigit = wc_isdigit_libc_mb,
+	.wc_isalpha = wc_isalpha_libc_mb,
+	.wc_isalnum = wc_isalnum_libc_mb,
+	.wc_isupper = wc_isupper_libc_mb,
+	.wc_islower = wc_islower_libc_mb,
+	.wc_isgraph = wc_isgraph_libc_mb,
+	.wc_isprint = wc_isprint_libc_mb,
+	.wc_ispunct = wc_ispunct_libc_mb,
+	.wc_isspace = wc_isspace_libc_mb,
+	.char_is_cased = char_is_cased_libc,
+	.char_tolower = char_tolower_libc,
+	.wc_toupper = toupper_libc_mb,
+	.wc_tolower = tolower_libc_mb,
+};
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -119,36 +397,6 @@ static const struct collate_methods collate_methods_libc_win32_utf8 = {
 };
 #endif
 
-size_t
-strlower_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strlower_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strlower_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strtitle_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strtitle_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strtitle_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
-size_t
-strupper_libc(char *dst, size_t dstsize, const char *src,
-			  ssize_t srclen, pg_locale_t locale)
-{
-	if (pg_database_encoding_max_length() > 1)
-		return strupper_libc_mb(dst, dstsize, src, srclen, locale);
-	else
-		return strupper_libc_sb(dst, dstsize, src, srclen, locale);
-}
-
 static size_t
 strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
@@ -481,6 +729,15 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 #endif
 			result->collate = &collate_methods_libc;
 	}
+	if (!result->ctype_is_c)
+	{
+		if (GetDatabaseEncoding() == PG_UTF8)
+			result->ctype = &ctype_methods_libc_utf8;
+		else if (pg_database_encoding_max_length() > 1)
+			result->ctype = &ctype_methods_libc_other_mb;
+		else
+			result->ctype = &ctype_methods_libc_sb;
+	}
 
 	return result;
 }
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 7b8cbf58d2c..0f497fa8ce2 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -12,6 +12,8 @@
 #ifndef _PG_LOCALE_
 #define _PG_LOCALE_
 
+#include "mb/pg_wchar.h"
+
 #ifdef USE_ICU
 #include <unicode/ucol.h>
 #endif
@@ -77,6 +79,52 @@ struct collate_methods
 	bool		strxfrm_is_safe;
 };
 
+struct ctype_methods
+{
+	/* case mapping: LOWER()/INITCAP()/UPPER() */
+	size_t		(*strlower) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strtitle) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strupper) (char *dest, size_t destsize,
+							 const char *src, ssize_t srclen,
+							 pg_locale_t locale);
+	size_t		(*strfold) (char *dest, size_t destsize,
+							const char *src, ssize_t srclen,
+							pg_locale_t locale);
+
+	/* required */
+	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalpha) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isalnum) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isupper) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_islower) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isgraph) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isprint) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
+	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
+
+	/* required */
+	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+
+	/*
+	 * Optional. If defined, will only be called for single-byte encodings. If
+	 * not defined, or if the encoding is multibyte, will fall back to
+	 * pg_strlower().
+	 */
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+
+	/*
+	 * For regex and pattern matching efficiency, the maximum char value
+	 * supported by the above methods. If zero, limit is set by regex code.
+	 */
+	pg_wchar	max_chr;
+};
+
 /*
  * We use a discriminated union to hold either a locale_t or an ICU collator.
  * pg_locale_t is occasionally checked for truth, so make it a pointer.
@@ -102,6 +150,7 @@ struct pg_locale_struct
 	bool		is_default;
 
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
+	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
 	union
 	{
@@ -125,6 +174,10 @@ extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
+
+extern bool char_is_cased(char ch, pg_locale_t locale);
+extern bool char_tolower_enabled(pg_locale_t locale);
+extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8346cda633..eea8f1eee6b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1872,7 +1872,6 @@ PGTargetServerType
 PGTernaryBool
 PGTransactionStatusType
 PGVerbosity
-PG_Locale_Strategy
 PG_Lock_Status
 PG_init_t
 PGauthData
-- 
2.43.0
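
Editorial sketch (not part of the patch): the diff above replaces provider branches with a ctype method table that is NULL for C ctype and may leave optional members such as char_tolower unset. The following is a minimal, self-contained illustration of that dispatch pattern. Every name prefixed with sketch_ is hypothetical, the types are simplified stand-ins for pg_wchar and pg_locale_t, and the Latin-1 mapping is only an example provider, not code from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pg_wchar and pg_locale_t. */
typedef unsigned int sketch_wchar;
struct sketch_locale;

struct sketch_ctype_methods
{
    /* required */
    sketch_wchar (*wc_toupper) (sketch_wchar wc,
                                const struct sketch_locale *loc);
    /* optional: may be left NULL by a provider */
    char        (*char_tolower) (unsigned char ch,
                                 const struct sketch_locale *loc);
    /* largest value the methods handle; 0 means "no limit stated" */
    sketch_wchar max_chr;
};

struct sketch_locale
{
    const struct sketch_ctype_methods *ctype;   /* NULL for C ctype */
};

/* Hard-wired ASCII rule, used when there is no method table. */
static sketch_wchar
sketch_ascii_toupper(sketch_wchar wc)
{
    return (wc >= 'a' && wc <= 'z') ? wc - ('a' - 'A') : wc;
}

/* Example provider: Latin-1 upper-casing for 0xE0..0xFE except 0xF7. */
static sketch_wchar
sketch_toupper_latin1(sketch_wchar wc, const struct sketch_locale *loc)
{
    (void) loc;
    if (wc >= 0xE0 && wc <= 0xFE && wc != 0xF7)
        return wc - 0x20;
    return sketch_ascii_toupper(wc);
}

/* Example optional method: ASCII-only lower-casing of a single byte. */
static char
sketch_char_tolower_latin1(unsigned char ch, const struct sketch_locale *loc)
{
    (void) loc;
    return (ch >= 'A' && ch <= 'Z') ? (char) (ch + ('a' - 'A')) : (char) ch;
}

static const struct sketch_ctype_methods sketch_methods_latin1 = {
    .wc_toupper = sketch_toupper_latin1,
    .char_tolower = sketch_char_tolower_latin1,
    .max_chr = 0xFF,
};

static const struct sketch_locale sketch_loc_c = {.ctype = NULL};
static const struct sketch_locale sketch_loc_latin1 = {
    .ctype = &sketch_methods_latin1,
};

/* Caller-facing wrapper: no branch on a provider, only on the table. */
static sketch_wchar
sketch_wc_toupper(sketch_wchar wc, const struct sketch_locale *loc)
{
    assert(loc != NULL);
    if (loc->ctype == NULL)
        return sketch_ascii_toupper(wc);
    return loc->ctype->wc_toupper(wc, loc);
}

/* True only if the locale has a table and it supplies char_tolower. */
static bool
sketch_char_tolower_enabled(const struct sketch_locale *loc)
{
    assert(loc != NULL);
    return loc->ctype != NULL && loc->ctype->char_tolower != NULL;
}
```

Calling sketch_wc_toupper(0xE9, &sketch_loc_latin1) goes through the table, while the same call with &sketch_loc_c takes the hard-wired ASCII path. An extension that hooks locale creation would, under the same assumptions, only need to supply a different table.
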

v16-0002-Remove-provider-field-from-pg_locale_t.patch (text/x-patch; charset=UTF-8)

From a95b771572d625b8bbcdf60955f11076b633267d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 12:51:27 -0700
Subject: [PATCH v16 2/4] Remove provider field from pg_locale_t.

The behavior of pg_locale_t is entirely specified by methods, so a
separate provider field is no longer necessary.
---
 src/backend/utils/adt/pg_locale_builtin.c |  1 -
 src/backend/utils/adt/pg_locale_icu.c     | 11 -----------
 src/backend/utils/adt/pg_locale_libc.c    |  6 ------
 src/include/utils/pg_locale.h             |  1 -
 4 files changed, 19 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 6fe090708ff..3fe41c880a6 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -241,7 +241,6 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
 	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
-	result->provider = COLLPROVIDER_BUILTIN;
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 1f4ee2d1990..96741e08269 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -292,7 +292,6 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->info.icu.ucol = collator;
-	result->provider = COLLPROVIDER_ICU;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -569,8 +568,6 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 	int			result;
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
@@ -598,8 +595,6 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -644,8 +639,6 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
@@ -844,8 +837,6 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -894,8 +885,6 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
-	Assert(locale->provider == COLLPROVIDER_ICU);
-
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index be714db5283..e9f9fc1e369 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -713,7 +713,6 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->provider = COLLPROVIDER_LIBC;
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
@@ -833,8 +832,6 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -889,8 +886,6 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
-
 	if (srclen == -1)
 		return strxfrm_l(dest, src, destsize, locale->info.lt);
 
@@ -999,7 +994,6 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
-	Assert(locale->provider == COLLPROVIDER_LIBC);
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 0f497fa8ce2..44ff60a25b4 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -143,7 +143,6 @@ struct ctype_methods
  */
 struct pg_locale_struct
 {
-	char		provider;
 	bool		deterministic;
 	bool		collate_is_c;
 	bool		ctype_is_c;
-- 
2.43.0
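
Editorial sketch (not part of the patch): the commit message above says that once behavior is specified entirely by methods, a separate provider field is unnecessary. A small illustration of that idea follows, assuming simplified hypothetical types (everything prefixed with demo_). The locale struct carries no provider tag, and the implementations need no Assert on the provider because each one can only be reached through its own table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_locale;

struct demo_collate_methods
{
    int         (*strncoll) (const char *a, size_t alen,
                             const char *b, size_t blen,
                             const struct demo_locale *loc);
};

struct demo_locale
{
    /* note: no "char provider" member */
    const struct demo_collate_methods *collate;
};

/* Order by length once the common prefix compares equal. */
static int
demo_cmp_len(size_t alen, size_t blen)
{
    return (alen > blen) - (alen < blen);
}

/* First hypothetical provider: plain byte-wise comparison. */
static int
demo_strncoll_bytewise(const char *a, size_t alen,
                       const char *b, size_t blen,
                       const struct demo_locale *loc)
{
    size_t      minlen = alen < blen ? alen : blen;
    int         cmp = memcmp(a, b, minlen);

    (void) loc;
    return cmp != 0 ? cmp : demo_cmp_len(alen, blen);
}

static int
demo_ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Second hypothetical provider: ASCII case-insensitive comparison. */
static int
demo_strncoll_ascii_ci(const char *a, size_t alen,
                       const char *b, size_t blen,
                       const struct demo_locale *loc)
{
    size_t      minlen = alen < blen ? alen : blen;

    (void) loc;
    for (size_t i = 0; i < minlen; i++)
    {
        int         ca = demo_ascii_lower((unsigned char) a[i]);
        int         cb = demo_ascii_lower((unsigned char) b[i]);

        if (ca != cb)
            return ca - cb;
    }
    return demo_cmp_len(alen, blen);
}

static const struct demo_collate_methods demo_methods_bytewise = {
    .strncoll = demo_strncoll_bytewise,
};
static const struct demo_collate_methods demo_methods_ascii_ci = {
    .strncoll = demo_strncoll_ascii_ci,
};

static const struct demo_locale demo_loc_bytes = {
    .collate = &demo_methods_bytewise,
};
static const struct demo_locale demo_loc_ci = {
    .collate = &demo_methods_ascii_ci,
};

/* The caller never asks which provider it has; it just calls the method. */
static int
demo_strncoll(const char *a, size_t alen, const char *b, size_t blen,
              const struct demo_locale *loc)
{
    assert(loc != NULL && loc->collate != NULL);
    return loc->collate->strncoll(a, alen, b, blen, loc);
}
```

With these definitions, comparing "abc" and "ABC" through demo_loc_ci reports equality, while demo_loc_bytes orders "abc" after "ABC". The sketch always sets the table; in the patch the collate pointer is NULL for C collations, which a real caller would have to check first.
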

v16-0003-Make-provider-data-in-pg_locale_t-an-opaque-poin.patch (text/x-patch; charset=UTF-8)

From 3c25c6dff7842ad0f5c2cdb880e3d745742ec3d7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Oct 2024 13:36:44 -0700
Subject: [PATCH v16 3/4] Make provider data in pg_locale_t an opaque pointer.

---
 src/backend/utils/adt/pg_locale_builtin.c |  55 +++++--
 src/backend/utils/adt/pg_locale_icu.c     |  40 ++++--
 src/backend/utils/adt/pg_locale_libc.c    | 167 +++++++++++++++-------
 src/include/utils/pg_locale.h             |  17 +--
 4 files changed, 196 insertions(+), 83 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 3fe41c880a6..2d095527e96 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -26,6 +26,12 @@ extern pg_locale_t create_pg_locale_builtin(Oid collid,
 											MemoryContext context);
 extern char *get_collation_actual_version_builtin(const char *collcollate);
 
+struct builtin_provider
+{
+	const char *locale;
+	bool		casemap_full;
+};
+
 struct WordBoundaryState
 {
 	const char *str;
@@ -72,25 +78,30 @@ static size_t
 strlower_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strlower(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full);
+							builtin->casemap_full);
 }
 
 static size_t
 strtitle_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin = (struct builtin_provider *) locale->provider_data;
 	struct WordBoundaryState wbstate = {
 		.str = src,
 		.len = srclen,
 		.offset = 0,
-		.posix = !locale->info.builtin.casemap_full,
+		.posix = !builtin->casemap_full,
 		.init = false,
 		.prev_alnum = false,
 	};
 
 	return unicode_strtitle(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full,
+							builtin->casemap_full,
 							initcap_wbnext, &wbstate);
 }
 
@@ -98,22 +109,34 @@ static size_t
 strupper_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strupper(dest, destsize, src, srclen,
-							locale->info.builtin.casemap_full);
+							builtin->casemap_full);
 }
 
 static size_t
 strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				pg_locale_t locale)
 {
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
 	return unicode_strfold(dest, destsize, src, srclen,
-						   locale->info.builtin.casemap_full);
+						   builtin->casemap_full);
 }
 
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isdigit(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_isdigit(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -125,7 +148,11 @@ wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
 static bool
 wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isalnum(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_isalnum(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -155,7 +182,11 @@ wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
 static bool
 wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_ispunct(wc, !locale->info.builtin.casemap_full);
+	struct builtin_provider *builtin;
+
+	builtin = (struct builtin_provider *) locale->provider_data;
+
+	return pg_u_ispunct(wc, !builtin->casemap_full);
 }
 
 static bool
@@ -206,6 +237,7 @@ pg_locale_t
 create_pg_locale_builtin(Oid collid, MemoryContext context)
 {
 	const char *locstr;
+	struct builtin_provider *builtin;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -239,8 +271,11 @@ create_pg_locale_builtin(Oid collid, MemoryContext context)
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 
-	result->info.builtin.locale = MemoryContextStrdup(context, locstr);
-	result->info.builtin.casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
+	builtin = MemoryContextAllocZero(context, sizeof(struct builtin_provider));
+	builtin->locale = MemoryContextStrdup(context, locstr);
+	builtin->casemap_full = (strcmp(locstr, "PG_UNICODE_FAST") == 0);
+	result->provider_data = (void *) builtin;
+
 	result->deterministic = true;
 	result->collate_is_c = true;
 	result->ctype_is_c = (strcmp(locstr, "C") == 0);
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 96741e08269..497be95a869 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -51,6 +51,12 @@ extern pg_locale_t create_pg_locale_icu(Oid collid, MemoryContext context);
 
 #ifdef USE_ICU
 
+struct icu_provider
+{
+	const char *locale;
+	UCollator  *ucol;
+};
+
 extern UCollator *pg_ucol_open(const char *loc_str);
 
 static size_t strlower_icu(char *dest, size_t destsize, const char *src,
@@ -239,6 +245,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	bool		deterministic;
 	const char *iculocstr;
 	const char *icurules = NULL;
+	struct icu_provider *icu;
 	UCollator  *collator;
 	pg_locale_t result;
 
@@ -290,8 +297,12 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	collator = make_icu_collator(iculocstr, icurules);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
-	result->info.icu.locale = MemoryContextStrdup(context, iculocstr);
-	result->info.icu.ucol = collator;
+
+	icu = MemoryContextAllocZero(context, sizeof(struct icu_provider));
+	icu->locale = MemoryContextStrdup(context, iculocstr);
+	icu->ucol = collator;
+	result->provider_data = (void *) icu;
+
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -567,11 +578,12 @@ strncoll_icu_utf8(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2
 {
 	int			result;
 	UErrorCode	status;
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
 
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	status = U_ZERO_ERROR;
-	result = ucol_strcollUTF8(locale->info.icu.ucol,
+	result = ucol_strcollUTF8(icu->ucol,
 							  arg1, len1,
 							  arg2, len2,
 							  &status);
@@ -595,6 +607,8 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	init_icu_converter();
 
 	ulen = uchar_length(icu_converter, src, srclen);
@@ -608,7 +622,7 @@ strnxfrm_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	ulen = uchar_convert(icu_converter, uchar, ulen + 1, src, srclen);
 
-	result_bsize = ucol_getSortKey(locale->info.icu.ucol,
+	result_bsize = ucol_getSortKey(icu->ucol,
 								   uchar, ulen,
 								   (uint8_t *) dest, destsize);
 
@@ -639,12 +653,14 @@ strnxfrm_prefix_icu_utf8(char *dest, size_t destsize,
 	uint32_t	state[2];
 	UErrorCode	status;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	uiter_setUTF8(&iter, src, srclen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result = ucol_nextSortKeyPart(icu->ucol,
 								  &iter,
 								  state,
 								  (uint8_t *) dest,
@@ -751,11 +767,13 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 	UErrorCode	status;
 	int32_t		len_dest;
 
+	struct icu_provider *icu = (struct icu_provider *) mylocale->provider_data;
+
 	len_dest = len_source;		/* try first with same length */
 	*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 	status = U_ZERO_ERROR;
 	len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-					mylocale->info.icu.locale, &status);
+					icu->locale, &status);
 	if (status == U_BUFFER_OVERFLOW_ERROR)
 	{
 		/* try again with adjusted length */
@@ -763,7 +781,7 @@ icu_convert_case(ICU_Convert_Func func, pg_locale_t mylocale,
 		*buff_dest = palloc(len_dest * sizeof(**buff_dest));
 		status = U_ZERO_ERROR;
 		len_dest = func(*buff_dest, len_dest, buff_source, len_source,
-						mylocale->info.icu.locale, &status);
+						icu->locale, &status);
 	}
 	if (U_FAILURE(status))
 		ereport(ERROR,
@@ -837,6 +855,8 @@ strncoll_icu(const char *arg1, ssize_t len1,
 			   *uchar2;
 	int			result;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strncoll_icu_utf8 */
 #ifdef HAVE_UCOL_STRCOLLUTF8
 	Assert(GetDatabaseEncoding() != PG_UTF8);
@@ -859,7 +879,7 @@ strncoll_icu(const char *arg1, ssize_t len1,
 	ulen1 = uchar_convert(icu_converter, uchar1, ulen1 + 1, arg1, len1);
 	ulen2 = uchar_convert(icu_converter, uchar2, ulen2 + 1, arg2, len2);
 
-	result = ucol_strcoll(locale->info.icu.ucol,
+	result = ucol_strcoll(icu->ucol,
 						  uchar1, ulen1,
 						  uchar2, ulen2);
 
@@ -885,6 +905,8 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	size_t		uchar_bsize;
 	Size		result_bsize;
 
+	struct icu_provider *icu = (struct icu_provider *) locale->provider_data;
+
 	/* if encoding is UTF8, use more efficient strnxfrm_prefix_icu_utf8 */
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
@@ -904,7 +926,7 @@ strnxfrm_prefix_icu(char *dest, size_t destsize,
 	uiter_setString(&iter, uchar, ulen);
 	state[0] = state[1] = 0;	/* won't need that again */
 	status = U_ZERO_ERROR;
-	result_bsize = ucol_nextSortKeyPart(locale->info.icu.ucol,
+	result_bsize = ucol_nextSortKeyPart(icu->ucol,
 										&iter,
 										state,
 										(uint8_t *) dest,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index e9f9fc1e369..41fd657ace6 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1,3 +1,4 @@
+
 /*-----------------------------------------------------------------------
  *
  * PostgreSQL locale utilities for libc
@@ -81,6 +82,11 @@
  */
 #define		TEXTBUFLEN			1024
 
+struct libc_provider
+{
+	locale_t	lt;
+};
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -121,116 +127,154 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 static bool
 wc_isdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isdigit_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isdigit_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalpha_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isalpha_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isalnum_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isalnum_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isupper_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isupper_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return islower_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return islower_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isgraph_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isgraph_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isprint_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isprint_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return ispunct_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return ispunct_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
-	return isspace_l((unsigned char) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return isspace_l((unsigned char) wc, libc->lt);
 }
 
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswdigit_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswdigit_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalpha_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalpha_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswalpha_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isalnum_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswalnum_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswalnum_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswupper_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswupper_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_islower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswlower_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswlower_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isgraph_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswgraph_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswgraph_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isprint_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswprint_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswprint_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_ispunct_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswpunct_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswpunct_l((wint_t) wc, libc->lt);
 }
 
 static bool
 wc_isspace_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	return iswspace_l((wint_t) wc, locale->info.lt);
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
+	return iswspace_l((wint_t) wc, libc->lt);
 }
 
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->info.lt);
+	return tolower_l(ch, libc->lt);
 }
 
 static bool
@@ -238,22 +282,26 @@ char_is_cased_libc(char ch, pg_locale_t locale)
 {
 	bool		is_multibyte = pg_database_encoding_max_length() > 1;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (is_multibyte && IS_HIGHBIT_SET(ch))
 		return true;
 	else
-		return isalpha_l((unsigned char) ch, locale->info.lt);
+		return isalpha_l((unsigned char) ch, libc->lt);
 }
 
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	/* force C behavior for ASCII characters, per comments above */
 	if (locale->is_default && wc <= (pg_wchar) 127)
 		return pg_ascii_toupper((unsigned char) wc);
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return toupper_l((unsigned char) wc, locale->info.lt);
+		return toupper_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -261,13 +309,15 @@ toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	/* force C behavior for ASCII characters, per comments above */
 	if (locale->is_default && wc <= (pg_wchar) 127)
 		return pg_ascii_toupper((unsigned char) wc);
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towupper_l((wint_t) wc, locale->info.lt);
+		return towupper_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -275,13 +325,15 @@ toupper_libc_mb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() != PG_UTF8);
 
 	/* force C behavior for ASCII characters, per comments above */
 	if (locale->is_default && wc <= (pg_wchar) 127)
 		return pg_ascii_tolower((unsigned char) wc);
 	if (wc <= (pg_wchar) UCHAR_MAX)
-		return tolower_l((unsigned char) wc, locale->info.lt);
+		return tolower_l((unsigned char) wc, libc->lt);
 	else
 		return wc;
 }
@@ -289,13 +341,15 @@ tolower_libc_sb(pg_wchar wc, pg_locale_t locale)
 static pg_wchar
 tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	/* force C behavior for ASCII characters, per comments above */
 	if (locale->is_default && wc <= (pg_wchar) 127)
 		return pg_ascii_tolower((unsigned char) wc);
 	if (sizeof(wchar_t) >= 4 || wc <= (pg_wchar) 0xFFFF)
-		return towlower_l((wint_t) wc, locale->info.lt);
+		return towlower_l((wint_t) wc, libc->lt);
 	else
 		return wc;
 }
@@ -406,7 +460,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		if (srclen + 1 > destsize)
@@ -427,7 +481,7 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_tolower((unsigned char) *p);
 			else
-				*p = tolower_l((unsigned char) *p, loc);
+				*p = tolower_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -438,7 +492,8 @@ static size_t
 strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -460,7 +515,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+		workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -491,7 +546,7 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		int			wasalnum = false;
 		char	   *p;
 
@@ -517,11 +572,11 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			else
 			{
 				if (wasalnum)
-					*p = tolower_l((unsigned char) *p, loc);
+					*p = tolower_l((unsigned char) *p, libc->lt);
 				else
-					*p = toupper_l((unsigned char) *p, loc);
+					*p = toupper_l((unsigned char) *p, libc->lt);
 			}
-			wasalnum = isalnum_l((unsigned char) *p, loc);
+			wasalnum = isalnum_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -532,7 +587,8 @@ static size_t
 strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	int			wasalnum = false;
 	size_t		result_size;
 	wchar_t    *workspace;
@@ -557,10 +613,10 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
 		if (wasalnum)
-			workspace[curr_char] = towlower_l(workspace[curr_char], loc);
+			workspace[curr_char] = towlower_l(workspace[curr_char], libc->lt);
 		else
-			workspace[curr_char] = towupper_l(workspace[curr_char], loc);
-		wasalnum = iswalnum_l(workspace[curr_char], loc);
+			workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
+		wasalnum = iswalnum_l(workspace[curr_char], libc->lt);
 	}
 
 	/*
@@ -592,7 +648,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 
 	if (srclen + 1 <= destsize)
 	{
-		locale_t	loc = locale->info.lt;
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
 		char	   *p;
 
 		memcpy(dest, src, srclen);
@@ -610,7 +666,7 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 				*p = pg_toupper((unsigned char) *p);
 			else
-				*p = toupper_l((unsigned char) *p, loc);
+				*p = toupper_l((unsigned char) *p, libc->lt);
 		}
 	}
 
@@ -621,7 +677,8 @@ static size_t
 strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 				 pg_locale_t locale)
 {
-	locale_t	loc = locale->info.lt;
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	size_t		result_size;
 	wchar_t    *workspace;
 	char	   *result;
@@ -643,7 +700,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	char2wchar(workspace, srclen + 1, src, srclen, locale);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
-		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
+		workspace[curr_char] = towupper_l(workspace[curr_char], libc->lt);
 
 	/*
 	 * Make result large enough; case change might change number of bytes
@@ -671,6 +728,7 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 	const char *collate;
 	const char *ctype;
 	locale_t	loc;
+	struct libc_provider *libc;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -709,16 +767,19 @@ create_pg_locale_libc(Oid collid, MemoryContext context)
 		ReleaseSysCache(tp);
 	}
 
-
 	loc = make_libc_collator(collate, ctype);
 
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
+
+	libc = MemoryContextAllocZero(context, sizeof(struct libc_provider));
+	libc->lt = loc;
+	result->provider_data = (void *) libc;
+
 	result->deterministic = true;
 	result->collate_is_c = (strcmp(collate, "C") == 0) ||
 		(strcmp(collate, "POSIX") == 0);
 	result->ctype_is_c = (strcmp(ctype, "C") == 0) ||
 		(strcmp(ctype, "POSIX") == 0);
-	result->info.lt = loc;
 	if (!result->collate_is_c)
 	{
 #ifdef WIN32
@@ -832,6 +893,8 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 	const char *arg2n;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (bufsize1 + bufsize2 > TEXTBUFLEN)
 		buf = palloc(bufsize1 + bufsize2);
 
@@ -862,7 +925,7 @@ strncoll_libc(const char *arg1, ssize_t len1, const char *arg2, ssize_t len2,
 		arg2n = buf2;
 	}
 
-	result = strcoll_l(arg1n, arg2n, locale->info.lt);
+	result = strcoll_l(arg1n, arg2n, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -886,8 +949,10 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	size_t		bufsize = srclen + 1;
 	size_t		result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	if (srclen == -1)
-		return strxfrm_l(dest, src, destsize, locale->info.lt);
+		return strxfrm_l(dest, src, destsize, libc->lt);
 
 	if (bufsize > TEXTBUFLEN)
 		buf = palloc(bufsize);
@@ -896,7 +961,7 @@ strnxfrm_libc(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	memcpy(buf, src, srclen);
 	buf[srclen] = '\0';
 
-	result = strxfrm_l(dest, buf, destsize, locale->info.lt);
+	result = strxfrm_l(dest, buf, destsize, libc->lt);
 
 	if (buf != sbuf)
 		pfree(buf);
@@ -994,6 +1059,8 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	int			r;
 	int			result;
 
+	struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 	Assert(GetDatabaseEncoding() == PG_UTF8);
 
 	if (len1 == -1)
@@ -1038,7 +1105,7 @@ strncoll_libc_win32_utf8(const char *arg1, ssize_t len1, const char *arg2,
 	((LPWSTR) a2p)[r] = 0;
 
 	errno = 0;
-	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
+	result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, libc->lt);
 	if (result == 2147483647)	/* _NLSCMPERROR; missing from mingw headers */
 		ereport(ERROR,
 				(errmsg("could not compare Unicode strings: %m")));
@@ -1167,8 +1234,10 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 	{
+		struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, libc->lt);
 	}
 
 	return result;
@@ -1227,8 +1296,10 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		}
 		else
 		{
+			struct libc_provider *libc = (struct libc_provider *) locale->provider_data;
+
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, libc->lt);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 44ff60a25b4..d258ca756ab 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -151,22 +151,7 @@ struct pg_locale_struct
 	const struct collate_methods *collate;	/* NULL if collate_is_c */
 	const struct ctype_methods *ctype;	/* NULL if ctype_is_c */
 
-	union
-	{
-		struct
-		{
-			const char *locale;
-			bool		casemap_full;
-		}			builtin;
-		locale_t	lt;
-#ifdef USE_ICU
-		struct
-		{
-			const char *locale;
-			UCollator  *ucol;
-		}			icu;
-#endif
-	}			info;
+	void	   *provider_data;
 };
 
 extern void init_database_collation(void);
-- 
2.43.0

v16-0004-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From cb441061f447d3ac2faa8cdb2f768ca42c5d8f29 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v16 4/4] Don't include ICU headers in pg_locale.h.

---
 src/backend/commands/collationcmds.c  | 4 ++++
 src/backend/utils/adt/formatting.c    | 4 ----
 src/backend/utils/adt/pg_locale.c     | 4 ++++
 src/backend/utils/adt/pg_locale_icu.c | 1 +
 src/backend/utils/adt/varlena.c       | 4 ++++
 src/include/utils/pg_locale.h         | 4 ----
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8acbfbbeda0..a57fe93c387 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 5bd1e01f7e4..b3d5e0436ee 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -70,10 +70,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/int.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 451ac4e2d9b..49bb7fc0104 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -33,6 +33,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 497be95a869..d37a76a9af1 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 3e4d5568bde..6fbc984c534 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index d258ca756ab..1ff07da4cb3 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,10 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-#include <unicode/ucol.h>
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
-- 
2.43.0

#19

Peter Eisentraut

peter@eisentraut.org

7 months ago

In reply to: Jeff Davis (#18)

Re: Collation & ctype method table, and extension hooks

On 12.06.25 07:49, Jeff Davis wrote:

On Fri, 2025-02-07 at 11:19 -0800, Jeff Davis wrote:

Attached v15. Just a rebase.

Attached v16.

* commit this on the grounds that it's a desirable code improvement and
the worst-case regression isn't a major concern; or

I plan to commit this soon after branching. There's a general consensus
that enabling multi-lib provider support is a good idea, and turning
the provider behavior into method tables is a prerequisite for that. I
doubt the performance issue will be a serious concern and I don't see a
good way to avoid it.

Patch 0001 and 0002 seem okay to me.

I wish we could take this further and also run the "ctype is c" case
through the method table. Right now, there are still a bunch of
open-coded special cases all over the place, which could be unified. I
guess this isn't any worse than before, but maybe this could be a future
project?
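
To make the idea concrete, here is a minimal sketch of the two styles, not code from the patch set. The `sketch_*` types and functions are hypothetical, simplified stand-ins for `pg_locale_struct` and `ctype_methods`, and the sketch assumes the C locale could get its own method table instead of a NULL `ctype` pointer.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_locale;

/* Hypothetical, much-reduced version of a ctype method table. */
struct sketch_ctype_methods
{
	bool		(*wc_isdigit) (unsigned int wc, const struct sketch_locale *loc);
	unsigned int (*wc_toupper) (unsigned int wc, const struct sketch_locale *loc);
};

struct sketch_locale
{
	bool		ctype_is_c;
	const struct sketch_ctype_methods *ctype;	/* NULL in the open-coded style */
};

/* ASCII-only behavior, which is what "ctype is C" means in this sketch. */
static bool
sketch_c_isdigit(unsigned int wc, const struct sketch_locale *loc)
{
	(void) loc;
	return wc >= '0' && wc <= '9';
}

static unsigned int
sketch_c_toupper(unsigned int wc, const struct sketch_locale *loc)
{
	(void) loc;
	return (wc >= 'a' && wc <= 'z') ? wc - 'a' + 'A' : wc;
}

/* Assumption: the C locale gets a table of its own. */
static const struct sketch_ctype_methods sketch_ctype_methods_c = {
	.wc_isdigit = sketch_c_isdigit,
	.wc_toupper = sketch_c_toupper,
};

/* Open-coded style: every caller special-cases ctype_is_c first. */
static unsigned int
toupper_open_coded(unsigned int wc, const struct sketch_locale *loc)
{
	if (loc->ctype_is_c)
		return (wc >= 'a' && wc <= 'z') ? wc - 'a' + 'A' : wc;
	return loc->ctype->wc_toupper(wc, loc);
}

/* Unified style: callers always dispatch through the table. */
static unsigned int
toupper_unified(unsigned int wc, const struct sketch_locale *loc)
{
	return loc->ctype->wc_toupper(wc, loc);
}
```

With a locale whose `ctype` points at `sketch_ctype_methods_c`, `toupper_unified('a', &loc)` yields `'A'` with no branch on `ctype_is_c` at the call site. Whether the extra indirect call is acceptable on hot paths is a separate question that this sketch does not try to answer.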

Patch 0003 I don't understand. It replaces type safety by no type
safety, and it doesn't have any explanation or comments. I suppose you
have further plans in this direction, but until we have seen those and
have more clarification and explanation, I would hold this back.
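
The concern can be illustrated with a minimal sketch. With a `void *provider_data` field, each provider function casts the pointer back to its own struct, and the compiler cannot check that the cast matches what was stored. One possible mitigation, shown here purely as an assumption and not as anything in v16-0003, is a checked accessor keyed on a provider tag. All `sketch_*` names are hypothetical, and an `int` stands in for `locale_t`.

```c
#include <stddef.h>

/* Hypothetical provider tag; the real patch does not necessarily carry one. */
enum sketch_provider
{
	SKETCH_PROVIDER_LIBC,
	SKETCH_PROVIDER_ICU
};

struct sketch_libc_data
{
	int			lt;				/* stand-in for locale_t */
};

struct sketch_icu_data
{
	const char *locale;
};

struct sketch_locale
{
	enum sketch_provider provider;
	void	   *provider_data;	/* untyped: any cast compiles */
};

/*
 * Checked accessor: returns NULL instead of reinterpreting another
 * provider's data as libc data.
 */
static struct sketch_libc_data *
sketch_get_libc(const struct sketch_locale *loc)
{
	if (loc->provider != SKETCH_PROVIDER_LIBC)
		return NULL;
	return (struct sketch_libc_data *) loc->provider_data;
}
```

A caller would then write `struct sketch_libc_data *libc = sketch_get_libc(locale);` and treat a NULL result as a programming error, which trades the compile-time naming of union members for a runtime check.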

Patch 0004 seems ok. But maybe you could explain this better in the
commit message, like remove includes from pg_locale.h but instead put
them in the .c files as needed, and explain why this is possible or
suitable now.

#20

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Peter Eisentraut (#19)

Re: Collation & ctype method table, and extension hooks

On Sun, 2025-06-29 at 12:43 +0200, Peter Eisentraut wrote:

I wish we could take this further and also run the "ctype is c" case
through the method table. Right now, there are still a bunch of
open-coded special cases all over the place, which could be unified. I
guess this isn't any worse than before, but maybe this could be a future
project?

+1. A few things need to be sorted out, but I don't see any major
problem with that.

Patch 0003 I don't understand. It replaces type safety by no type
safety, and it doesn't have any explanation or comments. I suppose you
have further plans in this direction, but until we have seen those and
have more clarification and explanation, I would hold this back.

Part of it is simply #include cleanliness, because we can't do v16-0004
if we have the provider-specific details in the union. I don't really
like the idea of including ICU headers (indirectly) in so many places.
Another part is that I'd like to abstract the providers more completely
-- I've alluded to that a few times but I haven't made an independent
proposal for that yet. Also, the union doesn't offer a lot of type
safety, so I don't see it as a big loss.
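
The #include argument can be sketched as follows. This is a single-file illustration under stated assumptions, not the layout of the actual patch: the `sketch_*` names are hypothetical, and a plain string stands in for the `UCollator *` that would normally force `<unicode/ucol.h>` into any header declaring it.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * What a shared header would need to expose: only an opaque pointer, so
 * no provider-specific header has to be included by its users.
 */
struct sketch_locale
{
	void	   *provider_data;
};

/*
 * What stays private to the provider's own .c file.  In a real ICU
 * provider this struct would hold ICU types; here a string stands in.
 */
struct sketch_icu_provider
{
	const char *locale;
};

/* Allocate the private data and attach it; returns 0 on success. */
static int
sketch_icu_init(struct sketch_locale *loc, const char *name)
{
	struct sketch_icu_provider *icu = malloc(sizeof(*icu));

	if (icu == NULL)
		return -1;
	icu->locale = name;
	loc->provider_data = icu;
	return 0;
}

/* Only provider code casts the opaque pointer back to its real type. */
static const char *
sketch_icu_locale_name(const struct sketch_locale *loc)
{
	const struct sketch_icu_provider *icu = loc->provider_data;

	return icu->locale;
}

static void
sketch_icu_free(struct sketch_locale *loc)
{
	free(loc->provider_data);
	loc->provider_data = NULL;
}
```

Code outside the provider file sees only `struct sketch_locale`, so it compiles without the provider's headers. A union member of type `UCollator *` would instead make every includer of the shared header depend on ICU's declarations.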

But it's not critical right now either, so I won't push for it.

Patch 0004 seems ok. But maybe you could explain this better in the
commit message, like remove includes from pg_locale.h but instead put
them in the .c files as needed, and explain why this is possible or
suitable now.

It goes with v16-0003, so I will hold this back for now as well.

Regards,
Jeff Davis

#21

Peter Eisentraut

peter@eisentraut.org

about 1 month ago

In reply to: Jeff Davis (#20)

Re: Collation & ctype method table, and extension hooks

This thread was still open in the commitfest and showed up on my dashboard.

My understanding is that v16-0001 and v16-0002 have been committed, and
per the discussion below, the remaining patches v16-0003 and v16-0004
have been withdrawn for now. So I will close this commitfest entry.

On 30.06.25 21:21, Jeff Davis wrote:

On Sun, 2025-06-29 at 12:43 +0200, Peter Eisentraut wrote:

I wish we could take this further and also run the "ctype is c" case
through the method table. Right now, there are still a bunch of
open-coded special cases all over the place, which could be unified. I
guess this isn't any worse than before, but maybe this could be a future
project?

+1. A few things need to be sorted out, but I don't see any major
problem with that.

Patch 0003 I don't understand. It replaces type safety by no type
safety, and it doesn't have any explanation or comments. I suppose you
have further plans in this direction, but until we have seen those and
have more clarification and explanation, I would hold this back.

Part of it is simply #include cleanliness, because we can't do v16-0004
if we have the provider-specific details in the union. I don't really
like the idea of including ICU headers (indirectly) in so many places.
Another part is that I'd like to abstract the providers more completely
-- I've alluded to that a few times but I haven't made an independent
proposal for that yet. Also, the union doesn't offer a lot of type
safety, so I don't see it as a big loss.

But it's not critical right now either, so I won't push for it.

Patch 0004 seems ok. But maybe you could explain this better in the
commit message, like remove includes from pg_locale.h but instead put
them in the .c files as needed, and explain why this is possible or
suitable now.

It goes with v16-0003, so I will hold this back for now as well.

Regards,
Jeff Davis