Allow tailoring of ICU locales with custom rules

Started by Peter Eisentrautabout 3 years ago18 messages

peter.eisentraut@enterprisedb.com

about 3 years ago

1 attachment(s)

This patch exposes the ICU facility to add custom collation rules to a
standard collation. This would allow users to customize any ICU
collation to whatever they want. A very simple example from the
documentation/tests:

CREATE COLLATION en_custom
(provider = icu, locale = 'en', rules = '&a < g');

This places "g" after "a" before "b". Details about the syntax can be
found at
<https://unicode-org.github.io/icu/userguide/collation/customization/>.

The code is pretty straightforward. It mainly just records these rules
in the catalog and feeds them to ICU when creating the collator object.

Attachments:

0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From b0d42407a60e116d3ccb0ed04505aa362f8a6a1d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.
---
 doc/src/sgml/catalogs.sgml                    | 18 +++++++
 doc/src/sgml/ref/create_collation.sgml        | 22 +++++++++
 doc/src/sgml/ref/create_database.sgml         | 12 +++++
 src/backend/catalog/pg_collation.c            |  5 ++
 src/backend/commands/collationcmds.c          | 21 ++++++--
 src/backend/commands/dbcommands.c             | 49 +++++++++++++++++--
 src/backend/utils/adt/pg_locale.c             | 41 +++++++++++++++-
 src/backend/utils/init/postinit.c             | 11 ++++-
 src/include/catalog/pg_collation.h            |  2 +
 src/include/catalog/pg_database.h             |  3 ++
 src/include/utils/pg_locale.h                 |  1 +
 .../regress/expected/collate.icu.utf8.out     | 30 ++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 13 +++++
 13 files changed, 219 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9316b811ac..afa9f28ef9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index ea38c64731..aa6f121a81 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        collation.  This is supported for ICU only.  See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index aa8fbe1a98..7ed9de3891 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 81e54e0ce6..50f16f2764 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -710,7 +725,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			 */
 			collid = CollationCreate(localebuf, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 localebuf, localebuf, NULL,
+									 localebuf, localebuf, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 									 true, true);
 			if (OidIsValid(collid))
@@ -777,7 +792,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -839,7 +854,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6eb8742718..65d6ad46ef 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static bool have_createdb_privilege(void);
@@ -676,6 +677,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -699,6 +701,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -776,6 +780,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -959,7 +969,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1098,6 +1110,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1106,6 +1121,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1314,6 +1342,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1527,7 +1559,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1727,7 +1759,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1837,7 +1869,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2600,6 +2632,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2716,6 +2749,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..191d2e8a68 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1402,6 +1403,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1418,6 +1420,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1580,11 +1611,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index a990c833c5..55a035c062 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index 2190ccb5b8..ad2d767d65 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 611c95656a..48bc285863 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules BKI_DEFAULT(_null_);
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a875942123..989fe26cef 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d4c8c6de38..ce9ed3c8bf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-- 
2.38.1

Peter Eisentraut

peter.eisentraut@enterprisedb.com

about 3 years ago

In reply to: Peter Eisentraut (#1)

1 attachment(s)

Re: Allow tailoring of ICU locales with custom rules

Patch needed a rebase; no functionality changes.

Show quoted text

On 14.12.22 10:26, Peter Eisentraut wrote:

This patch exposes the ICU facility to add custom collation rules to a
standard collation. This would allow users to customize any ICU
collation to whatever they want. A very simple example from the
documentation/tests:

CREATE COLLATION en_custom
(provider = icu, locale = 'en', rules = '&a < g');

This places "g" after "a" before "b". Details about the syntax can be
found at
<https://unicode-org.github.io/icu/userguide/collation/customization/>.

The code is pretty straightforward. It mainly just records these rules
in the catalog and feeds them to ICU when creating the collator object.

Attachments:

v2-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=v2-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From ae729e2d5d37a382b7aaaa13ea95897ccb35d0c1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH v2] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                    | 18 +++++++
 doc/src/sgml/ref/create_collation.sgml        | 22 +++++++++
 doc/src/sgml/ref/create_database.sgml         | 12 +++++
 src/backend/catalog/pg_collation.c            |  5 ++
 src/backend/commands/collationcmds.c          | 23 +++++++--
 src/backend/commands/dbcommands.c             | 49 +++++++++++++++++--
 src/backend/utils/adt/pg_locale.c             | 41 +++++++++++++++-
 src/backend/utils/init/postinit.c             | 11 ++++-
 src/include/catalog/pg_collation.h            |  2 +
 src/include/catalog/pg_database.h             |  3 ++
 src/include/utils/pg_locale.h                 |  1 +
 .../regress/expected/collate.icu.utf8.out     | 30 ++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 13 +++++
 13 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9316b811ac..afa9f28ef9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index ea38c64731..aa6f121a81 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        collation.  This is supported for ICU only.  See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 6a4311cc63..04e259b0ba 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -680,7 +695,7 @@ create_collation_from_locale(const char *locale, int nspid,
 	 */
 	collid = CollationCreate(locale, nspid, GetUserId(),
 							 COLLPROVIDER_LIBC, true, enc,
-							 locale, locale, NULL,
+							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
@@ -755,7 +770,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 
 		collid = CollationCreate(alias, param->nspid, GetUserId(),
 								 COLLPROVIDER_LIBC, true, enc,
-								 localebuf, localebuf, NULL,
+								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
@@ -889,7 +904,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -951,7 +966,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 518ffca09a..36ce7d8f83 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static bool have_createdb_privilege(void);
@@ -676,6 +677,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -699,6 +701,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -776,6 +780,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -959,7 +969,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1098,6 +1110,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1106,6 +1121,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1314,6 +1342,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1527,7 +1559,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1727,7 +1759,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1837,7 +1869,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2600,6 +2632,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2716,6 +2749,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 059e4fd79f..da514b9396 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1402,6 +1403,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1418,6 +1420,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1580,11 +1611,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ae5a85ed65..163119ee48 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index f9d5e88faf..bfa3568451 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 3da3492e92..a5c4efe086 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules BKI_DEFAULT(_null_);
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cede43440b..bcadefbf08 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d4c8c6de38..ce9ed3c8bf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);

base-commit: b82557ecc2ebbf649142740a1c5ce8d19089f620
-- 
2.39.0

vignesh C

vignesh21@gmail.com

about 3 years ago

In reply to: Peter Eisentraut (#2)

Re: Allow tailoring of ICU locales with custom rules

On Thu, 5 Jan 2023 at 20:45, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Patch needed a rebase; no functionality changes.

The patch does not apply on top of HEAD as in [1]http://cfbot.cputube.org/patch_41_4075.log, please post a rebased patch:

=== Applying patches on top of PostgreSQL commit ID
d952373a987bad331c0e499463159dd142ced1ef ===
=== applying patch
./v2-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patch
patching file doc/src/sgml/catalogs.sgml
patching file doc/src/sgml/ref/create_collation.sgml
patching file doc/src/sgml/ref/create_database.sgml
Hunk #1 FAILED at 192.
1 out of 1 hunk FAILED -- saving rejects to file
doc/src/sgml/ref/create_database.sgml.rej

[1]: http://cfbot.cputube.org/patch_41_4075.log

Regards,
Vignesh

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: vignesh C (#3)

1 attachment(s)

Re: Allow tailoring of ICU locales with custom rules

On 11.01.23 03:50, vignesh C wrote:

On Thu, 5 Jan 2023 at 20:45, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

Patch needed a rebase; no functionality changes.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

Updated patch attached.

Attachments:

v3-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=v3-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From 8744abe8e56e25b8d76d1201c4fa40af273a09de Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 14 Dec 2022 10:15:03 +0100
Subject: [PATCH v3] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                    | 18 +++++++
 doc/src/sgml/ref/create_collation.sgml        | 22 +++++++++
 doc/src/sgml/ref/create_database.sgml         | 12 +++++
 src/backend/catalog/pg_collation.c            |  5 ++
 src/backend/commands/collationcmds.c          | 23 +++++++--
 src/backend/commands/dbcommands.c             | 49 +++++++++++++++++--
 src/backend/utils/adt/pg_locale.c             | 41 +++++++++++++++-
 src/backend/utils/init/postinit.c             | 11 ++++-
 src/include/catalog/pg_collation.h            |  2 +
 src/include/catalog/pg_database.h             |  3 ++
 src/include/utils/pg_locale.h                 |  1 +
 .../regress/expected/collate.icu.utf8.out     | 30 ++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 13 +++++
 13 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 2f034e2859..d6cc5646fa 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="create-database-icu-rules">
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        collation.  This is supported for ICU only.  See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="create-database-locale-provider">
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 6a4311cc63..04e259b0ba 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -680,7 +695,7 @@ create_collation_from_locale(const char *locale, int nspid,
 	 */
 	collid = CollationCreate(locale, nspid, GetUserId(),
 							 COLLPROVIDER_LIBC, true, enc,
-							 locale, locale, NULL,
+							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
@@ -755,7 +770,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 
 		collid = CollationCreate(alias, param->nspid, GetUserId(),
 								 COLLPROVIDER_LIBC, true, enc,
-								 localebuf, localebuf, NULL,
+								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
@@ -889,7 +904,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -951,7 +966,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 518ffca09a..36ce7d8f83 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static bool have_createdb_privilege(void);
@@ -676,6 +677,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -699,6 +701,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -776,6 +780,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -959,7 +969,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1098,6 +1110,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1106,6 +1121,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1314,6 +1342,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1527,7 +1559,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1727,7 +1759,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1837,7 +1869,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2600,6 +2632,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2716,6 +2749,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 059e4fd79f..da514b9396 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1402,6 +1403,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1418,6 +1420,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1580,11 +1611,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index ae5a85ed65..163119ee48 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index f9d5e88faf..bfa3568451 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 3da3492e92..a5c4efe086 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules BKI_DEFAULT(_null_);
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cede43440b..bcadefbf08 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d4c8c6de38..ce9ed3c8bf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);

base-commit: 20428d344a2964de6aaef9984fcd472f3c65d115
-- 
2.39.0

Laurenz Albe

laurenz.albe@cybertec.at

almost 3 years ago

In reply to: Peter Eisentraut (#4)

Re: Allow tailoring of ICU locales with custom rules

On Mon, 2023-01-16 at 12:18 +0100, Peter Eisentraut wrote:

Updated patch attached.

I like that patch. It applies and passes regression tests.

I played with it:

CREATE COLLATION german_phone (LOCALE = 'de-AT', PROVIDER = icu, RULES = '&oe < ö');

SELECT * FROM (VALUES ('od'), ('oe'), ('of'), ('p'), ('ö')) AS q(c)
ORDER BY c COLLATE german_phone;

c
════
od
oe
ö
of
p
(5 rows)

Cool so far. Now I created a database with that locale:

CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
LOCALE "de_AT.utf8" TEMPLATE template0;

Now the rules are not in "pg_database":

SELECT datcollate, daticulocale, daticurules FROM pg_database WHERE datname = 'teutsch';

datcollate │ daticulocale │ daticurules
════════════╪══════════════╪═════════════
de_AT.utf8 │ german_phone │ ∅
(1 row)

I connect to the database and try:

SELECT * FROM (VALUES ('od'), ('oe'), ('of'), ('p'), ('ö')) AS q(c)
ORDER BY c COLLATE german_phone;

ERROR: collation "german_phone" for encoding "UTF8" does not exist
LINE 1: ... ('oe'), ('of'), ('p'), ('ö')) AS q(c) ORDER BY c COLLATE ge...
^

Indeed, the collation isn't there...

I guess that it is not the fault of this patch that the collation isn't there,
but I think it is surprising. What good is a database collation that does not
exist in the database?

What might be the fault of this patch, however, is that "daticurules" is not
set in "pg_database". Looking at the code, that column seems to be copied
from the template database, but cannot be overridden.

Perhaps this only needs more documentation, but I am confused.

Yours,
Laurenz Albe

Daniel Verite

daniel@manitou-mail.org

almost 3 years ago

In reply to: Laurenz Albe (#5)

Re: Allow tailoring of ICU locales with custom rules

Laurenz Albe wrote:

Cool so far. Now I created a database with that locale:

CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
LOCALE "de_AT.utf8" TEMPLATE template0;

Now the rules are not in "pg_database":

The parameter after ICU_LOCALE is passed directly to ICU as a locale
ID, as opposed to refering a collation name in the current database.
This CREATE DATABASE doesn't fail because ICU accepts pretty much
anything as a locale ID, ignoring what it can't parse instead of
erroring out.

I think the way to express what you want should be:

CREATE DATABASE teutsch
LOCALE_PROVIDER icu
ICU_LOCALE 'de_AT'
LOCALE 'de_AT.utf8'
ICU_RULES '&a < g';

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

Looking at createdb() in commands.c, it creates this variable:

@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
char *dbcollate = NULL;
char *dbctype = NULL;
char *dbiculocale = NULL;
+ char *dbicurules = NULL;
char dblocprovider = '\0';
char *canonname;
int encoding = -1;

and then reads it later

@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
		dblocprovider = src_locprovider;
	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;

/* Some encodings are client only */
if (!PG_VALID_BE_ENCODING(encoding))

but it forgets to assign it in between, so it stays NULL and src_icurules
is taken instead.

I guess that it is not the fault of this patch that the collation
isn't there, but I think it is surprising. What good is a database
collation that does not exist in the database?

Even if the above invocation of CREATE DATABASE worked as you
intuitively expected, by getting the characteristics from the
user-defined collation for the destination db, it still wouldn't work to
refer
to COLLATE "german_phone" in the destination database.
That's because there would be no "german_phone" entry in the
pg_collation of the destination db, as it's cloned from the template
db, which has no reason to have this collation.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Laurenz Albe

laurenz.albe@cybertec.at

almost 3 years ago

In reply to: Daniel Verite (#6)

Re: Allow tailoring of ICU locales with custom rules

On Sat, 2023-02-04 at 14:41 +0100, Daniel Verite wrote:

Laurenz Albe wrote:

Cool so far. Now I created a database with that locale:

CREATE DATABASE teutsch LOCALE_PROVIDER icu ICU_LOCALE german_phone
LOCALE "de_AT.utf8" TEMPLATE template0;

Now the rules are not in "pg_database":

The parameter after ICU_LOCALE is passed directly to ICU as a locale
ID, as opposed to refering a collation name in the current database.
This CREATE DATABASE doesn't fail because ICU accepts pretty much
anything as a locale ID, ignoring what it can't parse instead of
erroring out.

I think the way to express what you want should be:

CREATE DATABASE teutsch
LOCALE_PROVIDER icu
ICU_LOCALE 'de_AT'
LOCALE 'de_AT.utf8'
ICU_RULES '&a < g';

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

I see. Thanks for the explanation.

I guess that it is not the fault of this patch that the collation
isn't there, but I think it is surprising. What good is a database
collation that does not exist in the database?

Even if the above invocation of CREATE DATABASE worked as you
intuitively expected, by getting the characteristics from the
user-defined collation for the destination db, it still wouldn't work to
refer
to COLLATE "german_phone" in the destination database.
That's because there would be no "german_phone" entry in the
pg_collation of the destination db, as it's cloned from the template
db, which has no reason to have this collation.

That makes sense. Then I think that the current behavior is buggy:
You should not be allowed to specify a collation that does not exist in
the template database. Otherwise you end up with something weird.

This is not the fault of this patch though.

Yours,
Laurenz Albe

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Daniel Verite (#6)

1 attachment(s)

Re: Allow tailoring of ICU locales with custom rules

On 04.02.23 14:41, Daniel Verite wrote:

However it still leaves "daticurules" empty in the destination db,
because of an actual bug in the current patch.

Looking at createdb() in commands.c, it creates this variable:

@@ -711,6 +714,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
char *dbcollate = NULL;
char *dbctype = NULL;
char *dbiculocale = NULL;
+ char *dbicurules = NULL;
char dblocprovider = '\0';
char *canonname;
int encoding = -1;

and then reads it later
@@ -1007,6 +1017,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
dblocprovider = src_locprovider;
if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
/* Some encodings are client only */
if (!PG_VALID_BE_ENCODING(encoding))

but it forgets to assign it in between, so it stays NULL and src_icurules
is taken instead.

Right. Here is a new patch with this fixed.

Attachments:

v4-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=v4-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From 7ca76032097397d685a313500c96a41b2c38ecc6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 6 Feb 2023 21:58:24 +0100
Subject: [PATCH v4] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                    | 18 +++++++
 doc/src/sgml/ref/create_collation.sgml        | 22 ++++++++
 doc/src/sgml/ref/create_database.sgml         | 12 +++++
 src/backend/catalog/pg_collation.c            |  5 ++
 src/backend/commands/collationcmds.c          | 23 +++++++--
 src/backend/commands/dbcommands.c             | 51 +++++++++++++++++--
 src/backend/utils/adt/pg_locale.c             | 41 ++++++++++++++-
 src/backend/utils/init/postinit.c             | 11 +++-
 src/include/catalog/pg_collation.h            |  2 +
 src/include/catalog/pg_database.h             |  3 ++
 src/include/utils/pg_locale.h                 |  1 +
 .../regress/expected/collate.icu.utf8.out     | 30 +++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 13 +++++
 13 files changed, 222 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 58f5f0cd63..2c7266107e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index f3df2def86..860211e4d6 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -192,6 +192,18 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="create-database-icu-rules">
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        collation.  This is supported for ICU only.  See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="create-database-locale-provider">
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 6a4311cc63..04e259b0ba 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -680,7 +695,7 @@ create_collation_from_locale(const char *locale, int nspid,
 	 */
 	collid = CollationCreate(locale, nspid, GetUserId(),
 							 COLLPROVIDER_LIBC, true, enc,
-							 locale, locale, NULL,
+							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
@@ -755,7 +770,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 
 		collid = CollationCreate(alias, param->nspid, GetUserId(),
 								 COLLPROVIDER_LIBC, true, enc,
-								 localebuf, localebuf, NULL,
+								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
@@ -889,7 +904,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -951,7 +966,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 1f4ce2fb9c..6a06142541 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static void remove_dbtablespaces(Oid db_id);
@@ -675,6 +676,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -698,6 +700,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -710,6 +713,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -775,6 +779,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -893,6 +903,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dbctype = defGetString(dctype);
 	if (diculocale && diculocale->arg)
 		dbiculocale = defGetString(diculocale);
+	if (dicurules && dicurules->arg)
+		dbicurules = defGetString(dicurules);
 	if (dlocprovider && dlocprovider->arg)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
@@ -958,7 +970,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1006,6 +1018,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1097,6 +1111,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1105,6 +1122,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1313,6 +1343,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1526,7 +1560,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1726,7 +1760,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1836,7 +1870,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2599,6 +2633,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2715,6 +2750,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 059e4fd79f..da514b9396 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1402,6 +1403,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1418,6 +1420,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1580,11 +1611,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2f07ca7a0e..b0e20cc635 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index f9d5e88faf..bfa3568451 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 3da3492e92..a5c4efe086 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules BKI_DEFAULT(_null_);
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cede43440b..bcadefbf08 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 4354dc07b8..2ac707b362 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);

base-commit: cad56920513e5b7ccdd3d41b0ea893eb3383f863
-- 
2.39.1

Laurenz Albe

laurenz.albe@cybertec.at

almost 3 years ago

In reply to: Peter Eisentraut (#8)

Re: Allow tailoring of ICU locales with custom rules

On Mon, 2023-02-06 at 22:16 +0100, Peter Eisentraut wrote:

Right. Here is a new patch with this fixed.

Thanks. I played some more with it, and still are still some missing
odds and ends:

- There is a new option ICU_RULES to CREATE DATABASE, but it is not
reflected in \h CREATE DATABASE. sql_help_CREATE_DATABASE() needs to
be amended.

- There is no way to show the rules except by querying "pg_collation" or
"pg_database". I think it would be good to show the rules with
\dO+ and \l+.

- If I create a collation "x" with RULES and then create a database
with "ICU_LOCALE x", the rules are not copied over.

I don't know if that is intended or not, but it surprises me.
Should that be a WARNING? Or, since creating a database with a collation
that does not exist in "template0" doesn't make much sense (or does it?),
is there a way to forbid that?

Yours,
Laurenz Albe

#10

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Laurenz Albe (#9)

1 attachment(s)

Re: Allow tailoring of ICU locales with custom rules

On 14.02.23 17:53, Laurenz Albe wrote:

On Mon, 2023-02-06 at 22:16 +0100, Peter Eisentraut wrote:

Right. Here is a new patch with this fixed.

Thanks. I played some more with it, and still are still some missing
odds and ends:

- There is a new option ICU_RULES to CREATE DATABASE, but it is not
reflected in \h CREATE DATABASE. sql_help_CREATE_DATABASE() needs to
be amended.

Fixed.

- There is no way to show the rules except by querying "pg_collation" or
"pg_database". I think it would be good to show the rules with
\dO+ and \l+.

Fixed. I adjusted the order of the columns a bit, to make the overall
picture more sensible. The locale provider column is now earlier, since
it indicates which of the subsequent columns are applicable.

- If I create a collation "x" with RULES and then create a database
with "ICU_LOCALE x", the rules are not copied over.

I don't know if that is intended or not, but it surprises me.
Should that be a WARNING? Or, since creating a database with a collation
that does not exist in "template0" doesn't make much sense (or does it?),
is there a way to forbid that?

This is a misunderstanding of how things work. The value of the
database ICU_LOCALE attribute is passed to the ICU library. It does not
refer to a PostgreSQL collation object.

Attachments:

v5-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=v5-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From d6ee2e92af9d1a25fe316e5c93d8aa20179658da Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 20 Feb 2023 09:46:48 +0100
Subject: [PATCH v5] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                    |  18 ++++
 doc/src/sgml/ref/create_collation.sgml        |  22 ++++
 doc/src/sgml/ref/create_database.sgml         |  13 +++
 src/backend/catalog/pg_collation.c            |   5 +
 src/backend/commands/collationcmds.c          |  23 +++-
 src/backend/commands/dbcommands.c             |  51 ++++++++-
 src/backend/utils/adt/pg_locale.c             |  41 ++++++-
 src/backend/utils/init/postinit.c             |  11 +-
 src/bin/psql/describe.c                       | 100 +++++++++++-------
 src/include/catalog/pg_collation.h            |   2 +
 src/include/catalog/pg_database.h             |   3 +
 src/include/utils/pg_locale.h                 |   1 +
 .../regress/expected/collate.icu.utf8.out     |  30 ++++++
 src/test/regress/expected/psql.out            |  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql     |  13 +++
 15 files changed, 295 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..6f62161b80 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
            [ LC_COLLATE [=] <replaceable class="parameter">lc_collate</replaceable> ]
            [ LC_CTYPE [=] <replaceable class="parameter">lc_ctype</replaceable> ]
            [ ICU_LOCALE [=] <replaceable class="parameter">icu_locale</replaceable> ]
+           [ ICU_RULES [=] <replaceable class="parameter">icu_rules</replaceable> ]
            [ LOCALE_PROVIDER [=] <replaceable class="parameter">locale_provider</replaceable> ]
            [ COLLATION_VERSION = <replaceable>collation_version</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -192,6 +193,18 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="create-database-icu-rules">
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        collation.  This is supported for ICU only.  See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="create-database-locale-provider">
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index eb62d285ea..c51e3afdb4 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -680,7 +695,7 @@ create_collation_from_locale(const char *locale, int nspid,
 	 */
 	collid = CollationCreate(locale, nspid, GetUserId(),
 							 COLLPROVIDER_LIBC, true, enc,
-							 locale, locale, NULL,
+							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
@@ -755,7 +770,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 
 		collid = CollationCreate(alias, param->nspid, GetUserId(),
 								 COLLPROVIDER_LIBC, true, enc,
-								 localebuf, localebuf, NULL,
+								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
@@ -889,7 +904,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -951,7 +966,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..e75efffa08 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static void remove_dbtablespaces(Oid db_id);
@@ -675,6 +676,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -698,6 +700,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -710,6 +713,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -775,6 +779,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -893,6 +903,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dbctype = defGetString(dctype);
 	if (diculocale && diculocale->arg)
 		dbiculocale = defGetString(diculocale);
+	if (dicurules && dicurules->arg)
+		dbicurules = defGetString(dicurules);
 	if (dlocprovider && dlocprovider->arg)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
@@ -958,7 +970,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1006,6 +1018,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1097,6 +1111,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1105,6 +1122,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1313,6 +1343,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1526,7 +1560,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1726,7 +1760,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1836,7 +1870,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2599,6 +2633,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2715,6 +2750,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 059e4fd79f..da514b9396 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1402,6 +1403,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1418,6 +1420,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1580,11 +1611,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2f07ca7a0e..b0e20cc635 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c8a0bb7b3a..b2455b9c47 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -928,38 +928,52 @@ listAllDbs(const char *pattern, bool verbose)
 	initPQExpBuffer(&buf);
 
 	printfPQExpBuffer(&buf,
-					  "SELECT d.datname as \"%s\",\n"
-					  "       pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
-					  "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n"
-					  "       d.datcollate as \"%s\",\n"
-					  "       d.datctype as \"%s\",\n",
+					  "SELECT\n"
+					  "  d.datname as \"%s\",\n"
+					  "  pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
+					  "  pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n",
 					  gettext_noop("Name"),
 					  gettext_noop("Owner"),
-					  gettext_noop("Encoding"),
-					  gettext_noop("Collate"),
-					  gettext_noop("Ctype"));
+					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "       d.daticulocale as \"%s\",\n"
-						  "       CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
-						  gettext_noop("ICU Locale"),
+						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
-						  "       NULL as \"%s\",\n"
-						  "       'libc' AS \"%s\",\n",
-						  gettext_noop("ICU Locale"),
+						  "  'libc' AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
-	appendPQExpBufferStr(&buf, "       ");
+	appendPQExpBuffer(&buf,
+					  "  d.datcollate as \"%s\",\n"
+					  "  d.datctype as \"%s\",\n",
+					  gettext_noop("Collate"),
+					  gettext_noop("Ctype"));
+	if (pset.sversion >= 150000)
+		appendPQExpBuffer(&buf,
+						  "  d.daticulocale as \"%s\",\n",
+						  gettext_noop("ICU Locale"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  NULL as \"%s\",\n",
+						  gettext_noop("ICU Locale"));
+	if (pset.sversion >= 160000)
+		appendPQExpBuffer(&buf,
+						  "  d.daticurules as \"%s\",\n",
+						  gettext_noop("ICU Rules"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  NULL as \"%s\",\n",
+						  gettext_noop("ICU Rules"));
+	appendPQExpBufferStr(&buf, "  ");
 	printACLColumn(&buf, "d.datacl");
 	if (verbose)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n"
-						  "            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n"
-						  "            ELSE 'No Access'\n"
-						  "       END as \"%s\""
-						  ",\n       t.spcname as \"%s\""
-						  ",\n       pg_catalog.shobj_description(d.oid, 'pg_database') as \"%s\"",
+						  ",\n  CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n"
+						  "       THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n"
+						  "       ELSE 'No Access'\n"
+						  "  END as \"%s\""
+						  ",\n  t.spcname as \"%s\""
+						  ",\n  pg_catalog.shobj_description(d.oid, 'pg_database') as \"%s\"",
 						  gettext_noop("Size"),
 						  gettext_noop("Tablespace"),
 						  gettext_noop("Description"));
@@ -4854,52 +4868,64 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	static const bool translate_columns[] = {false, false, false, false, false, false, true, false};
+	static const bool translate_columns[] = {false, false, false, false, false, false, false, true, false};
 
 	initPQExpBuffer(&buf);
 
 	printfPQExpBuffer(&buf,
-					  "SELECT n.nspname AS \"%s\",\n"
-					  "       c.collname AS \"%s\",\n"
-					  "       c.collcollate AS \"%s\",\n"
-					  "       c.collctype AS \"%s\"",
+					  "SELECT\n"
+					  "  n.nspname AS \"%s\",\n"
+					  "  c.collname AS \"%s\",\n",
 					  gettext_noop("Schema"),
-					  gettext_noop("Name"),
+					  gettext_noop("Name"));
+
+	if (pset.sversion >= 100000)
+		appendPQExpBuffer(&buf,
+						  "  CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  gettext_noop("Provider"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  'libc' AS \"%s\",\n",
+						  gettext_noop("Provider"));
+
+	appendPQExpBuffer(&buf,
+					  "  c.collcollate AS \"%s\",\n"
+					  "  c.collctype AS \"%s\",\n",
 					  gettext_noop("Collate"),
 					  gettext_noop("Ctype"));
 
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  ",\n       c.colliculocale AS \"%s\"",
+						  "  c.colliculocale AS \"%s\",\n",
 						  gettext_noop("ICU Locale"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       c.collcollate AS \"%s\"",
+						  "  c.collcollate AS \"%s\",\n",
 						  gettext_noop("ICU Locale"));
 
-	if (pset.sversion >= 100000)
+	if (pset.sversion >= 160000)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\"",
-						  gettext_noop("Provider"));
+						  "  c.collicurules AS \"%s\",\n",
+						  gettext_noop("ICU Rules"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       'libc' AS \"%s\"",
-						  gettext_noop("Provider"));
+						  "  NULL AS \"%s\",\n",
+						  gettext_noop("ICU Rules"));
 
 	if (pset.sversion >= 120000)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE WHEN c.collisdeterministic THEN '%s' ELSE '%s' END AS \"%s\"",
+						  "  CASE WHEN c.collisdeterministic THEN '%s' ELSE '%s' END AS \"%s\"",
 						  gettext_noop("yes"), gettext_noop("no"),
 						  gettext_noop("Deterministic?"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       '%s' AS \"%s\"",
+						  "  '%s' AS \"%s\"",
 						  gettext_noop("yes"),
 						  gettext_noop("Deterministic?"));
 
 	if (verbose)
 		appendPQExpBuffer(&buf,
-						  ",\n       pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
+						  ",\n  pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
 						  gettext_noop("Description"));
 
 	appendPQExpBufferStr(&buf,
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index f9d5e88faf..bfa3568451 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 3da3492e92..a5c4efe086 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules BKI_DEFAULT(_null_);
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cede43440b..bcadefbf08 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 4354dc07b8..2ac707b362 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 8fc62cebd2..ba66b8a2c5 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -6170,9 +6170,9 @@ List of schemas
 (0 rows)
 
 \dO "no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp "no.such.access.privilege"
@@ -6359,9 +6359,9 @@ cross-database references are not implemented: "no.such.schema"."no.such.languag
 (0 rows)
 
 \dO "no.such.schema"."no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp "no.such.schema"."no.such.access.privilege"
@@ -6502,9 +6502,9 @@ List of text search templates
 (0 rows)
 
 \dO regression."no.such.schema"."no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp regression."no.such.schema"."no.such.access.privilege"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);

base-commit: 2cb82e2acfba069d00c6bd253d58df03d315672a
-- 
2.39.2

#11

Daniel Verite

daniel@manitou-mail.org

almost 3 years ago

In reply to: Peter Eisentraut (#10)

Re: Allow tailoring of ICU locales with custom rules

Peter Eisentraut wrote:

[patch v5]

Two quick comments:

- pg_dump support need to be added for CREATE COLLATION / DATABASE

- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#12

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Daniel Verite (#11)

1 attachment(s)

Re: Allow tailoring of ICU locales with custom rules

On 20.02.23 17:30, Daniel Verite wrote:

- pg_dump support need to be added for CREATE COLLATION / DATABASE

I have added that.

- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?

Right, that would be an initdb option. Is that too many initdb options
then? It would be easy to add, if we think it's worth it.

Attachments:

v6-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=v6-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From b566e8756fbf78da804f5538d68350cda7a9bab3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Wed, 22 Feb 2023 18:33:47 +0100
Subject: [PATCH v6] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                    |  18 ++++
 doc/src/sgml/ref/create_collation.sgml        |  22 ++++
 doc/src/sgml/ref/create_database.sgml         |  13 +++
 src/backend/catalog/pg_collation.c            |   5 +
 src/backend/commands/collationcmds.c          |  23 +++-
 src/backend/commands/dbcommands.c             |  51 ++++++++-
 src/backend/utils/adt/pg_locale.c             |  41 ++++++-
 src/backend/utils/init/postinit.c             |  11 +-
 src/bin/pg_dump/pg_dump.c                     |  37 +++++++
 src/bin/psql/describe.c                       | 100 +++++++++++-------
 src/include/catalog/pg_collation.h            |   2 +
 src/include/catalog/pg_database.h             |   3 +
 src/include/utils/pg_locale.h                 |   1 +
 .../regress/expected/collate.icu.utf8.out     |  30 ++++++
 src/test/regress/expected/psql.out            |  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql     |  13 +++
 16 files changed, 332 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..6f62161b80 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
            [ LC_COLLATE [=] <replaceable class="parameter">lc_collate</replaceable> ]
            [ LC_CTYPE [=] <replaceable class="parameter">lc_ctype</replaceable> ]
            [ ICU_LOCALE [=] <replaceable class="parameter">icu_locale</replaceable> ]
+           [ ICU_RULES [=] <replaceable class="parameter">icu_rules</replaceable> ]
            [ LOCALE_PROVIDER [=] <replaceable class="parameter">locale_provider</replaceable> ]
            [ COLLATION_VERSION = <replaceable>collation_version</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -192,6 +193,18 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="create-database-icu-rules">
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        collation.  This is supported for ICU only.  See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="create-database-locale-provider">
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index eb62d285ea..c51e3afdb4 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -680,7 +695,7 @@ create_collation_from_locale(const char *locale, int nspid,
 	 */
 	collid = CollationCreate(locale, nspid, GetUserId(),
 							 COLLPROVIDER_LIBC, true, enc,
-							 locale, locale, NULL,
+							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
@@ -755,7 +770,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 
 		collid = CollationCreate(alias, param->nspid, GetUserId(),
 								 COLLPROVIDER_LIBC, true, enc,
-								 localebuf, localebuf, NULL,
+								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
@@ -889,7 +904,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -951,7 +966,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index a0259cc593..7063a5a7ed 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static void remove_dbtablespaces(Oid db_id);
@@ -675,6 +676,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -698,6 +700,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -710,6 +713,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -775,6 +779,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -893,6 +903,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dbctype = defGetString(dctype);
 	if (diculocale && diculocale->arg)
 		dbiculocale = defGetString(diculocale);
+	if (dicurules && dicurules->arg)
+		dbicurules = defGetString(dicurules);
 	if (dlocprovider && dlocprovider->arg)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
@@ -958,7 +970,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1006,6 +1018,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1097,6 +1111,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1105,6 +1122,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1313,6 +1343,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1526,7 +1560,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1726,7 +1760,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1836,7 +1870,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2599,6 +2633,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2715,6 +2750,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 059e4fd79f..da514b9396 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1402,6 +1403,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1418,6 +1420,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1580,11 +1611,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2f07ca7a0e..b0e20cc635 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1a06eeaf6a..60ae19ef12 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2844,6 +2844,7 @@ dumpDatabase(Archive *fout)
 				i_collate,
 				i_ctype,
 				i_daticulocale,
+				i_daticurules,
 				i_frozenxid,
 				i_minmxid,
 				i_datacl,
@@ -2862,6 +2863,7 @@ dumpDatabase(Archive *fout)
 			   *collate,
 			   *ctype,
 			   *iculocale,
+			   *icurules,
 			   *datistemplate,
 			   *datconnlimit,
 			   *tablespace;
@@ -2888,6 +2890,10 @@ dumpDatabase(Archive *fout)
 		appendPQExpBufferStr(dbQry, "datlocprovider, daticulocale, datcollversion, ");
 	else
 		appendPQExpBufferStr(dbQry, "'c' AS datlocprovider, NULL AS daticulocale, NULL AS datcollversion, ");
+	if (fout->remoteVersion >= 160000)
+		appendPQExpBufferStr(dbQry, "daticurules, ");
+	else
+		appendPQExpBufferStr(dbQry, "NULL AS daticurules, ");
 	appendPQExpBufferStr(dbQry,
 						 "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) AS tablespace, "
 						 "shobj_description(oid, 'pg_database') AS description "
@@ -2905,6 +2911,7 @@ dumpDatabase(Archive *fout)
 	i_collate = PQfnumber(res, "datcollate");
 	i_ctype = PQfnumber(res, "datctype");
 	i_daticulocale = PQfnumber(res, "daticulocale");
+	i_daticurules = PQfnumber(res, "daticurules");
 	i_frozenxid = PQfnumber(res, "datfrozenxid");
 	i_minmxid = PQfnumber(res, "datminmxid");
 	i_datacl = PQfnumber(res, "datacl");
@@ -2926,6 +2933,10 @@ dumpDatabase(Archive *fout)
 		iculocale = PQgetvalue(res, 0, i_daticulocale);
 	else
 		iculocale = NULL;
+	if (!PQgetisnull(res, 0, i_daticurules))
+		icurules = PQgetvalue(res, 0, i_daticurules);
+	else
+		icurules = NULL;
 	frozenxid = atooid(PQgetvalue(res, 0, i_frozenxid));
 	minmxid = atooid(PQgetvalue(res, 0, i_minmxid));
 	dbdacl.acl = PQgetvalue(res, 0, i_datacl);
@@ -2991,6 +3002,11 @@ dumpDatabase(Archive *fout)
 		appendPQExpBufferStr(creaQry, " ICU_LOCALE = ");
 		appendStringLiteralAH(creaQry, iculocale, fout);
 	}
+	if (icurules)
+	{
+		appendPQExpBufferStr(creaQry, " ICU_RULES = ");
+		appendStringLiteralAH(creaQry, icurules, fout);
+	}
 
 	/*
 	 * For binary upgrade, carry over the collation version.  For normal
@@ -13154,10 +13170,12 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	int			i_collcollate;
 	int			i_collctype;
 	int			i_colliculocale;
+	int			i_collicurules;
 	const char *collprovider;
 	const char *collcollate;
 	const char *collctype;
 	const char *colliculocale;
+	const char *collicurules;
 
 	/* Do nothing in data-only dump */
 	if (dopt->dataOnly)
@@ -13195,6 +13213,13 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 		appendPQExpBufferStr(query,
 							 "NULL AS colliculocale, ");
 
+	if (fout->remoteVersion >= 160000)
+		appendPQExpBufferStr(query,
+							 "collicurules, ");
+	else
+		appendPQExpBufferStr(query,
+							 "NULL AS collicurules, ");
+
 	appendPQExpBuffer(query,
 					  "collcollate, "
 					  "collctype "
@@ -13209,6 +13234,7 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	i_collcollate = PQfnumber(res, "collcollate");
 	i_collctype = PQfnumber(res, "collctype");
 	i_colliculocale = PQfnumber(res, "colliculocale");
+	i_collicurules = PQfnumber(res, "collicurules");
 
 	collprovider = PQgetvalue(res, 0, i_collprovider);
 
@@ -13227,6 +13253,11 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	else
 		colliculocale = NULL;
 
+	if (!PQgetisnull(res, 0, i_collicurules))
+		collicurules = PQgetvalue(res, 0, i_collicurules);
+	else
+		collicurules = NULL;
+
 	appendPQExpBuffer(delq, "DROP COLLATION %s;\n",
 					  fmtQualifiedDumpable(collinfo));
 
@@ -13272,6 +13303,12 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 		}
 	}
 
+	if (collicurules)
+	{
+		appendPQExpBufferStr(q, ", rules = ");
+		appendStringLiteralAH(q, collicurules, fout);
+	}
+
 	/*
 	 * For binary upgrade, carry over the collation version.  For normal
 	 * dump/restore, omit the version, so that it is computed upon restore.
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index c8a0bb7b3a..b2455b9c47 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -928,38 +928,52 @@ listAllDbs(const char *pattern, bool verbose)
 	initPQExpBuffer(&buf);
 
 	printfPQExpBuffer(&buf,
-					  "SELECT d.datname as \"%s\",\n"
-					  "       pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
-					  "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n"
-					  "       d.datcollate as \"%s\",\n"
-					  "       d.datctype as \"%s\",\n",
+					  "SELECT\n"
+					  "  d.datname as \"%s\",\n"
+					  "  pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
+					  "  pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n",
 					  gettext_noop("Name"),
 					  gettext_noop("Owner"),
-					  gettext_noop("Encoding"),
-					  gettext_noop("Collate"),
-					  gettext_noop("Ctype"));
+					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "       d.daticulocale as \"%s\",\n"
-						  "       CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
-						  gettext_noop("ICU Locale"),
+						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
-						  "       NULL as \"%s\",\n"
-						  "       'libc' AS \"%s\",\n",
-						  gettext_noop("ICU Locale"),
+						  "  'libc' AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
-	appendPQExpBufferStr(&buf, "       ");
+	appendPQExpBuffer(&buf,
+					  "  d.datcollate as \"%s\",\n"
+					  "  d.datctype as \"%s\",\n",
+					  gettext_noop("Collate"),
+					  gettext_noop("Ctype"));
+	if (pset.sversion >= 150000)
+		appendPQExpBuffer(&buf,
+						  "  d.daticulocale as \"%s\",\n",
+						  gettext_noop("ICU Locale"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  NULL as \"%s\",\n",
+						  gettext_noop("ICU Locale"));
+	if (pset.sversion >= 160000)
+		appendPQExpBuffer(&buf,
+						  "  d.daticurules as \"%s\",\n",
+						  gettext_noop("ICU Rules"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  NULL as \"%s\",\n",
+						  gettext_noop("ICU Rules"));
+	appendPQExpBufferStr(&buf, "  ");
 	printACLColumn(&buf, "d.datacl");
 	if (verbose)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n"
-						  "            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n"
-						  "            ELSE 'No Access'\n"
-						  "       END as \"%s\""
-						  ",\n       t.spcname as \"%s\""
-						  ",\n       pg_catalog.shobj_description(d.oid, 'pg_database') as \"%s\"",
+						  ",\n  CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n"
+						  "       THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n"
+						  "       ELSE 'No Access'\n"
+						  "  END as \"%s\""
+						  ",\n  t.spcname as \"%s\""
+						  ",\n  pg_catalog.shobj_description(d.oid, 'pg_database') as \"%s\"",
 						  gettext_noop("Size"),
 						  gettext_noop("Tablespace"),
 						  gettext_noop("Description"));
@@ -4854,52 +4868,64 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	static const bool translate_columns[] = {false, false, false, false, false, false, true, false};
+	static const bool translate_columns[] = {false, false, false, false, false, false, false, true, false};
 
 	initPQExpBuffer(&buf);
 
 	printfPQExpBuffer(&buf,
-					  "SELECT n.nspname AS \"%s\",\n"
-					  "       c.collname AS \"%s\",\n"
-					  "       c.collcollate AS \"%s\",\n"
-					  "       c.collctype AS \"%s\"",
+					  "SELECT\n"
+					  "  n.nspname AS \"%s\",\n"
+					  "  c.collname AS \"%s\",\n",
 					  gettext_noop("Schema"),
-					  gettext_noop("Name"),
+					  gettext_noop("Name"));
+
+	if (pset.sversion >= 100000)
+		appendPQExpBuffer(&buf,
+						  "  CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  gettext_noop("Provider"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  'libc' AS \"%s\",\n",
+						  gettext_noop("Provider"));
+
+	appendPQExpBuffer(&buf,
+					  "  c.collcollate AS \"%s\",\n"
+					  "  c.collctype AS \"%s\",\n",
 					  gettext_noop("Collate"),
 					  gettext_noop("Ctype"));
 
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  ",\n       c.colliculocale AS \"%s\"",
+						  "  c.colliculocale AS \"%s\",\n",
 						  gettext_noop("ICU Locale"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       c.collcollate AS \"%s\"",
+						  "  c.collcollate AS \"%s\",\n",
 						  gettext_noop("ICU Locale"));
 
-	if (pset.sversion >= 100000)
+	if (pset.sversion >= 160000)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\"",
-						  gettext_noop("Provider"));
+						  "  c.collicurules AS \"%s\",\n",
+						  gettext_noop("ICU Rules"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       'libc' AS \"%s\"",
-						  gettext_noop("Provider"));
+						  "  NULL AS \"%s\",\n",
+						  gettext_noop("ICU Rules"));
 
 	if (pset.sversion >= 120000)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE WHEN c.collisdeterministic THEN '%s' ELSE '%s' END AS \"%s\"",
+						  "  CASE WHEN c.collisdeterministic THEN '%s' ELSE '%s' END AS \"%s\"",
 						  gettext_noop("yes"), gettext_noop("no"),
 						  gettext_noop("Deterministic?"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       '%s' AS \"%s\"",
+						  "  '%s' AS \"%s\"",
 						  gettext_noop("yes"),
 						  gettext_noop("Deterministic?"));
 
 	if (verbose)
 		appendPQExpBuffer(&buf,
-						  ",\n       pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
+						  ",\n  pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
 						  gettext_noop("Description"));
 
 	appendPQExpBufferStr(&buf,
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index f9d5e88faf..bfa3568451 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 3da3492e92..a5c4efe086 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules BKI_DEFAULT(_null_);
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index cede43440b..bcadefbf08 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 4354dc07b8..2ac707b362 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 8fc62cebd2..ba66b8a2c5 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -6170,9 +6170,9 @@ List of schemas
 (0 rows)
 
 \dO "no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp "no.such.access.privilege"
@@ -6359,9 +6359,9 @@ cross-database references are not implemented: "no.such.schema"."no.such.languag
 (0 rows)
 
 \dO "no.such.schema"."no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp "no.such.schema"."no.such.access.privilege"
@@ -6502,9 +6502,9 @@ List of text search templates
 (0 rows)
 
 \dO regression."no.such.schema"."no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp regression."no.such.schema"."no.such.access.privilege"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);

base-commit: 2ddab010c2777c6a965cea82dc1b809ddc33ecc1
-- 
2.39.2

#13

Laurenz Albe

laurenz.albe@cybertec.at

almost 3 years ago

In reply to: Peter Eisentraut (#12)

Re: Allow tailoring of ICU locales with custom rules

On Wed, 2023-02-22 at 18:35 +0100, Peter Eisentraut wrote:

- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?

Right, that would be an initdb option. Is that too many initdb options
then? It would be easy to add, if we think it's worth it.

An alternative would be to document that you can drop "template1" and
create it again using the ICU collation rules you need.

But I'd prefer an "initdb" option.

Yours,
Laurenz Albe

#14

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Laurenz Albe (#13)

1 attachment(s)

Re: Allow tailoring of ICU locales with custom rules

On 02.03.23 16:39, Laurenz Albe wrote:

On Wed, 2023-02-22 at 18:35 +0100, Peter Eisentraut wrote:

- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents to their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?

Right, that would be an initdb option. Is that too many initdb options
then? It would be easy to add, if we think it's worth it.

An alternative would be to document that you can drop "template1" and
create it again using the ICU collation rules you need.

But I'd prefer an "initdb" option.

Ok, here is a version with an initdb option and also a createdb option
(to expose the CREATE DATABASE option).

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules='&a < c < b < e < d'

;-)

Attachments:

v7-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchtext/plain; charset=UTF-8; name=v7-0001-Allow-tailoring-of-ICU-locales-with-custom-rules.patchDownload

From 615763ccf5a1c18c3da1286eb4c86d19eb397ac0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 3 Mar 2023 11:46:56 +0100
Subject: [PATCH v7] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                    |  18 ++++
 doc/src/sgml/ref/create_collation.sgml        |  22 ++++
 doc/src/sgml/ref/create_database.sgml         |  14 +++
 doc/src/sgml/ref/createdb.sgml                |  10 ++
 doc/src/sgml/ref/initdb.sgml                  |  10 ++
 src/backend/catalog/pg_collation.c            |   5 +
 src/backend/commands/collationcmds.c          |  23 +++-
 src/backend/commands/dbcommands.c             |  51 ++++++++-
 src/backend/utils/adt/pg_locale.c             |  41 ++++++-
 src/backend/utils/init/postinit.c             |  11 +-
 src/bin/initdb/initdb.c                       |  15 ++-
 src/bin/pg_dump/pg_dump.c                     |  37 +++++++
 src/bin/psql/describe.c                       | 100 +++++++++++-------
 src/bin/scripts/createdb.c                    |  11 ++
 src/include/catalog/pg_collation.h            |   2 +
 src/include/catalog/pg_database.dat           |   2 +-
 src/include/catalog/pg_database.h             |   3 +
 src/include/utils/pg_locale.h                 |   1 +
 .../regress/expected/collate.icu.utf8.out     |  30 ++++++
 src/test/regress/expected/psql.out            |  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql     |  13 +++
 21 files changed, 379 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ <title><structname>pg_collation</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>collicurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this collation object
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>collversion</structfield> <type>text</type>
@@ -3106,6 +3115,15 @@ <title><structname>pg_database</structname> Columns</title>
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>daticurules</structfield> <type>text</type>
+      </para>
+      <para>
+       ICU collation rules for this database
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>datcollversion</structfield> <type>text</type>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
     [ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
     [ PROVIDER = <replaceable>provider</replaceable>, ]
     [ DETERMINISTIC = <replaceable>boolean</replaceable>, ]
+    [ RULES = <replaceable>rules</replaceable>, ]
     [ VERSION = <replaceable>version</replaceable> ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
@@ -149,6 +150,19 @@ <title>Parameters</title>
      </listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><replaceable>rules</replaceable></term>
+
+     <listitem>
+      <para>
+       Specifies additional collation rules to customize the behavior of the
+       collation.  This is supported for ICU only.  See <ulink
+       url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+       for details on the syntax.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><replaceable>version</replaceable></term>
 
@@ -228,6 +242,14 @@ <title>Examples</title>
 </programlisting>
   </para>
 
+  <para>
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+<programlisting>
+<![CDATA[CREATE COLLATION en_custom (provider = icu, locale = 'en', rules = '&a < g');]]>
+</programlisting>
+  </para>
+
   <para>
    To create a collation from an existing collation:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..13793bb6b7 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
            [ LC_COLLATE [=] <replaceable class="parameter">lc_collate</replaceable> ]
            [ LC_CTYPE [=] <replaceable class="parameter">lc_ctype</replaceable> ]
            [ ICU_LOCALE [=] <replaceable class="parameter">icu_locale</replaceable> ]
+           [ ICU_RULES [=] <replaceable class="parameter">icu_rules</replaceable> ]
            [ LOCALE_PROVIDER [=] <replaceable class="parameter">locale_provider</replaceable> ]
            [ COLLATION_VERSION = <replaceable>collation_version</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -192,6 +193,19 @@ <title>Parameters</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="create-database-icu-rules">
+      <term><replaceable class="parameter">icu_rules</replaceable></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        default collation of this database.  This is supported for ICU only.
+        See <ulink
+        url="https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+        for details on the syntax.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="create-database-locale-provider">
       <term><replaceable>locale_provider</replaceable></term>
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 671cd362d9..e23419ba6c 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -157,6 +157,16 @@ <title>Options</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--icu-rules=<replaceable class="parameter">rules</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        default collation of this database.  This is supported for ICU only.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 5b2bdac101..c96164195d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -239,6 +239,16 @@ <title>Options</title>
       </listitem>
      </varlistentry>
 
+     <varlistentry id="app-initdb-icu-rules">
+      <term><option>--icu-rules=<replaceable>rules</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies additional collation rules to customize the behavior of the
+        default collation.  This is supported for ICU only.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="app-initdb-data-checksums" xreflabel="data checksums">
       <term><option>-k</option></term>
       <term><option>--data-checksums</option></term>
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,
 				int32 collencoding,
 				const char *collcollate, const char *collctype,
 				const char *colliculocale,
+				const char *collicurules,
 				const char *collversion,
 				bool if_not_exists,
 				bool quiet)
@@ -194,6 +195,10 @@ CollationCreate(const char *collname, Oid collnamespace,
 		values[Anum_pg_collation_colliculocale - 1] = CStringGetTextDatum(colliculocale);
 	else
 		nulls[Anum_pg_collation_colliculocale - 1] = true;
+	if (collicurules)
+		values[Anum_pg_collation_collicurules - 1] = CStringGetTextDatum(collicurules);
+	else
+		nulls[Anum_pg_collation_collicurules - 1] = true;
 	if (collversion)
 		values[Anum_pg_collation_collversion - 1] = CStringGetTextDatum(collversion);
 	else
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index eb62d285ea..c51e3afdb4 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -64,10 +64,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *lcctypeEl = NULL;
 	DefElem    *providerEl = NULL;
 	DefElem    *deterministicEl = NULL;
+	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
+	char	   *collicurules;
 	bool		collisdeterministic;
 	int			collencoding;
 	char		collprovider;
@@ -99,6 +101,8 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 			defelp = &providerEl;
 		else if (strcmp(defel->defname, "deterministic") == 0)
 			defelp = &deterministicEl;
+		else if (strcmp(defel->defname, "rules") == 0)
+			defelp = &rulesEl;
 		else if (strcmp(defel->defname, "version") == 0)
 			defelp = &versionEl;
 		else
@@ -161,6 +165,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			colliculocale = NULL;
 
+		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+		if (!isnull)
+			collicurules = TextDatumGetCString(datum);
+		else
+			collicurules = NULL;
+
 		ReleaseSysCache(tp);
 
 		/*
@@ -182,6 +192,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		collcollate = NULL;
 		collctype = NULL;
 		colliculocale = NULL;
+		collicurules = NULL;
 
 		if (providerEl)
 			collproviderstr = defGetString(providerEl);
@@ -191,6 +202,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collisdeterministic = true;
 
+		if (rulesEl)
+			collicurules = defGetString(rulesEl);
+
 		if (versionEl)
 			collversion = defGetString(versionEl);
 
@@ -297,6 +311,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 							 collcollate,
 							 collctype,
 							 colliculocale,
+							 collicurules,
 							 collversion,
 							 if_not_exists,
 							 false);	/* not quiet */
@@ -680,7 +695,7 @@ create_collation_from_locale(const char *locale, int nspid,
 	 */
 	collid = CollationCreate(locale, nspid, GetUserId(),
 							 COLLPROVIDER_LIBC, true, enc,
-							 locale, locale, NULL,
+							 locale, locale, NULL, NULL,
 							 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 							 true, true);
 	if (OidIsValid(collid))
@@ -755,7 +770,7 @@ win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
 
 		collid = CollationCreate(alias, param->nspid, GetUserId(),
 								 COLLPROVIDER_LIBC, true, enc,
-								 localebuf, localebuf, NULL,
+								 localebuf, localebuf, NULL, NULL,
 								 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
 								 true, true);
 		if (OidIsValid(collid))
@@ -889,7 +904,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			collid = CollationCreate(alias, nspid, GetUserId(),
 									 COLLPROVIDER_LIBC, true, enc,
-									 locale, locale, NULL,
+									 locale, locale, NULL, NULL,
 									 get_collation_actual_version(COLLPROVIDER_LIBC, locale),
 									 true, true);
 			if (OidIsValid(collid))
@@ -951,7 +966,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			collid = CollationCreate(psprintf("%s-x-icu", langtag),
 									 nspid, GetUserId(),
 									 COLLPROVIDER_ICU, true, -1,
-									 NULL, NULL, iculocstr,
+									 NULL, NULL, iculocstr, NULL,
 									 get_collation_actual_version(COLLPROVIDER_ICU, iculocstr),
 									 true, true);
 			if (OidIsValid(collid))
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index a0259cc593..7063a5a7ed 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -119,6 +119,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode,
 						int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 						TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 						Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+						char **dbIcurules,
 						char *dbLocProvider,
 						char **dbCollversion);
 static void remove_dbtablespaces(Oid db_id);
@@ -675,6 +676,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *src_collate = NULL;
 	char	   *src_ctype = NULL;
 	char	   *src_iculocale = NULL;
+	char	   *src_icurules = NULL;
 	char		src_locprovider = '\0';
 	char	   *src_collversion = NULL;
 	bool		src_istemplate;
@@ -698,6 +700,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	DefElem    *dcollate = NULL;
 	DefElem    *dctype = NULL;
 	DefElem    *diculocale = NULL;
+	DefElem    *dicurules = NULL;
 	DefElem    *dlocprovider = NULL;
 	DefElem    *distemplate = NULL;
 	DefElem    *dallowconnections = NULL;
@@ -710,6 +713,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
+	char	   *dbicurules = NULL;
 	char		dblocprovider = '\0';
 	char	   *canonname;
 	int			encoding = -1;
@@ -775,6 +779,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				errorConflictingDefElem(defel, pstate);
 			diculocale = defel;
 		}
+		else if (strcmp(defel->defname, "icu_rules") == 0)
+		{
+			if (dicurules)
+				errorConflictingDefElem(defel, pstate);
+			dicurules = defel;
+		}
 		else if (strcmp(defel->defname, "locale_provider") == 0)
 		{
 			if (dlocprovider)
@@ -893,6 +903,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dbctype = defGetString(dctype);
 	if (diculocale && diculocale->arg)
 		dbiculocale = defGetString(diculocale);
+	if (dicurules && dicurules->arg)
+		dbicurules = defGetString(dicurules);
 	if (dlocprovider && dlocprovider->arg)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
@@ -958,7 +970,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					 &src_dboid, &src_owner, &src_encoding,
 					 &src_istemplate, &src_allowconn,
 					 &src_frozenxid, &src_minmxid, &src_deftablespace,
-					 &src_collate, &src_ctype, &src_iculocale, &src_locprovider,
+					 &src_collate, &src_ctype, &src_iculocale, &src_icurules, &src_locprovider,
 					 &src_collversion))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
@@ -1006,6 +1018,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbiculocale = src_iculocale;
+	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
+		dbicurules = src_icurules;
 
 	/* Some encodings are client only */
 	if (!PG_VALID_BE_ENCODING(encoding))
@@ -1097,6 +1111,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 		if (dblocprovider == COLLPROVIDER_ICU)
 		{
+			char	   *val1;
+			char	   *val2;
+
 			Assert(dbiculocale);
 			Assert(src_iculocale);
 			if (strcmp(dbiculocale, src_iculocale) != 0)
@@ -1105,6 +1122,19 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 						 errmsg("new ICU locale (%s) is incompatible with the ICU locale of the template database (%s)",
 								dbiculocale, src_iculocale),
 						 errhint("Use the same ICU locale as in the template database, or use template0 as template.")));
+
+			val1 = dbicurules;
+			if (!val1)
+				val1 = "";
+			val2 = src_icurules;
+			if (!val2)
+				val2 = "";
+			if (strcmp(val1, val2) != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("new ICU collation rules (%s) are incompatible with the ICU collation rules of the template database (%s)",
+								val1, val2),
+						 errhint("Use the same ICU collation rules as in the template database, or use template0 as template.")));
 		}
 	}
 
@@ -1313,6 +1343,10 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		new_record[Anum_pg_database_daticulocale - 1] = CStringGetTextDatum(dbiculocale);
 	else
 		new_record_nulls[Anum_pg_database_daticulocale - 1] = true;
+	if (dbicurules)
+		new_record[Anum_pg_database_daticurules - 1] = CStringGetTextDatum(dbicurules);
+	else
+		new_record_nulls[Anum_pg_database_daticurules - 1] = true;
 	if (dbcollversion)
 		new_record[Anum_pg_database_datcollversion - 1] = CStringGetTextDatum(dbcollversion);
 	else
@@ -1526,7 +1560,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 	{
 		if (!missing_ok)
 		{
@@ -1726,7 +1760,7 @@ RenameDatabase(const char *oldname, const char *newname)
 	rel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", oldname)));
@@ -1836,7 +1870,7 @@ movedb(const char *dbname, const char *tblspcname)
 	pgdbrel = table_open(DatabaseRelationId, RowExclusiveLock);
 
 	if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL,
-					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL))
+					 NULL, NULL, NULL, NULL, &src_tblspcoid, NULL, NULL, NULL, NULL, NULL, NULL))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_DATABASE),
 				 errmsg("database \"%s\" does not exist", dbname)));
@@ -2599,6 +2633,7 @@ get_db_info(const char *name, LOCKMODE lockmode,
 			int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP,
 			TransactionId *dbFrozenXidP, MultiXactId *dbMinMultiP,
 			Oid *dbTablespace, char **dbCollate, char **dbCtype, char **dbIculocale,
+			char **dbIcurules,
 			char *dbLocProvider,
 			char **dbCollversion)
 {
@@ -2715,6 +2750,14 @@ get_db_info(const char *name, LOCKMODE lockmode,
 					else
 						*dbIculocale = TextDatumGetCString(datum);
 				}
+				if (dbIcurules)
+				{
+					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_daticurules, &isnull);
+					if (isnull)
+						*dbIcurules = NULL;
+					else
+						*dbIcurules = TextDatumGetCString(datum);
+				}
 				if (dbCollversion)
 				{
 					datum = SysCacheGetAttr(DATABASEOID, tuple, Anum_pg_database_datcollversion, &isnull);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 4aa5eaa984..1d3d4d86d3 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -69,6 +69,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ustring.h>
 #endif
 
 #ifdef __GLIBC__
@@ -1421,6 +1422,7 @@ struct pg_locale_struct default_locale;
 
 void
 make_icu_collator(const char *iculocstr,
+				  const char *icurules,
 				  struct pg_locale_struct *resultp)
 {
 #ifdef USE_ICU
@@ -1437,6 +1439,35 @@ make_icu_collator(const char *iculocstr,
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, iculocstr);
 
+	/*
+	 * If rules are specified, we extract the rules of the standard collation,
+	 * add our own rules, and make a new collator with the combined rules.
+	 */
+	if (icurules)
+	{
+		const UChar *default_rules;
+		UChar	   *agg_rules;
+		UChar	   *my_rules;
+		int32_t		length;
+
+		default_rules = ucol_getRules(collator, &length);
+		icu_to_uchar(&my_rules, icurules, strlen(icurules));
+
+		agg_rules = palloc_array(UChar, u_strlen(default_rules) + u_strlen(my_rules) + 1);
+		u_strcpy(agg_rules, default_rules);
+		u_strcat(agg_rules, my_rules);
+
+		ucol_close(collator);
+
+		status = U_ZERO_ERROR;
+		collator = ucol_openRules(agg_rules, u_strlen(agg_rules),
+								  UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH, NULL, &status);
+		if (U_FAILURE(status))
+			ereport(ERROR,
+					(errmsg("could not open collator for locale \"%s\" with rules \"%s\": %s",
+							iculocstr, icurules, u_errorName(status))));
+	}
+
 	/* We will leak this string if the caller errors later :-( */
 	resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr);
 	resultp->info.icu.ucol = collator;
@@ -1608,11 +1639,19 @@ pg_newlocale_from_collation(Oid collid)
 		else if (collform->collprovider == COLLPROVIDER_ICU)
 		{
 			const char *iculocstr;
+			const char *icurules;
 
 			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_colliculocale, &isnull);
 			Assert(!isnull);
 			iculocstr = TextDatumGetCString(datum);
-			make_icu_collator(iculocstr, &result);
+
+			datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collicurules, &isnull);
+			if (!isnull)
+				icurules = TextDatumGetCString(datum);
+			else
+				icurules = NULL;
+
+			make_icu_collator(iculocstr, icurules, &result);
 		}
 
 		datum = SysCacheGetAttr(COLLOID, tp, Anum_pg_collation_collversion,
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 2f07ca7a0e..b0e20cc635 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -421,10 +421,19 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 
 	if (dbform->datlocprovider == COLLPROVIDER_ICU)
 	{
+		char	   *icurules;
+
 		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticulocale, &isnull);
 		Assert(!isnull);
 		iculocale = TextDatumGetCString(datum);
-		make_icu_collator(iculocale, &default_locale);
+
+		datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_daticurules, &isnull);
+		if (!isnull)
+			icurules = TextDatumGetCString(datum);
+		else
+			icurules = NULL;
+
+		make_icu_collator(iculocale, icurules, &default_locale);
 	}
 	else
 		iculocale = NULL;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7a58c33ace..5e3c6a27c4 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -135,6 +135,7 @@ static char *lc_time = NULL;
 static char *lc_messages = NULL;
 static char locale_provider = COLLPROVIDER_LIBC;
 static char *icu_locale = NULL;
+static char *icu_rules = NULL;
 static const char *default_text_search_config = NULL;
 static char *username = NULL;
 static bool pwprompt = false;
@@ -1312,7 +1313,10 @@ bootstrap_template1(void)
 							  escape_quotes_bki(lc_ctype));
 
 	bki_lines = replace_token(bki_lines, "ICU_LOCALE",
-							  locale_provider == COLLPROVIDER_ICU ? escape_quotes_bki(icu_locale) : "_null_");
+							  icu_locale ? escape_quotes_bki(icu_locale) : "_null_");
+
+	bki_lines = replace_token(bki_lines, "ICU_RULES",
+							  icu_rules ? escape_quotes_bki(icu_rules) : "_null_");
 
 	sprintf(buf, "%c", locale_provider);
 	bki_lines = replace_token(bki_lines, "LOCALE_PROVIDER", buf);
@@ -2107,6 +2111,7 @@ usage(const char *progname)
 	printf(_("  -E, --encoding=ENCODING   set default encoding for new databases\n"));
 	printf(_("  -g, --allow-group-access  allow group read/execute on data directory\n"));
 	printf(_("      --icu-locale=LOCALE   set ICU locale ID for new databases\n"));
+	printf(_("      --icu-rules=RULES     set additional ICU collation rules for new databases\n"));
 	printf(_("  -k, --data-checksums      use data page checksums\n"));
 	printf(_("      --locale=LOCALE       set default locale for new databases\n"));
 	printf(_("      --lc-collate=, --lc-ctype=, --lc-messages=LOCALE\n"
@@ -2767,6 +2772,7 @@ main(int argc, char *argv[])
 		{"discard-caches", no_argument, NULL, 14},
 		{"locale-provider", required_argument, NULL, 15},
 		{"icu-locale", required_argument, NULL, 16},
+		{"icu-rules", required_argument, NULL, 17},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2924,6 +2930,9 @@ main(int argc, char *argv[])
 			case 16:
 				icu_locale = pg_strdup(optarg);
 				break;
+			case 17:
+				icu_rules = pg_strdup(optarg);
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -2954,6 +2963,10 @@ main(int argc, char *argv[])
 		pg_fatal("%s cannot be specified unless locale provider \"%s\" is chosen",
 				 "--icu-locale", "icu");
 
+	if (icu_rules && locale_provider != COLLPROVIDER_ICU)
+		pg_fatal("%s cannot be specified unless locale provider \"%s\" is chosen",
+				 "--icu-rules", "icu");
+
 	atexit(cleanup_directories_atexit);
 
 	/* If we only need to fsync, just do it and exit */
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 74d806c77b..4217908f84 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2843,6 +2843,7 @@ dumpDatabase(Archive *fout)
 				i_collate,
 				i_ctype,
 				i_daticulocale,
+				i_daticurules,
 				i_frozenxid,
 				i_minmxid,
 				i_datacl,
@@ -2861,6 +2862,7 @@ dumpDatabase(Archive *fout)
 			   *collate,
 			   *ctype,
 			   *iculocale,
+			   *icurules,
 			   *datistemplate,
 			   *datconnlimit,
 			   *tablespace;
@@ -2887,6 +2889,10 @@ dumpDatabase(Archive *fout)
 		appendPQExpBufferStr(dbQry, "datlocprovider, daticulocale, datcollversion, ");
 	else
 		appendPQExpBufferStr(dbQry, "'c' AS datlocprovider, NULL AS daticulocale, NULL AS datcollversion, ");
+	if (fout->remoteVersion >= 160000)
+		appendPQExpBufferStr(dbQry, "daticurules, ");
+	else
+		appendPQExpBufferStr(dbQry, "NULL AS daticurules, ");
 	appendPQExpBufferStr(dbQry,
 						 "(SELECT spcname FROM pg_tablespace t WHERE t.oid = dattablespace) AS tablespace, "
 						 "shobj_description(oid, 'pg_database') AS description "
@@ -2904,6 +2910,7 @@ dumpDatabase(Archive *fout)
 	i_collate = PQfnumber(res, "datcollate");
 	i_ctype = PQfnumber(res, "datctype");
 	i_daticulocale = PQfnumber(res, "daticulocale");
+	i_daticurules = PQfnumber(res, "daticurules");
 	i_frozenxid = PQfnumber(res, "datfrozenxid");
 	i_minmxid = PQfnumber(res, "datminmxid");
 	i_datacl = PQfnumber(res, "datacl");
@@ -2925,6 +2932,10 @@ dumpDatabase(Archive *fout)
 		iculocale = PQgetvalue(res, 0, i_daticulocale);
 	else
 		iculocale = NULL;
+	if (!PQgetisnull(res, 0, i_daticurules))
+		icurules = PQgetvalue(res, 0, i_daticurules);
+	else
+		icurules = NULL;
 	frozenxid = atooid(PQgetvalue(res, 0, i_frozenxid));
 	minmxid = atooid(PQgetvalue(res, 0, i_minmxid));
 	dbdacl.acl = PQgetvalue(res, 0, i_datacl);
@@ -2990,6 +3001,11 @@ dumpDatabase(Archive *fout)
 		appendPQExpBufferStr(creaQry, " ICU_LOCALE = ");
 		appendStringLiteralAH(creaQry, iculocale, fout);
 	}
+	if (icurules)
+	{
+		appendPQExpBufferStr(creaQry, " ICU_RULES = ");
+		appendStringLiteralAH(creaQry, icurules, fout);
+	}
 
 	/*
 	 * For binary upgrade, carry over the collation version.  For normal
@@ -13153,10 +13169,12 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	int			i_collcollate;
 	int			i_collctype;
 	int			i_colliculocale;
+	int			i_collicurules;
 	const char *collprovider;
 	const char *collcollate;
 	const char *collctype;
 	const char *colliculocale;
+	const char *collicurules;
 
 	/* Do nothing in data-only dump */
 	if (dopt->dataOnly)
@@ -13194,6 +13212,13 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 		appendPQExpBufferStr(query,
 							 "NULL AS colliculocale, ");
 
+	if (fout->remoteVersion >= 160000)
+		appendPQExpBufferStr(query,
+							 "collicurules, ");
+	else
+		appendPQExpBufferStr(query,
+							 "NULL AS collicurules, ");
+
 	appendPQExpBuffer(query,
 					  "collcollate, "
 					  "collctype "
@@ -13208,6 +13233,7 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	i_collcollate = PQfnumber(res, "collcollate");
 	i_collctype = PQfnumber(res, "collctype");
 	i_colliculocale = PQfnumber(res, "colliculocale");
+	i_collicurules = PQfnumber(res, "collicurules");
 
 	collprovider = PQgetvalue(res, 0, i_collprovider);
 
@@ -13226,6 +13252,11 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	else
 		colliculocale = NULL;
 
+	if (!PQgetisnull(res, 0, i_collicurules))
+		collicurules = PQgetvalue(res, 0, i_collicurules);
+	else
+		collicurules = NULL;
+
 	appendPQExpBuffer(delq, "DROP COLLATION %s;\n",
 					  fmtQualifiedDumpable(collinfo));
 
@@ -13271,6 +13302,12 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 		}
 	}
 
+	if (collicurules)
+	{
+		appendPQExpBufferStr(q, ", rules = ");
+		appendStringLiteralAH(q, collicurules, fout);
+	}
+
 	/*
 	 * For binary upgrade, carry over the collation version.  For normal
 	 * dump/restore, omit the version, so that it is computed upon restore.
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2084f5ccda..99e28f607e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -923,38 +923,52 @@ listAllDbs(const char *pattern, bool verbose)
 	initPQExpBuffer(&buf);
 
 	printfPQExpBuffer(&buf,
-					  "SELECT d.datname as \"%s\",\n"
-					  "       pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
-					  "       pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n"
-					  "       d.datcollate as \"%s\",\n"
-					  "       d.datctype as \"%s\",\n",
+					  "SELECT\n"
+					  "  d.datname as \"%s\",\n"
+					  "  pg_catalog.pg_get_userbyid(d.datdba) as \"%s\",\n"
+					  "  pg_catalog.pg_encoding_to_char(d.encoding) as \"%s\",\n",
 					  gettext_noop("Name"),
 					  gettext_noop("Owner"),
-					  gettext_noop("Encoding"),
-					  gettext_noop("Collate"),
-					  gettext_noop("Ctype"));
+					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "       d.daticulocale as \"%s\",\n"
-						  "       CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
-						  gettext_noop("ICU Locale"),
+						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
-						  "       NULL as \"%s\",\n"
-						  "       'libc' AS \"%s\",\n",
-						  gettext_noop("ICU Locale"),
+						  "  'libc' AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
-	appendPQExpBufferStr(&buf, "       ");
+	appendPQExpBuffer(&buf,
+					  "  d.datcollate as \"%s\",\n"
+					  "  d.datctype as \"%s\",\n",
+					  gettext_noop("Collate"),
+					  gettext_noop("Ctype"));
+	if (pset.sversion >= 150000)
+		appendPQExpBuffer(&buf,
+						  "  d.daticulocale as \"%s\",\n",
+						  gettext_noop("ICU Locale"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  NULL as \"%s\",\n",
+						  gettext_noop("ICU Locale"));
+	if (pset.sversion >= 160000)
+		appendPQExpBuffer(&buf,
+						  "  d.daticurules as \"%s\",\n",
+						  gettext_noop("ICU Rules"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  NULL as \"%s\",\n",
+						  gettext_noop("ICU Rules"));
+	appendPQExpBufferStr(&buf, "  ");
 	printACLColumn(&buf, "d.datacl");
 	if (verbose)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n"
-						  "            THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n"
-						  "            ELSE 'No Access'\n"
-						  "       END as \"%s\""
-						  ",\n       t.spcname as \"%s\""
-						  ",\n       pg_catalog.shobj_description(d.oid, 'pg_database') as \"%s\"",
+						  ",\n  CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n"
+						  "       THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n"
+						  "       ELSE 'No Access'\n"
+						  "  END as \"%s\""
+						  ",\n  t.spcname as \"%s\""
+						  ",\n  pg_catalog.shobj_description(d.oid, 'pg_database') as \"%s\"",
 						  gettext_noop("Size"),
 						  gettext_noop("Tablespace"),
 						  gettext_noop("Description"));
@@ -4849,52 +4863,64 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	static const bool translate_columns[] = {false, false, false, false, false, false, true, false};
+	static const bool translate_columns[] = {false, false, false, false, false, false, false, true, false};
 
 	initPQExpBuffer(&buf);
 
 	printfPQExpBuffer(&buf,
-					  "SELECT n.nspname AS \"%s\",\n"
-					  "       c.collname AS \"%s\",\n"
-					  "       c.collcollate AS \"%s\",\n"
-					  "       c.collctype AS \"%s\"",
+					  "SELECT\n"
+					  "  n.nspname AS \"%s\",\n"
+					  "  c.collname AS \"%s\",\n",
 					  gettext_noop("Schema"),
-					  gettext_noop("Name"),
+					  gettext_noop("Name"));
+
+	if (pset.sversion >= 100000)
+		appendPQExpBuffer(&buf,
+						  "  CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  gettext_noop("Provider"));
+	else
+		appendPQExpBuffer(&buf,
+						  "  'libc' AS \"%s\",\n",
+						  gettext_noop("Provider"));
+
+	appendPQExpBuffer(&buf,
+					  "  c.collcollate AS \"%s\",\n"
+					  "  c.collctype AS \"%s\",\n",
 					  gettext_noop("Collate"),
 					  gettext_noop("Ctype"));
 
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  ",\n       c.colliculocale AS \"%s\"",
+						  "  c.colliculocale AS \"%s\",\n",
 						  gettext_noop("ICU Locale"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       c.collcollate AS \"%s\"",
+						  "  c.collcollate AS \"%s\",\n",
 						  gettext_noop("ICU Locale"));
 
-	if (pset.sversion >= 100000)
+	if (pset.sversion >= 160000)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\"",
-						  gettext_noop("Provider"));
+						  "  c.collicurules AS \"%s\",\n",
+						  gettext_noop("ICU Rules"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       'libc' AS \"%s\"",
-						  gettext_noop("Provider"));
+						  "  NULL AS \"%s\",\n",
+						  gettext_noop("ICU Rules"));
 
 	if (pset.sversion >= 120000)
 		appendPQExpBuffer(&buf,
-						  ",\n       CASE WHEN c.collisdeterministic THEN '%s' ELSE '%s' END AS \"%s\"",
+						  "  CASE WHEN c.collisdeterministic THEN '%s' ELSE '%s' END AS \"%s\"",
 						  gettext_noop("yes"), gettext_noop("no"),
 						  gettext_noop("Deterministic?"));
 	else
 		appendPQExpBuffer(&buf,
-						  ",\n       '%s' AS \"%s\"",
+						  "  '%s' AS \"%s\"",
 						  gettext_noop("yes"),
 						  gettext_noop("Deterministic?"));
 
 	if (verbose)
 		appendPQExpBuffer(&buf,
-						  ",\n       pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
+						  ",\n  pg_catalog.obj_description(c.oid, 'pg_collation') AS \"%s\"",
 						  gettext_noop("Description"));
 
 	appendPQExpBufferStr(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 80859dadc4..b4205c4fa5 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -41,6 +41,7 @@ main(int argc, char *argv[])
 		{"maintenance-db", required_argument, NULL, 3},
 		{"locale-provider", required_argument, NULL, 4},
 		{"icu-locale", required_argument, NULL, 5},
+		{"icu-rules", required_argument, NULL, 6},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -67,6 +68,7 @@ main(int argc, char *argv[])
 	char	   *locale = NULL;
 	char	   *locale_provider = NULL;
 	char	   *icu_locale = NULL;
+	char	   *icu_rules = NULL;
 
 	PQExpBufferData sql;
 
@@ -134,6 +136,9 @@ main(int argc, char *argv[])
 			case 5:
 				icu_locale = pg_strdup(optarg);
 				break;
+			case 6:
+				icu_rules = pg_strdup(optarg);
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -231,6 +236,11 @@ main(int argc, char *argv[])
 		appendPQExpBufferStr(&sql, " ICU_LOCALE ");
 		appendStringLiteralConn(&sql, icu_locale, conn);
 	}
+	if (icu_rules)
+	{
+		appendPQExpBufferStr(&sql, " ICU_RULES ");
+		appendStringLiteralConn(&sql, icu_rules, conn);
+	}
 
 	appendPQExpBufferChar(&sql, ';');
 
@@ -288,6 +298,7 @@ help(const char *progname)
 	printf(_("      --lc-collate=LOCALE      LC_COLLATE setting for the database\n"));
 	printf(_("      --lc-ctype=LOCALE        LC_CTYPE setting for the database\n"));
 	printf(_("      --icu-locale=LOCALE      ICU locale setting for the database\n"));
+	printf(_("      --icu-rules=RULES        ICU rules setting for the database\n"));
 	printf(_("      --locale-provider={libc|icu}\n"
 			 "                               locale provider for the database's default collation\n"));
 	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index f9d5e88faf..bfa3568451 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -43,6 +43,7 @@ CATALOG(pg_collation,3456,CollationRelationId)
 	text		collcollate BKI_DEFAULT(_null_);	/* LC_COLLATE setting */
 	text		collctype BKI_DEFAULT(_null_);	/* LC_CTYPE setting */
 	text		colliculocale BKI_DEFAULT(_null_);	/* ICU locale ID */
+	text		collicurules BKI_DEFAULT(_null_);	/* ICU collation rules */
 	text		collversion BKI_DEFAULT(_null_);	/* provider-dependent
 													 * version of collation
 													 * data */
@@ -91,6 +92,7 @@ extern Oid	CollationCreate(const char *collname, Oid collnamespace,
 							int32 collencoding,
 							const char *collcollate, const char *collctype,
 							const char *colliculocale,
+							const char *collicurules,
 							const char *collversion,
 							bool if_not_exists,
 							bool quiet);
diff --git a/src/include/catalog/pg_database.dat b/src/include/catalog/pg_database.dat
index 4b1bc29363..68dcac1a6e 100644
--- a/src/include/catalog/pg_database.dat
+++ b/src/include/catalog/pg_database.dat
@@ -18,6 +18,6 @@
   datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
   datallowconn => 't', datconnlimit => '-1', datfrozenxid => '0',
   datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
-  datctype => 'LC_CTYPE', daticulocale => 'ICU_LOCALE', datacl => '_null_' },
+  datctype => 'LC_CTYPE', daticulocale => 'ICU_LOCALE', daticurules => 'ICU_RULES', datacl => '_null_' },
 
 ]
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 3da3492e92..d004f4dc8a 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -71,6 +71,9 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* ICU locale ID */
 	text		daticulocale;
 
+	/* ICU collation rules */
+	text		daticurules;
+
 	/* provider-dependent version of collation data */
 	text		datcollversion BKI_DEFAULT(_null_);
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index b8f22875a8..f9ce428233 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -95,6 +95,7 @@ typedef struct pg_locale_struct *pg_locale_t;
 extern PGDLLIMPORT struct pg_locale_struct default_locale;
 
 extern void make_icu_collator(const char *iculocstr,
+							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
 extern bool pg_locale_deterministic(pg_locale_t locale);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 4354dc07b8..2ac707b362 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1190,6 +1190,36 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
  t        | t
 (1 row)
 
+-- rules
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+     a     
+-----------
+ Abernathy
+ apple
+ bird
+ Boston
+ Graham
+ green
+(6 rows)
+
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+     a     
+-----------
+ Abernathy
+ apple
+ green
+ bird
+ Boston
+ Graham
+(6 rows)
+
+DROP TABLE test7;
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+ERROR:  could not open collator for locale "" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index b75a74d294..c00e28361c 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -6199,9 +6199,9 @@ List of schemas
 (0 rows)
 
 \dO "no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp "no.such.access.privilege"
@@ -6388,9 +6388,9 @@ cross-database references are not implemented: "no.such.schema"."no.such.languag
 (0 rows)
 
 \dO "no.such.schema"."no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp "no.such.schema"."no.such.access.privilege"
@@ -6531,9 +6531,9 @@ List of text search templates
 (0 rows)
 
 \dO regression."no.such.schema"."no.such.collation"
-                            List of collations
- Schema | Name | Collate | Ctype | ICU Locale | Provider | Deterministic? 
---------+------+---------+-------+------------+----------+----------------
+                                  List of collations
+ Schema | Name | Provider | Collate | Ctype | ICU Locale | ICU Rules | Deterministic? 
+--------+------+----------+---------+-------+------------+-----------+----------------
 (0 rows)
 
 \dp regression."no.such.schema"."no.such.access.privilege"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b0ddc7db44..aa95c1ec42 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -472,6 +472,19 @@ CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=p
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
 
 
+-- rules
+
+CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
+CREATE TABLE test7 (a text);
+-- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
+INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
+SELECT * FROM test7 ORDER BY a COLLATE "en-x-icu";
+SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
+DROP TABLE test7;
+
+CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
+
+
 -- nondeterministic collations
 
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);

base-commit: b6a0d469cae4410a05b5e109748278065a931b68
-- 
2.39.2

#15

Jeff Davis

pgsql@j-davis.com

almost 3 years ago

In reply to: Peter Eisentraut (#14)

Re: Allow tailoring of ICU locales with custom rules

On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules='&a < c < b < e < d'

;-)

Would we be the first major database to support custom collation rules?
This sounds useful for testing, experimentation, hacking, etc.

What are some of the use cases? Is it helpful to comply with unusual or
outdated standards or formats? Maybe there are people using special
delimiters/terminators and they need them to be treated a certain way
during comparisons?

Regards,
Jeff Davis

#16

Laurenz Albe

laurenz.albe@cybertec.at

almost 3 years ago

In reply to: Jeff Davis (#15)

Re: Allow tailoring of ICU locales with custom rules

On Tue, 2023-03-07 at 22:06 -0800, Jeff Davis wrote:

On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules='&a < c < b < e < d'

;-)

Would we be the first major database to support custom collation rules?
This sounds useful for testing, experimentation, hacking, etc.

What are some of the use cases? Is it helpful to comply with unusual or
outdated standards or formats? Maybe there are people using special
delimiters/terminators and they need them to be treated a certain way
during comparisons?

I regularly see complaints about the sort order; recently this one:
/messages/by-id/CAFCRh--xt-J8awOavhB216kom6TQnaP35TTVEQQS5bHH7gMemQ@mail.gmail.com

So being able to influence the sort order is useful.

Yours,
Laurenz Albe

#17

Laurenz Albe

laurenz.albe@cybertec.at

almost 3 years ago

In reply to: Peter Eisentraut (#14)

Re: Allow tailoring of ICU locales with custom rules

On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:

Ok, here is a version with an initdb option and also a createdb option
(to expose the CREATE DATABASE option).

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules='&a < c < b < e < d'

Looks good. I cannot get it to misbehave, "make check-world" is successful
(the regression tests misbehave in interesting ways when running
"make installcheck" on a cluster created with non-standard ICU rules, but
that can be expected).

I checked the documentation, tested "pg_dump" support, everything fine.

I'll mark it as "ready for committer".

Yours,
Laurenz Albe

#18

Peter Eisentraut

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Laurenz Albe (#17)

Re: Allow tailoring of ICU locales with custom rules

On 08.03.23 15:18, Laurenz Albe wrote:

On Fri, 2023-03-03 at 13:45 +0100, Peter Eisentraut wrote:

Ok, here is a version with an initdb option and also a createdb option
(to expose the CREATE DATABASE option).

You can mess with people by setting up your databases like this:

initdb -D data --locale-provider=icu --icu-rules='&a < c < b < e < d'

Looks good. I cannot get it to misbehave, "make check-world" is successful
(the regression tests misbehave in interesting ways when running
"make installcheck" on a cluster created with non-standard ICU rules, but
that can be expected).

I checked the documentation, tested "pg_dump" support, everything fine.

I'll mark it as "ready for committer".

committed