ICU for global collation

Started by Peter Eisentraut · over 6 years ago · 111 messages · pgsql-hackers
#1 Peter Eisentraut
peter_e@gmx.net

Here is an initial patch to add the option to use ICU as the global
collation provider, a long-requested feature.

To activate, use something like

initdb --collation-provider=icu --locale=...

A trick here is that since we need to also still set the normal POSIX
locales, the --locale value needs to be valid as both a POSIX locale and
an ICU locale. If that doesn't work out, there is also a way to specify
it separately, e.g.,

initdb --collation-provider=icu --locale=en_US.utf8 --icu-locale=en

This complexity is unfortunate, but I don't see a way around it right now.

There are also options for createdb and CREATE DATABASE to do this for a
particular database only.

Besides this, the implementation is quite small: When starting up a
database, we create an ICU collator object, store it in a global
variable, and then use it when appropriate. All the ICU code for
creating and invoking those collators already exists of course.

For the version tracking, I use the pg_collation row for the "default"
collation. Again, this mostly reuses existing code and concepts.

Nondeterministic collations are not supported for the global collation,
because then LIKE and regular expressions don't work and that breaks
some system views. This needs some separate research.
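For reference, the LIKE limitation can be seen today with a column-level nondeterministic collation; a quick sketch (collation and table names invented here, error text approximate):

```sql
-- Case-insensitive (strength-2) ICU collation, marked nondeterministic
CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
CREATE TABLE t (s text COLLATE ci);
SELECT * FROM t WHERE s LIKE 'a%';
-- ERROR:  nondeterministic collations are not supported for LIKE
```

A database-wide nondeterministic default collation would make every such query fail, including the system views mentioned above.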

To test, run the existing regression tests against a database
initialized with ICU. Perhaps some options for pg_regress could
facilitate that.

I fear that the Localization chapter in the documentation will need a
bit of a rewrite after this, because the hitherto separately treated
concepts of locale and collation are fusing together. I haven't done
that here yet, but that would be the plan for later.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v1-0001-Add-option-to-use-ICU-as-global-collation-provide.patch (text/plain, +417 −106)
#2 Andrey Borodin
amborodin@acm.org
In reply to: Peter Eisentraut (#1)
Re: ICU for global collation

Hi!

On 20 Aug 2019, at 19:21, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

Here is an initial patch to add the option to use ICU as the global
collation provider, a long-requested feature.

To activate, use something like

initdb --collation-provider=icu --locale=...

A trick here is that since we need to also still set the normal POSIX
locales, the --locale value needs to be valid as both a POSIX locale and
an ICU locale. If that doesn't work out, there is also a way to specify
it separately, e.g.,

initdb --collation-provider=icu --locale=en_US.utf8 --icu-locale=en

Thanks! This is a much-awaited feature.

Seems like the user cannot change the locale for a database if ICU is already chosen?

postgres=# \l
                              List of databases
   Name    | Owner | Encoding | Collate | Ctype | Provider | Access privileges
-----------+-------+----------+---------+-------+----------+-------------------
 postgres  | x4mmm | UTF8     | ru_RU   | ru_RU | icu      |
 template0 | x4mmm | UTF8     | ru_RU   | ru_RU | icu      | =c/x4mmm         +
           |       |          |         |       |          | x4mmm=CTc/x4mmm
 template1 | x4mmm | UTF8     | ru_RU   | ru_RU | icu      | =c/x4mmm         +
           |       |          |         |       |          | x4mmm=CTc/x4mmm
(3 rows)

postgres=# create database a template template0 collation_provider icu lc_collate 'en_US.utf8';
CREATE DATABASE
postgres=# \c a
2019-08-21 11:43:40.379 +05 [41509] FATAL: collations with different collate and ctype values are not supported by ICU
FATAL: collations with different collate and ctype values are not supported by ICU
Previous connection kept

Am I missing something?

BTW, psql does not know about collation_provider.

Best regards, Andrey Borodin.

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Andrey Borodin (#2)
Re: ICU for global collation

On 2019-08-21 08:56, Andrey Borodin wrote:

postgres=# create database a template template0 collation_provider icu lc_collate 'en_US.utf8';
CREATE DATABASE
postgres=# \c a
2019-08-21 11:43:40.379 +05 [41509] FATAL: collations with different collate and ctype values are not supported by ICU
FATAL: collations with different collate and ctype values are not supported by ICU

Try

create database a template template0 collation_provider icu locale
'en_US.utf8';

which sets both lc_collate and lc_ctype. But 'en_US.utf8' is not a
valid ICU locale name. Perhaps use 'en' or 'en-US'.

I'm making a note that we should prevent creating a database with a
faulty locale configuration in the first place instead of failing when
we're connecting.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4 Andrey Borodin
amborodin@acm.org
In reply to: Peter Eisentraut (#3)
Re: ICU for global collation

On 21 Aug 2019, at 12:23, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2019-08-21 08:56, Andrey Borodin wrote:

postgres=# create database a template template0 collation_provider icu lc_collate 'en_US.utf8';
CREATE DATABASE
postgres=# \c a
2019-08-21 11:43:40.379 +05 [41509] FATAL: collations with different collate and ctype values are not supported by ICU
FATAL: collations with different collate and ctype values are not supported by ICU

Try

create database a template template0 collation_provider icu locale
'en_US.utf8';

which sets both lc_collate and lc_ctype. But 'en_US.utf8' is not a
valid ICU locale name. Perhaps use 'en' or 'en-US'.

I'm making a note that we should prevent creating a database with a
faulty locale configuration in the first place instead of failing when
we're connecting.

Yes, the problem is that input with lc_collate is accepted:
postgres=# create database a template template0 collation_provider icu lc_collate 'en_US.utf8';
CREATE DATABASE
postgres=# \c a
2019-09-11 10:01:00.373 +05 [56878] FATAL: collations with different collate and ctype values are not supported by ICU
FATAL: collations with different collate and ctype values are not supported by ICU
Previous connection kept
postgres=# create database b template template0 collation_provider icu locale 'en_US.utf8';
CREATE DATABASE
postgres=# \c b
You are now connected to database "b" as user "x4mmm".

I get the same output with 'en' or 'en-US'.

Also, a cluster initialized --with-icu started on binaries built without ICU just fine.
Only after some time did I get the message "ERROR: ICU is not supported in this build".
Is this expected behavior? Maybe we should refuse to start without ICU?

Best regards, Andrey Borodin.

#5 Daniel Verite
daniel@manitou-mail.org
In reply to: Peter Eisentraut (#1)
Re: ICU for global collation

Hi,

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).

That's a problem for the backend code that depends on libc functions
that themselves depend on LC_CTYPE, such as the full text search parser
and dictionaries.

For instance, if you start the instance with a C locale
(LC_ALL=C pg_ctl ...) and try to use FTS in an ICU UTF-8 database,
it doesn't work:

template1=# create database "fr-utf8"
template 'template0' encoding UTF8
locale 'fr'
collation_provider 'icu';

template1=# \c fr-utf8
You are now connected to database "fr-utf8" as user "daniel".

fr-utf8=# show lc_ctype;
lc_ctype
----------
fr
(1 row)

fr-utf8=# select to_tsvector('été');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.

If I peek into the "real" LC_CTYPE when connected to this database,
I can see it's "C":

fr-utf8=# create extension plperl;
CREATE EXTENSION

fr-utf8=# create function lc_ctype() returns text as '$ENV{LC_CTYPE};'
language plperl;
CREATE FUNCTION

fr-utf8=# select lc_ctype();
lc_ctype
----------
C

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#6 Timmer, Marius
marius.timmer@uni-muenster.de
In reply to: Daniel Verite (#5)
Re: ICU for global collation

Hi everyone,

Like the others before me, we (the University of Münster) are happy to
see this feature as well. Thank you for this.

When I applied the patch two weeks ago I ran into the issue that initdb
did not recognize the new parameters (collation-provider and icu-locale),
but I guess that was caused by my own stupidity.

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).

I am able to recreate the issue described by Daniel on my machine.

Now it works as expected. I just had to update the patch since commit
3f6b3be3 had modified two lines, which resulted in conflicts. You will find
the updated patch attached to this mail.

Best regards,

Marius Timmer

--
Westfälische Wilhelms-Universität Münster (WWU)
Zentrum für Informationsverarbeitung (ZIV)
Röntgenstraße 7-13
Besucheradresse: Einsteinstraße 60 - Raum 107
48149 Münster
+49 251 83 31158
marius.timmer@uni-muenster.de
https://www.uni-muenster.de/ZIV

Attachments:

v1-0002-Add-option-to-use-ICU-as-global-collation-provide_rebased.patch (text/x-patch, +417 −106)
#7 Thomas Munro
thomas.munro@gmail.com
In reply to: Timmer, Marius (#6)
Re: ICU for global collation

On Wed, Oct 9, 2019 at 12:16 AM Marius Timmer
<marius.timmer@uni-muenster.de> wrote:

like the others before me we (the University of Münster) are happy to
see this feature as well. Thank you for this.

When I applied the patch two weeks ago I ran into the issue that initdb
did not recognize the new parameters (collation-provider and icu-locale),
but I guess it was caused by my own stupidity.

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).

I am able to recreate the issue described by Daniel on my machine.

Now it works as expected. I just had to update the patch since commit
3f6b3be3 had modified two lines which resulted in conflicts. You find
the updated patch as attachement to this mail.

I rebased this patch, and tweaked get_collation_action_version() very
slightly so that you get collation version change detection (of the
ersatz kind provided by commit d5ac14f9) for the default collation
even when not using ICU. Please see attached.

+struct pg_locale_struct global_locale;

Why not "default_locale"? Where is the terminology "global" coming from?

+ Specifies the collation provider for the database.

"for the database's default collation"?

Attachments:

v2-0001-Add-option-to-use-ICU-as-global-collation-provide.patch (application/octet-stream, +413 −106)
#8 Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#7)
Re: ICU for global collation

On Thu, Oct 17, 2019 at 3:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I rebased this patch, and tweaked get_collation_action_version() very
slightly so that you get collation version change detection (of the
ersatz kind provided by commit d5ac14f9) for the default collation
even when not using ICU. Please see attached.

It should also remove the sentence I recently added to
alter_collation.sgml to say that the default collation doesn't have
version tracking. Rereading that section, it's also clear that the
query introduced with:

"The following query can be used to identify all collations in the current
database that need to be refreshed and the objects that depend on them:"

… is wrong with this patch applied. The right query is quite hard to
come up with, since we don't explicitly track dependencies on the
default collation. That is, there is no pg_depend entry pointing from
the index to the collation when you write CREATE INDEX ON t(x) for a
text column using the default collation, but there is one when you
write CREATE INDEX ON t(x COLLATE "fr_FR"), or when you write CREATE
INDEX ON t(x) for a text column that was explicitly defined to use
COLLATE "fr_FR". One solution is that we could start tracking those
dependencies explicitly too.
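For context, the query that documentation section introduces is essentially the following (reproduced here from memory of the docs of that era, so treat it as approximate); it relies entirely on explicit pg_depend entries pointing at pg_collation rows, which is exactly why dependencies on the "default" collation never show up in it:

```sql
SELECT pg_describe_object(refclassid, refobjid, refobjsubid) AS "Collation",
       pg_describe_object(classid, objid, objsubid) AS "Object"
  FROM pg_depend d JOIN pg_collation c
       ON refclassid = 'pg_collation'::regclass AND refobjid = c.oid
 WHERE c.collversion <> pg_collation_actual_version(c.oid)
 ORDER BY 1, 2;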

A preexisting problem with that query is that it doesn't report
transitive dependencies. An index on t(x) of a user defined type
defined with CREATE TYPE my_type AS (x text COLLATE "fr_FR") doesn't
result in a pg_depend row from index to collation, so the query fails
to report that as an index needing to be rebuilt. You could fix that
with a sprinkle of recursive magic, but you'd need a different kind of
magic to deal with transitive dependencies on the default collation
unless we start listing such dependencies explicitly. In that
example, my_type would need to depend on collation "default". You
can't just do some kind of search for transitive dependencies on type
"text", because they aren't tracked either.

In my longer term proposal to track per-dependency versions, either by
adding refobjversion to pg_depend or by creating another
pg_depend-like catalog, you'd almost certainly need to add an explicit
record for dependencies on the default collation.

#9 Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Verite (#5)
Re: ICU for global collation

On 2019-09-17 15:08, Daniel Verite wrote:

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).

fr-utf8=# select to_tsvector('été');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.

I looked into this problem. The way to address this would be adding
proper collation support to the text search subsystem. See the TODO
markers in src/backend/tsearch/ts_locale.c for starting points. These
APIs spread out to a lot of places, so it will take some time to finish.
In the meantime, I'm pausing this thread and will set the CF entry as RwF.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10 Daniel Verite
daniel@manitou-mail.org
In reply to: Peter Eisentraut (#9)
Re: ICU for global collation

Peter Eisentraut wrote:

I looked into this problem. The way to address this would be adding
proper collation support to the text search subsystem. See the TODO
markers in src/backend/tsearch/ts_locale.c for starting points. These
APIs spread out to a lot of places, so it will take some time to finish.
In the meantime, I'm pausing this thread and will set the CF entry as RwF.

Even if the FTS code is improved in that regard, any extension code
with libc functions depending on LC_CTYPE is still going to be
potentially problematic, in particular when LC_CTYPE happens to be set
to a different encoding than the database's.

Couldn't we simply invent per-database GUC options, as in
ALTER DATABASE myicudb SET libc_lc_ctype TO 'value';
ALTER DATABASE myicudb SET libc_lc_collate TO 'value';

where libc_lc_ctype/libc_lc_collate would specifically set
the values of the LC_CTYPE and LC_COLLATE environment variables
of any backend serving the corresponding database?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#11 Peter Eisentraut
peter_e@gmx.net
In reply to: Daniel Verite (#10)
Re: ICU for global collation

On 2019-11-01 19:18, Daniel Verite wrote:

Even if the FTS code is improved in that regard, any extension code
with libc functions depending on LC_CTYPE is still going to be
potentially problematic, in particular when LC_CTYPE happens to be set
to a different encoding than the database's.

I think the answer here is that extension code must not do that, at
least in ways that potentially interact with other parts of the
(collation-aware) database system. For example, libc and ICU might have
different opinions about what is a letter, because of different versions
of Unicode data in use. That would then affect tokenization etc. in
text search and elsewhere. That's why things like isalpha have to go
though ICU instead, if that is the collation provider in a particular
context.

Couldn't we simply invent per-database GUC options, as in
ALTER DATABASE myicudb SET libc_lc_ctype TO 'value';
ALTER DATABASE myicudb SET libc_lc_collate TO 'value';

where libc_lc_ctype/libc_lc_collate would specifically set
the values of the LC_CTYPE and LC_COLLATE environment variables
of any backend serving the corresponding database?

We could do that as a transition measure to support extensions like you
mention above. But our own internal code should not have to rely on that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12 Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#11)
Re: ICU for global collation

There were a few inquiries about this topic recently, so I dug up the
old thread and patch. What we got stuck on last time was that we can't
just swap out all locale support in a database for ICU. We still need
to set the usual locale environment, otherwise some things that are not
ICU aware will break or degrade. I had initially anticipated fixing
that by converting everything that uses libc locales to ICU. But that
turned out to be tedious and ultimately not very useful as far as the
user-facing result is concerned, so I gave up.

So this is a different approach: If you choose ICU as the default locale
for a database, you still need to specify lc_ctype and lc_collate
settings, as before. Unlike in the previous patch, where the ICU
collation name was written in datcollate, there is now a third column
(daticucoll), so we can store all three values. This fixes the
described problem. Other than that, once you get all the initial
settings right, it basically just works: The places that have ICU
support now will use a database-wide ICU collation if appropriate, the
places that don't have ICU support continue to use the global libc
locale settings.

I changed the datcollate, datctype, and the new daticucoll fields to
type text (from name). That way, the daticucoll field can be set to
null if it's not applicable. Also, the limit of 63 characters can
actually be a problem if you want to use some combination of the options
that ICU locales offer. And for less extreme uses, having
variable-length fields will save some storage, since typical locale
names are much shorter.

For the same reasons and to keep things consistent, I also changed the
analogous pg_collation fields like that. This also removes some weird
code that has to check that colcollate and colctype have to be the same
for ICU, so it's overall cleaner.

Attachments:

v3-0001-Add-option-to-use-ICU-as-global-collation-provide.patch (text/plain, +665 −200)
#13 Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#12)
Re: ICU for global collation

Hi,

On Thu, Dec 30, 2021 at 01:07:21PM +0100, Peter Eisentraut wrote:

So this is a different approach: If you choose ICU as the default locale for
a database, you still need to specify lc_ctype and lc_collate settings, as
before. Unlike in the previous patch, where the ICU collation name was
written in datcollate, there is now a third column (daticucoll), so we can
store all three values. This fixes the described problem. Other than that,
once you get all the initial settings right, it basically just works: The
places that have ICU support now will use a database-wide ICU collation if
appropriate, the places that don't have ICU support continue to use the
global libc locale settings.

That looks sensible to me.

@@ -2774,6 +2776,7 @@ dumpDatabase(Archive *fout)
appendPQExpBuffer(dbQry, "SELECT tableoid, oid, datname, "
"(%s datdba) AS dba, "
"pg_encoding_to_char(encoding) AS encoding, "
+ "datcollprovider, "

This needs to be in a new pg 15+ branch, not in the pg 9.3+.

-	if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
-		mylocale = pg_newlocale_from_collation(collid);
+	if (!lc_collate_is_c(collid))
+	{
+		if (collid != DEFAULT_COLLATION_OID)
+			mylocale = pg_newlocale_from_collation(collid);
+		else if (default_locale.provider == COLLPROVIDER_ICU)
+			mylocale = &default_locale;
+	}

There are really a lot of places with this new code. Maybe it could be some
new function/macro to wrap that for the normal case (e.g. not formatting.c)?

#14 Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#13)
Re: ICU for global collation

On 04.01.22 03:21, Julien Rouhaud wrote:

@@ -2774,6 +2776,7 @@ dumpDatabase(Archive *fout)
appendPQExpBuffer(dbQry, "SELECT tableoid, oid, datname, "
"(%s datdba) AS dba, "
"pg_encoding_to_char(encoding) AS encoding, "
+ "datcollprovider, "

This needs to be in a new pg 15+ branch, not in the pg 9.3+.

ok

-	if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
-		mylocale = pg_newlocale_from_collation(collid);
+	if (!lc_collate_is_c(collid))
+	{
+		if (collid != DEFAULT_COLLATION_OID)
+			mylocale = pg_newlocale_from_collation(collid);
+		else if (default_locale.provider == COLLPROVIDER_ICU)
+			mylocale = &default_locale;
+	}

There are really a lot of places with this new code. Maybe it could be some
new function/macro to wrap that for the normal case (e.g. not formatting.c)?

Right, we could just put this into pg_newlocale_from_collation(), but
the comment there says

* In fact, they shouldn't call this function at all when they are dealing
* with the default locale. That can save quite a bit in hotspots.

I don't know how to assess that.

We could pack this into a macro or inline function if we are concerned
about this.

#15 Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#14)
Re: ICU for global collation

On Tue, Jan 04, 2022 at 05:03:10PM +0100, Peter Eisentraut wrote:

On 04.01.22 03:21, Julien Rouhaud wrote:

-	if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
-		mylocale = pg_newlocale_from_collation(collid);
+	if (!lc_collate_is_c(collid))
+	{
+		if (collid != DEFAULT_COLLATION_OID)
+			mylocale = pg_newlocale_from_collation(collid);
+		else if (default_locale.provider == COLLPROVIDER_ICU)
+			mylocale = &default_locale;
+	}

There are really a lot of places with this new code. Maybe it could be some
new function/macro to wrap that for the normal case (e.g. not formatting.c)?

Right, we could just put this into pg_newlocale_from_collation(), but the
comment there says

* In fact, they shouldn't call this function at all when they are dealing
* with the default locale. That can save quite a bit in hotspots.

I don't know how to assess that.

We could pack this into a macro or inline function if we are concerned about
this.

Yes, that was my idea: just have a new function (an inline function or a macro,
then, since pg_newlocale_from_collation() clearly warns about performance
concerns) that contains the whole
is-not-c-collation-and-is-default-collation-or-icu-collation logic and calls
pg_newlocale_from_collation() only when needed.

#16 Jim Finnerty
jfinnert@amazon.com
In reply to: Julien Rouhaud (#15)
Re: ICU for global collation

I didn't notice anything version-specific about the patch. Would any modifications be needed to backport it to pg13 and pg14?

After this patch goes in, the big next thing would be to support nondeterministic collations for LIKE, ILIKE and pattern matching operators in general. Is anyone interested in working on that?

On 1/5/22, 10:36 PM, "Julien Rouhaud" <rjuju123@gmail.com> wrote:


On Tue, Jan 04, 2022 at 05:03:10PM +0100, Peter Eisentraut wrote:

On 04.01.22 03:21, Julien Rouhaud wrote:

-	if (!lc_collate_is_c(collid) && collid != DEFAULT_COLLATION_OID)
-		mylocale = pg_newlocale_from_collation(collid);
+	if (!lc_collate_is_c(collid))
+	{
+		if (collid != DEFAULT_COLLATION_OID)
+			mylocale = pg_newlocale_from_collation(collid);
+		else if (default_locale.provider == COLLPROVIDER_ICU)
+			mylocale = &default_locale;
+	}

There are really a lot of places with this new code. Maybe it could be some
new function/macro to wrap that for the normal case (e.g. not formatting.c)?

Right, we could just put this into pg_newlocale_from_collation(), but the
comment there says

* In fact, they shouldn't call this function at all when they are dealing
* with the default locale. That can save quite a bit in hotspots.

I don't know how to assess that.

We could pack this into a macro or inline function if we are concerned about
this.

Yes that was my idea, just have a new function (inline function or a macro
then since pg_newlocale_from_collation() clearly warns about performance
concerns) that have the whole
is-not-c-collation-and-is-default-collation-or-icu-collation logic and calls
pg_newlocale_from_collation() only when needed.

#17 Julien Rouhaud
rjuju123@gmail.com
In reply to: Jim Finnerty (#16)
Re: ICU for global collation

On Thu, Jan 06, 2022 at 01:55:55PM +0000, Finnerty, Jim wrote:

I didn't notice anything version-specific about the patch. Would any
modifications be needed to backport it to pg13 and pg14?

This is a new feature, so it can't be backported. The changes aren't big and
mostly touch places that haven't changed in a long time, so I don't think that
it would take much effort if you wanted to backport it to your own forks.

After this patch goes in, the big next thing would be to support
nondeterministic collations for LIKE, ILIKE and pattern matching operators in
general. Is anyone interested in working on that?

As far as I know, you're the last person who seemed to be working on that topic
back in March :)

#18 Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#12)
Re: ICU for global collation

Hi,

I looked a bit more in this patch and I have some additional remarks.

On Thu, Dec 30, 2021 at 01:07:21PM +0100, Peter Eisentraut wrote:

So this is a different approach: If you choose ICU as the default locale for
a database, you still need to specify lc_ctype and lc_collate settings, as
before. Unlike in the previous patch, where the ICU collation name was
written in datcollate, there is now a third column (daticucoll), so we can
store all three values. This fixes the described problem. Other than that,
once you get all the initial settings right, it basically just works: The
places that have ICU support now will use a database-wide ICU collation if
appropriate, the places that don't have ICU support continue to use the
global libc locale settings.

So just to confirm: a database can now have two different *default* collations, a
libc-based one for everything that doesn't work with ICU and an ICU-based one (if
specified) for everything else. The ICU-based one is optional, so if it's not
provided everything works as before, using the libc-based default collation.

As I mentioned, I think this approach is sensible. However, should we document
which things are not ICU-aware?

I changed the datcollate, datctype, and the new daticucoll fields to type
text (from name). That way, the daticucoll field can be set to null if it's
not applicable. Also, the limit of 63 characters can actually be a problem
if you want to use some combination of the options that ICU locales offer.
And for less extreme uses, having variable-length fields will save some
storage, since typical locale names are much shorter.

I understand the need to have daticucoll as text; however, it's not clear to me
why this has to be changed for datcollate and datctype. IIUC those will only
ever contain libc-based collations and are still mandatory?

For the same reasons and to keep things consistent, I also changed the
analogous pg_collation fields like that.

The respective fields in pg_collation are now nullable, so the changes there
sound ok.

Digging a bit more in the patch here are some things that looks problematic.

- pg_upgrade

It checks (in check_locale_and_encoding()) the compatibility for each database,
and it looks like the daticucoll field should also be verified. Other than
that I don't think there is anything else needed for the pg_upgrade part as
everything else should be handled by pg_dump (I didn't look at the changes yet
given the below problems).

- CREATE DATABASE

There's a new COLLATION_PROVIDER option, but the overall behavior seems quite
unintuitive. As far as I can see the idea is to use LOCALE for the ICU default
collation, but as it's providing a default for the other values it's quite
annoying. For instance:

=# CREATE DATABASE db COLLATION_PROVIDER icu LOCALE 'fr-x-icu' LC_COLLATE 'en_GB.UTF-8';;
ERROR: 42809: invalid locale name: "fr-x-icu"
LOCATION: createdb, dbcommands.c:397

Looking at the code, it's actually complaining about LC_CTYPE. If you want a
database with an ICU default collation, the lc_collate and lc_ctype should
inherit what's in the template database and not what was provided in
LOCALE, I think. You could still probably override them in some scenarios, but
without a list of what isn't ICU-aware I can't really be sure of how often one
might have to do it.

Now, if I specify everything as needed it looks like it's missing some checks
on the ICU default collation when not using template0:

=# CREATE DATABASE db COLLATION_PROVIDER icu LOCALE 'en-x-icu' LC_COLLATE 'en_GB.UTF-8' LC_CTYPE 'en_GB.UTF-8';;
CREATE DATABASE

=# SELECT datname, datcollate, datctype, daticucoll FROM pg_database ;
  datname  | datcollate  |  datctype   | daticucoll
-----------+-------------+-------------+------------
 postgres  | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 db        | en_GB.UTF-8 | en_GB.UTF-8 | en-x-icu
 template1 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 template0 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
(4 rows)

Unless I'm missing something the same concerns about collation incompatibility
with objects in the source database should also apply for the ICU collation?

While at it, I'm not exactly sure what COLLATION_PROVIDER is supposed to
mean, as the same command but with a libc provider is accepted and has
exactly the same result:

=# CREATE DATABASE db2 COLLATION_PROVIDER libc LOCALE 'en-x-icu' LC_COLLATE 'en_GB.UTF-8' LC_CTYPE 'en_GB.UTF-8';;
CREATE DATABASE

=# SELECT datname, datcollate, datctype, daticucoll FROM pg_database ;
  datname  | datcollate  |  datctype   | daticucoll
-----------+-------------+-------------+------------
 postgres  | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 db        | en_GB.UTF-8 | en_GB.UTF-8 | en-x-icu
 template1 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 template0 | en_GB.UTF-8 | en_GB.UTF-8 | fr-x-icu
 db2       | en_GB.UTF-8 | en_GB.UTF-8 | en-x-icu
(5 rows)

Shouldn't db2 have a NULL daticucoll, and if so also complain about
incompatibility for it?

- initdb

I don't think that initdb --collation-provider icu should be allowed without
--icu-locale, same for --collation-provider libc *with* --icu-locale.

When trying that, I can also see that the NULL handling for daticucoll is
broken in the BKI:

$ initdb -k --collation-provider icu
[...]
Success. You can now start the database server using:

=# select datname, datcollate, datctype, daticucoll from pg_database ;
  datname  | datcollate  |  datctype   | daticucoll
-----------+-------------+-------------+------------
 postgres  | en_GB.UTF-8 | en_GB.UTF-8 | en_GB
 template1 | en_GB.UTF-8 | en_GB.UTF-8 | en_GB
 template0 | en_GB.UTF-8 | en_GB.UTF-8 | en_GB
(3 rows)

There's a fallback on my LANG/LC_* env settings, but I don't think it can ever
be correct given the different naming convention in ICU (at least the s/_/-/).

And

$ initdb -k --collation-provider libc --icu-locale test
[...]
Success. You can now start the database server using:

=# select datname, datcollate, datctype, daticucoll from pg_database ;
datname | datcollate | datctype | daticucoll
-----------+-------------+-------------+------------
postgres | en_GB.UTF-8 | en_GB.UTF-8 | _null_
template1 | en_GB.UTF-8 | en_GB.UTF-8 | _null_
template0 | en_GB.UTF-8 | en_GB.UTF-8 | _null_
(3 rows)

#19 Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#14)
Re: ICU for global collation

On 04.01.22 17:03, Peter Eisentraut wrote:

There are really a lot of places with this new code.  Maybe it could be
some new function/macro to wrap that for the normal case (e.g. not
formatting.c)?

Right, we could just put this into pg_newlocale_from_collation(), but
the comment there says

 * In fact, they shouldn't call this function at all when they are dealing
 * with the default locale.  That can save quite a bit in hotspots.

I don't know how to assess that.

I tested this a bit. I used the following setup:

create table t1 (a text);
insert into t1 select md5(generate_series(1, 10000000)::text);
select count(*) from t1 where a > '';

And then I changed in varstr_cmp():

if (collid != DEFAULT_COLLATION_OID)
mylocale = pg_newlocale_from_collation(collid);

to just

mylocale = pg_newlocale_from_collation(collid);

I find that the \timing results are indistinguishable. (I used locale
"en_US.UTF-8" and made sure that that code path is actually hit.)

Does anyone have other insights?

#20 Daniel Verite
daniel@manitou-mail.org
In reply to: Julien Rouhaud (#18)
Re: ICU for global collation

Julien Rouhaud wrote:

If you want a database with an ICU default collation the lc_collate
and lc_ctype should inherit what's in the template database and not
what was provided in the LOCALE I think. You could still probably
overload them in some scenario, but without a list of what isn't
ICU-aware I can't really be sure of how often one might have to do
it.

I guess we'd need that when creating a database with a different
encoding than the template databases, at least.

About what's not ICU-aware, I believe the most significant part in
core is the Full Text Search parser.
It doesn't care about sorting strings, but it relies on the functions
from POSIX <ctype.h>, which depend on LC_CTYPE
(it seems, however, that this could be improved by following
what has been done in backend/regex/regc_pg_locale.c, which has
comparable needs and calls ICU functions where applicable).

Also, any extension is potentially concerned. Surely many
extensions call functions from ctype.h assuming that
the current value of LC_CTYPE works with the data they handle.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
