move collation import to backend

Started by Peter Eisentraut about 9 years ago, 12 messages

peter.eisentraut@2ndquadrant.com

about 9 years ago

1 attachment(s)

Currently, initdb parses locale -a output to populate pg_collation. If
additional collations are installed in the operating system, it is not
possible to repeat this process except by doing each step manually. So I
propose to move this to a backend function that can be called
separately, and have initdb call that. Running this logic in the
backend instead of initdb also makes the code simpler. If we add other
collation providers such as ICU, initdb doesn't need to know about that
at all, because all the logic would be contained in the backend.

Here is an example:

select pg_import_system_collations(if_not_exists => false, schema =>
'test');

(Specifying the schema also allows testing this without overwriting
pg_catalog.)

I thought about making this a top-level command (IMPORT COLLATIONS ...
?) but decided against it for now, to keep it simple. Right now, this
is more of a refactoring. Documentation could be added if we decide so.
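
For readers following along, the per-line checks that the attached patch applies to "locale -a" output can be sketched outside the backend roughly as follows. This is only an illustration under stated assumptions, not code from the patch: clean_locale_line() is a hypothetical name, and the 0x80 test stands in for the backend's IS_HIGHBIT_SET() macro. The encoding lookup and catalog insertion that follow in the real function are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): tidy up one line as
 * returned by fgets() when reading "locale -a" output.
 *
 * Strips the trailing newline in place.  Returns false if the line has no
 * trailing newline (the name did not fit in the buffer) or if it contains
 * a byte with the high bit set (non-ASCII); returns true otherwise.
 */
bool
clean_locale_line(char *line)
{
	size_t		len;
	size_t		i;

	assert(line != NULL);

	len = strlen(line);

	/* fgets() keeps the newline; if it is missing, the name was truncated */
	if (len == 0 || line[len - 1] != '\n')
		return false;
	line[--len] = '\0';

	/* reject names that are not plain ASCII */
	for (i = 0; i < len; i++)
	{
		if ((unsigned char) line[i] & 0x80)
			return false;
	}

	return true;
}
```

Given a buffer holding "en_US.utf8\n", the helper leaves "en_US.utf8" in the buffer and accepts it; a line that arrives without a newline, or one containing a high-bit byte, is rejected, which corresponds to the two DEBUG1 "skipped" cases in the patch.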

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Add-function-to-import-operation-system-collations.patch (text/x-patch)

From 97fb7f992b95d2ca8725011dc141dad88051a8cd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Thu, 13 Oct 2016 12:00:00 -0400
Subject: [PATCH] Add function to import operating system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
---
 src/backend/catalog/pg_collation.c    |  18 +++-
 src/backend/commands/collationcmds.c  | 146 +++++++++++++++++++++++++++++-
 src/bin/initdb/initdb.c               | 164 +---------------------------------
 src/include/catalog/pg_collation_fn.h |   3 +-
 src/include/catalog/pg_proc.h         |   3 +
 5 files changed, 167 insertions(+), 167 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index f37cf37..cda64c4 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype)
+				const char *collcollate, const char *collctype,
+				bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 							  PointerGetDatum(collname),
 							  Int32GetDatum(collencoding),
 							  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+				 errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 						collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+							collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9bba748..062e3b6 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 							 GetUserId(),
 							 GetDatabaseEncoding(),
 							 collcollate,
-							 collctype);
+							 collctype,
+							 false);
+
+	if (!newoid)
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,143 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
 				 errmsg("collation \"%s\" already exists in schema \"%s\"",
 						collname, get_namespace_name(nspOid))));
 }
+
+
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+
+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid         nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a_handle == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not execute command \"%s\": %m",
+						"locale -a")));
+
+	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
+	{
+		int			i;
+		size_t		len;
+		int			enc;
+		bool		skip;
+		char		alias[NAMEDATALEN];
+
+		len = strlen(localebuf);
+
+		if (len == 0 || localebuf[len - 1] != '\n')
+		{
+			elog(DEBUG1, "locale name too long, skipped: \"%s\"", localebuf);
+			continue;
+		}
+		localebuf[len - 1] = '\0';
+
+		/*
+		 * Some systems have locale names that don't consist entirely of ASCII
+		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
+		 * pretty silly, since we need the locale itself to interpret the
+		 * non-ASCII characters. We can't do much with those, so we filter
+		 * them out.
+		 */
+		skip = false;
+		for (i = 0; i < len; i++)
+		{
+			if (IS_HIGHBIT_SET(localebuf[i]))
+			{
+				skip = true;
+				break;
+			}
+		}
+		if (skip)
+		{
+			elog(DEBUG1, "locale name has non-ASCII characters, skipped: \"%s\"", localebuf);
+			continue;
+		}
+
+		enc = pg_get_encoding_from_locale(localebuf, false);
+		if (enc < 0)
+		{
+			/* error message printed by pg_get_encoding_from_locale() */
+			continue;
+		}
+		if (!PG_VALID_BE_ENCODING(enc))
+			continue;			/* ignore locales for client-only encodings */
+		if (enc == PG_SQL_ASCII)
+			continue;			/* C/POSIX are already in the catalog */
+
+		count++;
+
+		CollationCreate(localebuf, nspid, GetUserId(), enc,
+						localebuf, localebuf, if_not_exists);
+
+		CommandCounterIncrement();
+
+		/*
+		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
+		 * ease of use.  Note that collation names are unique per encoding
+		 * only, so this doesn't clash with "en_US" for LATIN1, say.
+		 *
+		 * This always runs in "if not exists" mode, to skip aliases that
+		 * conflict with an existing locale name for the same encoding.  For
+		 * example, "br_FR.iso88591" is normalized to "br_FR", both for
+		 * encoding LATIN1.  But the unnormalized locale "br_FR" already
+		 * exists for LATIN1.
+		 */
+		if (normalize_locale_name(alias, localebuf))
+		{
+			CollationCreate(alias, nspid, GetUserId(), enc,
+							localebuf, localebuf, true);
+			CommandCounterIncrement();
+		}
+	}
+
+	ClosePipeStream(locale_a_handle);
+
+	if (count == 0)
+		ereport(ERROR,
+				(errmsg("no usable system locales were found")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index c8a8c52..498729c 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1587,42 +1587,6 @@ setup_description(FILE *cmdfd)
 	PG_CMD_PUTS("DROP TABLE tmp_pg_shdescription;\n\n");
 }
 
-#ifdef HAVE_LOCALE_T
-/*
- * "Normalize" a locale name, stripping off encoding tags such as
- * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
- * -> "br_FR@euro").  Return true if a new, different name was
- * generated.
- */
-static bool
-normalize_locale_name(char *new, const char *old)
-{
-	char	   *n = new;
-	const char *o = old;
-	bool		changed = false;
-
-	while (*o)
-	{
-		if (*o == '.')
-		{
-			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
-			o++;
-			while ((*o >= 'A' && *o <= 'Z')
-				   || (*o >= 'a' && *o <= 'z')
-				   || (*o >= '0' && *o <= '9')
-				   || (*o == '-'))
-				o++;
-			changed = true;
-		}
-		else
-			*n++ = *o++;
-	}
-	*n = '\0';
-
-	return changed;
-}
-#endif   /* HAVE_LOCALE_T */
-
 /*
  * populate pg_collation
  */
@@ -1630,134 +1594,10 @@ static void
 setup_collation(FILE *cmdfd)
 {
 #if defined(HAVE_LOCALE_T) && !defined(WIN32)
-	int			i;
-	FILE	   *locale_a_handle;
-	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
-	int			count = 0;
-
-	locale_a_handle = popen_check("locale -a", "r");
-	if (!locale_a_handle)
-		return;					/* complaint already printed */
-
-	PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
-				"	collname name, "
-				"	locale name, "
-				"	encoding int) WITHOUT OIDS;\n\n");
-
-	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
-	{
-		size_t		len;
-		int			enc;
-		bool		skip;
-		char	   *quoted_locale;
-		char		alias[NAMEDATALEN];
-
-		len = strlen(localebuf);
-
-		if (len == 0 || localebuf[len - 1] != '\n')
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name too long, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-		localebuf[len - 1] = '\0';
-
-		/*
-		 * Some systems have locale names that don't consist entirely of ASCII
-		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
-		 * pretty silly, since we need the locale itself to interpret the
-		 * non-ASCII characters. We can't do much with those, so we filter
-		 * them out.
-		 */
-		skip = false;
-		for (i = 0; i < len; i++)
-		{
-			if (IS_HIGHBIT_SET(localebuf[i]))
-			{
-				skip = true;
-				break;
-			}
-		}
-		if (skip)
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name has non-ASCII characters, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-
-		enc = pg_get_encoding_from_locale(localebuf, debug);
-		if (enc < 0)
-		{
-			/* error message printed by pg_get_encoding_from_locale() */
-			continue;
-		}
-		if (!PG_VALID_BE_ENCODING(enc))
-			continue;			/* ignore locales for client-only encodings */
-		if (enc == PG_SQL_ASCII)
-			continue;			/* C/POSIX are already in the catalog */
-
-		count++;
-
-		quoted_locale = escape_quotes(localebuf);
-
-		PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-					   quoted_locale, quoted_locale, enc);
-
-		/*
-		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
-		 * ease of use.  Note that collation names are unique per encoding
-		 * only, so this doesn't clash with "en_US" for LATIN1, say.
-		 */
-		if (normalize_locale_name(alias, localebuf))
-		{
-			char	   *quoted_alias = escape_quotes(alias);
-
-			PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-						   quoted_alias, quoted_locale, enc);
-			free(quoted_alias);
-		}
-		free(quoted_locale);
-	}
+	PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
 
 	/* Add an SQL-standard name */
-	PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
-
-	/*
-	 * When copying collations to the final location, eliminate aliases that
-	 * conflict with an existing locale name for the same encoding.  For
-	 * example, "br_FR.iso88591" is normalized to "br_FR", both for encoding
-	 * LATIN1.  But the unnormalized locale "br_FR" already exists for LATIN1.
-	 * Prefer the alias that matches the OS locale name, else the first locale
-	 * name by sort order (arbitrary choice to be deterministic).
-	 *
-	 * Also, eliminate any aliases that conflict with pg_collation's
-	 * hard-wired entries for "C" etc.
-	 */
-	PG_CMD_PUTS("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) "
-				" SELECT DISTINCT ON (collname, encoding)"
-				"   collname, "
-				"   (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog') AS collnamespace, "
-				"   (SELECT relowner FROM pg_class WHERE relname = 'pg_collation') AS collowner, "
-				"   encoding, locale, locale "
-				"  FROM tmp_pg_collation"
-				"  WHERE NOT EXISTS (SELECT 1 FROM pg_collation WHERE collname = tmp_pg_collation.collname)"
-	 "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n\n");
-
-	/*
-	 * Even though the table is temp, drop it explicitly so it doesn't get
-	 * copied into template0/postgres databases.
-	 */
-	PG_CMD_PUTS("DROP TABLE tmp_pg_collation;\n\n");
-
-	pclose(locale_a_handle);
-
-	if (count == 0 && !debug)
-	{
-		printf(_("No usable system locales were found.\n"));
-		printf(_("Use the option \"--debug\" to see details.\n"));
-	}
+	PG_CMD_PRINTF2("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, '%s'::regrole, %d, 'C', 'C');\n\n", escape_quotes(username), PG_UTF8);
 #endif   /* not HAVE_LOCALE_T  && not WIN32 */
 }
 
diff --git a/src/include/catalog/pg_collation_fn.h b/src/include/catalog/pg_collation_fn.h
index 574b288..ac1a81d 100644
--- a/src/include/catalog/pg_collation_fn.h
+++ b/src/include/catalog/pg_collation_fn.h
@@ -17,7 +17,8 @@
 extern Oid CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype);
+				const char *collcollate, const char *collctype,
+				bool if_not_exists);
 extern void RemoveCollationById(Oid collationOid);
 
 #endif   /* PG_COLLATION_FN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 17ec71d..0283310 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5341,6 +5341,9 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 4032 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089" _null_ _null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
+DESCR("import collations from operating system");
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
-- 
2.10.1
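
The alias rule in the patch above can be tried outside the backend. The following is a sketch that copies normalize_locale_name() from the patch into a standalone file; the only changes are dropping "static", adding an assert() on the arguments, and adding the standard headers. It assumes the caller supplies an output buffer at least as large as the input, as the patch does with its two NAMEDATALEN buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Standalone copy of normalize_locale_name() from the patch: copy "old"
 * into "new", dropping any ".encoding" tag.  "new" must have room for
 * strlen(old) + 1 bytes.  Returns true if a tag was stripped.
 */
bool
normalize_locale_name(char *new, const char *old)
{
	char	   *n = new;
	const char *o = old;
	bool		changed = false;

	assert(new != NULL && old != NULL);

	while (*o)
	{
		if (*o == '.')
		{
			/* skip the dot and the encoding tag, e.g. ".utf8" or ".UTF-8" */
			o++;
			while ((*o >= 'A' && *o <= 'Z')
				   || (*o >= 'a' && *o <= 'z')
				   || (*o >= '0' && *o <= '9')
				   || (*o == '-'))
				o++;
			changed = true;
		}
		else
			*n++ = *o++;		/* everything else is copied unchanged */
	}
	*n = '\0';

	return changed;
}
```

With this, "en_US.utf8" becomes "en_US" and "br_FR.iso885915@euro" becomes "br_FR@euro", both returning true, while a name without a dot such as "C" is copied unchanged and returns false, so no alias would be created for it.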

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Peter Eisentraut (#1)

Re: move collation import to backend

Hi,

On 2016-10-27 21:56:53 -0400, Peter Eisentraut wrote:

Currently, initdb parses locale -a output to populate pg_collation. If
additional collations are installed in the operating system, it is not
possible to repeat this process except by doing each step manually. So I
propose to move this to a backend function that can be called
separately, and have initdb call that. Running this logic in the
backend instead of initdb also makes the code simpler. If we add other
collation providers such as ICU, initdb doesn't need to know about that
at all, because all the logic would be contained in the backend.

That generally sounds like a good idea. There are some questions, imo:
E.g. what if previously present collations are now unavailable?

I thought about making this a top-level command (IMPORT COLLATIONS ...
?) but decided against it for now, to keep it simple.

Seems ok to me.

/*
* Also forbid matching an any-encoding entry. This test of course is not
* backed up by the unique index, but it's not a problem since we don't
* support adding any-encoding entries after initdb.
*/

Note that this isn't true anymore...

+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{

Uh?

+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid         nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a_handle == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not execute command \"%s\": %m",
+						"locale -a")));

This function needs to have !superuser permissions revoked, which it
afaics currently hasn't.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

about 9 years ago

In reply to: Andres Freund (#2)

1 attachment(s)

Re: move collation import to backend

On 11/12/16 10:38 AM, Andres Freund wrote:

E.g. what if previously present collations are now unavailable?

You get an error message when you try to use the collation. I think
that is a different class of problems.

/*
* Also forbid matching an any-encoding entry. This test of course is not
* backed up by the unique index, but it's not a problem since we don't
* support adding any-encoding entries after initdb.
*/

Note that this isn't true anymore...

I think this is still correct, because the collation import does not
produce any any-encoding entries (collencoding = -1).
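
To make the two checks under discussion concrete, here is a toy model of the duplicate handling as the posted hunk shows it. Everything in it is an assumption made for illustration: the Toy* types, toy_collation_create(), and the fixed-size array replace the real catalog and syscache lookups, the schema part of the key is omitted, and the encoding numbers in any caller are arbitrary. Only the first check honors if_not_exists; the any-encoding check (collencoding = -1) is modeled as always failing, since the patch leaves that part of CollationCreate() unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ANY_ENCODING	(-1)	/* mirrors collencoding = -1 */
#define TOY_MAX_COLLATIONS	16

typedef struct ToyCollation
{
	const char *name;
	int			encoding;
} ToyCollation;

typedef struct ToyCatalog
{
	ToyCollation entries[TOY_MAX_COLLATIONS];
	size_t		count;
} ToyCatalog;

typedef enum ToyResult
{
	TOY_CREATED,				/* new entry added */
	TOY_SKIPPED,				/* exact duplicate, if_not_exists was true */
	TOY_ERROR					/* duplicate, any-encoding clash, or full */
} ToyResult;

/* Is there an entry with exactly this name and encoding? */
bool
toy_exists(const ToyCatalog *cat, const char *name, int encoding)
{
	size_t		i;

	for (i = 0; i < cat->count; i++)
	{
		if (cat->entries[i].encoding == encoding &&
			strcmp(cat->entries[i].name, name) == 0)
			return true;
	}
	return false;
}

ToyResult
toy_collation_create(ToyCatalog *cat, const char *name, int encoding,
					 bool if_not_exists)
{
	assert(cat != NULL && name != NULL);

	/* same name and encoding: NOTICE-and-skip or ERROR in the patch */
	if (toy_exists(cat, name, encoding))
		return if_not_exists ? TOY_SKIPPED : TOY_ERROR;

	/* same name as an any-encoding entry: always refused in this model */
	if (toy_exists(cat, name, TOY_ANY_ENCODING))
		return TOY_ERROR;

	if (cat->count >= TOY_MAX_COLLATIONS)
		return TOY_ERROR;

	cat->entries[cat->count].name = name;
	cat->entries[cat->count].encoding = encoding;
	cat->count++;
	return TOY_CREATED;
}
```

In this model, importing "br_FR" twice for the same encoding is skipped or refused depending on if_not_exists, importing it for a second encoding succeeds, and importing a name that already exists as an any-encoding entry is refused. Because the import only ever passes a concrete encoding, it never adds an entry with encoding -1.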

+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{

Uh?

Required to avoid compiler warning about missing prototype.

This function needs to have !superuser permissions revoked, which it
afaics currently hasn't.

Done.

New patch attached (includes OID change because of conflict).

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Add-function-to-import-operation-system-collation.patch (text/x-patch)

From bb6710c55df3a5f7023ddcda749e05e05e49bc59 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 29 Nov 2016 12:00:00 -0500
Subject: [PATCH v2] Add function to import operating system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
---
 src/backend/catalog/pg_collation.c    |  18 +++-
 src/backend/commands/collationcmds.c  | 151 ++++++++++++++++++++++++++++++-
 src/bin/initdb/initdb.c               | 164 +---------------------------------
 src/include/catalog/pg_collation_fn.h |   3 +-
 src/include/catalog/pg_proc.h         |   3 +
 5 files changed, 172 insertions(+), 167 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index f37cf37..cda64c4 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype)
+				const char *collcollate, const char *collctype,
+				bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 							  PointerGetDatum(collname),
 							  Int32GetDatum(collencoding),
 							  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+				 errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 						collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+							collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9bba748..f4b7b65 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 							 GetUserId(),
 							 GetDatabaseEncoding(),
 							 collcollate,
-							 collctype);
+							 collctype,
+							 false);
+
+	if (!newoid)
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,148 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
 				 errmsg("collation \"%s\" already exists in schema \"%s\"",
 						collname, get_namespace_name(nspOid))));
 }
+
+
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+
+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid         nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to import system collations"))));
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a_handle == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not execute command \"%s\": %m",
+						"locale -a")));
+
+	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
+	{
+		int			i;
+		size_t		len;
+		int			enc;
+		bool		skip;
+		char		alias[NAMEDATALEN];
+
+		len = strlen(localebuf);
+
+		if (len == 0 || localebuf[len - 1] != '\n')
+		{
+			elog(DEBUG1, "locale name too long, skipped: \"%s\"", localebuf);
+			continue;
+		}
+		localebuf[len - 1] = '\0';
+
+		/*
+		 * Some systems have locale names that don't consist entirely of ASCII
+		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
+		 * pretty silly, since we need the locale itself to interpret the
+		 * non-ASCII characters. We can't do much with those, so we filter
+		 * them out.
+		 */
+		skip = false;
+		for (i = 0; i < len; i++)
+		{
+			if (IS_HIGHBIT_SET(localebuf[i]))
+			{
+				skip = true;
+				break;
+			}
+		}
+		if (skip)
+		{
+			elog(DEBUG1, "locale name has non-ASCII characters, skipped: \"%s\"", localebuf);
+			continue;
+		}
+
+		enc = pg_get_encoding_from_locale(localebuf, false);
+		if (enc < 0)
+		{
+			/* error message printed by pg_get_encoding_from_locale() */
+			continue;
+		}
+		if (!PG_VALID_BE_ENCODING(enc))
+			continue;			/* ignore locales for client-only encodings */
+		if (enc == PG_SQL_ASCII)
+			continue;			/* C/POSIX are already in the catalog */
+
+		count++;
+
+		CollationCreate(localebuf, nspid, GetUserId(), enc,
+						localebuf, localebuf, if_not_exists);
+
+		CommandCounterIncrement();
+
+		/*
+		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
+		 * ease of use.  Note that collation names are unique per encoding
+		 * only, so this doesn't clash with "en_US" for LATIN1, say.
+		 *
+		 * This always runs in "if not exists" mode, to skip aliases that
+		 * conflict with an existing locale name for the same encoding.  For
+		 * example, "br_FR.iso88591" is normalized to "br_FR", both for
+		 * encoding LATIN1.  But the unnormalized locale "br_FR" already
+		 * exists for LATIN1.
+		 */
+		if (normalize_locale_name(alias, localebuf))
+		{
+			CollationCreate(alias, nspid, GetUserId(), enc,
+							localebuf, localebuf, true);
+			CommandCounterIncrement();
+		}
+	}
+
+	ClosePipeStream(locale_a_handle);
+
+	if (count == 0)
+		ereport(ERROR,
+				(errmsg("no usable system locales were found")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 24f9cc8..9c1a1fb 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1608,42 +1608,6 @@ setup_description(FILE *cmdfd)
 	PG_CMD_PUTS("DROP TABLE tmp_pg_shdescription;\n\n");
 }
 
-#ifdef HAVE_LOCALE_T
-/*
- * "Normalize" a locale name, stripping off encoding tags such as
- * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
- * -> "br_FR@euro").  Return true if a new, different name was
- * generated.
- */
-static bool
-normalize_locale_name(char *new, const char *old)
-{
-	char	   *n = new;
-	const char *o = old;
-	bool		changed = false;
-
-	while (*o)
-	{
-		if (*o == '.')
-		{
-			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
-			o++;
-			while ((*o >= 'A' && *o <= 'Z')
-				   || (*o >= 'a' && *o <= 'z')
-				   || (*o >= '0' && *o <= '9')
-				   || (*o == '-'))
-				o++;
-			changed = true;
-		}
-		else
-			*n++ = *o++;
-	}
-	*n = '\0';
-
-	return changed;
-}
-#endif   /* HAVE_LOCALE_T */
-
 /*
  * populate pg_collation
  */
@@ -1651,134 +1615,10 @@ static void
 setup_collation(FILE *cmdfd)
 {
 #if defined(HAVE_LOCALE_T) && !defined(WIN32)
-	int			i;
-	FILE	   *locale_a_handle;
-	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
-	int			count = 0;
-
-	locale_a_handle = popen_check("locale -a", "r");
-	if (!locale_a_handle)
-		return;					/* complaint already printed */
-
-	PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
-				"	collname name, "
-				"	locale name, "
-				"	encoding int) WITHOUT OIDS;\n\n");
-
-	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
-	{
-		size_t		len;
-		int			enc;
-		bool		skip;
-		char	   *quoted_locale;
-		char		alias[NAMEDATALEN];
-
-		len = strlen(localebuf);
-
-		if (len == 0 || localebuf[len - 1] != '\n')
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name too long, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-		localebuf[len - 1] = '\0';
-
-		/*
-		 * Some systems have locale names that don't consist entirely of ASCII
-		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
-		 * pretty silly, since we need the locale itself to interpret the
-		 * non-ASCII characters. We can't do much with those, so we filter
-		 * them out.
-		 */
-		skip = false;
-		for (i = 0; i < len; i++)
-		{
-			if (IS_HIGHBIT_SET(localebuf[i]))
-			{
-				skip = true;
-				break;
-			}
-		}
-		if (skip)
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name has non-ASCII characters, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-
-		enc = pg_get_encoding_from_locale(localebuf, debug);
-		if (enc < 0)
-		{
-			/* error message printed by pg_get_encoding_from_locale() */
-			continue;
-		}
-		if (!PG_VALID_BE_ENCODING(enc))
-			continue;			/* ignore locales for client-only encodings */
-		if (enc == PG_SQL_ASCII)
-			continue;			/* C/POSIX are already in the catalog */
-
-		count++;
-
-		quoted_locale = escape_quotes(localebuf);
-
-		PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-					   quoted_locale, quoted_locale, enc);
-
-		/*
-		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
-		 * ease of use.  Note that collation names are unique per encoding
-		 * only, so this doesn't clash with "en_US" for LATIN1, say.
-		 */
-		if (normalize_locale_name(alias, localebuf))
-		{
-			char	   *quoted_alias = escape_quotes(alias);
-
-			PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-						   quoted_alias, quoted_locale, enc);
-			free(quoted_alias);
-		}
-		free(quoted_locale);
-	}
+	PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
 
 	/* Add an SQL-standard name */
-	PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
-
-	/*
-	 * When copying collations to the final location, eliminate aliases that
-	 * conflict with an existing locale name for the same encoding.  For
-	 * example, "br_FR.iso88591" is normalized to "br_FR", both for encoding
-	 * LATIN1.  But the unnormalized locale "br_FR" already exists for LATIN1.
-	 * Prefer the alias that matches the OS locale name, else the first locale
-	 * name by sort order (arbitrary choice to be deterministic).
-	 *
-	 * Also, eliminate any aliases that conflict with pg_collation's
-	 * hard-wired entries for "C" etc.
-	 */
-	PG_CMD_PUTS("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) "
-				" SELECT DISTINCT ON (collname, encoding)"
-				"   collname, "
-				"   (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog') AS collnamespace, "
-				"   (SELECT relowner FROM pg_class WHERE relname = 'pg_collation') AS collowner, "
-				"   encoding, locale, locale "
-				"  FROM tmp_pg_collation"
-				"  WHERE NOT EXISTS (SELECT 1 FROM pg_collation WHERE collname = tmp_pg_collation.collname)"
-	 "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n\n");
-
-	/*
-	 * Even though the table is temp, drop it explicitly so it doesn't get
-	 * copied into template0/postgres databases.
-	 */
-	PG_CMD_PUTS("DROP TABLE tmp_pg_collation;\n\n");
-
-	pclose(locale_a_handle);
-
-	if (count == 0 && !debug)
-	{
-		printf(_("No usable system locales were found.\n"));
-		printf(_("Use the option \"--debug\" to see details.\n"));
-	}
+	PG_CMD_PRINTF2("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, '%s'::regrole, %d, 'C', 'C');\n\n", escape_quotes(username), PG_UTF8);
 #endif   /* not HAVE_LOCALE_T  && not WIN32 */
 }
 
diff --git a/src/include/catalog/pg_collation_fn.h b/src/include/catalog/pg_collation_fn.h
index 574b288..ac1a81d 100644
--- a/src/include/catalog/pg_collation_fn.h
+++ b/src/include/catalog/pg_collation_fn.h
@@ -17,7 +17,8 @@
 extern Oid CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype);
+				const char *collcollate, const char *collctype,
+				bool if_not_exists);
 extern void RemoveCollationById(Oid collationOid);
 
 #endif   /* PG_COLLATION_FN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 047a1ce..6d98188 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5343,6 +5343,9 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089" _null_ _null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
+DESCR("import collations from operating system");
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
-- 
2.10.2

Andres Freund

andres@anarazel.de

about 9 years ago

In reply to: Peter Eisentraut (#3)

Re: move collation import to backend

On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:

On 11/12/16 10:38 AM, Andres Freund wrote:

/*
* Also forbid matching an any-encoding entry. This test of course is not
* backed up by the unique index, but it's not a problem since we don't
* support adding any-encoding entries after initdb.
*/

Note that this isn't true anymore...

I think this is still correct, because the collation import does not
produce any any-encoding entries (collencoding = -1).

Well, the comment "don't support adding any-encoding entries after
initdb." is now wrong.

+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
Uh?
Required to avoid compiler warning about missing prototype.

It seems not to be project style to have prototypes in the middle of the
file...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Andres Freund (#4)

Re: move collation import to backend

Andres Freund <andres@anarazel.de> writes:

On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:

Required to avoid compiler warning about missing prototype.

It seems not to be project style to have prototypes in the middle of the
file...

I agree. Please put that in builtins.h, if you can't find any better
header for it.

regards, tom lane


Peter Eisentraut

peter.eisentraut@2ndquadrant.com

about 9 years ago

In reply to: Andres Freund (#4)

Re: move collation import to backend

On 11/29/16 2:53 PM, Andres Freund wrote:

On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:

On 11/12/16 10:38 AM, Andres Freund wrote:

/*
* Also forbid matching an any-encoding entry. This test of course is not
* backed up by the unique index, but it's not a problem since we don't
* support adding any-encoding entries after initdb.
*/

Note that this isn't true anymore...

I think this is still correct, because the collation import does not
produce any any-encoding entries (collencoding = -1).

Well, the comment "don't support adding any-encoding entries after
initdb." is now wrong.

I think there is a misunderstanding. The comment says that we don't
support adding encodings that have collencoding = -1 after initdb. That
is still true. Note that the original comment has two "any"'s. With
this patch, we would now support adding collations with collencoding <>
-1 after initdb.
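
To make the distinction concrete, here is a minimal sketch of the two duplicate checks that the comment sits between. It is a toy model, not the catalog code: the `ToyCollation` struct, `toy_name_conflicts`, and the `TOY_*` encoding numbers are made up for illustration, and the namespace part of the key is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ANY_ENCODING (-1)   /* collencoding value meaning "any encoding" */
#define TOY_UTF8     6      /* illustrative encoding IDs, assumed for the sketch */
#define TOY_LATIN1   8

/* Hypothetical stand-in for a pg_collation row (namespace omitted). */
typedef struct
{
    const char *collname;
    int         collencoding;
} ToyCollation;

/* Is there an entry with exactly this name and encoding? */
static bool
toy_exists(const ToyCollation *cat, size_t n, const char *name, int enc)
{
    for (size_t i = 0; i < n; i++)
    {
        if (cat[i].collencoding == enc && strcmp(cat[i].collname, name) == 0)
            return true;
    }
    return false;
}

/*
 * A new entry for a specific encoding conflicts if the same name exists
 * for that encoding, or if the name exists as an any-encoding entry.
 * Only the first case corresponds to a unique key; the second is a
 * lookup done by hand, which is what the disputed comment is about.
 */
static bool
toy_name_conflicts(const ToyCollation *cat, size_t n,
                   const char *name, int enc)
{
    assert(cat != NULL || n == 0);

    return toy_exists(cat, n, name, enc) ||
           toy_exists(cat, n, name, ANY_ENCODING);
}
```

In this model, importing "en_US" for LATIN1 next to an existing "en_US" for UTF8 is fine, while any attempt to add "C" for a specific encoding collides with the any-encoding "C" entry.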

+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
Uh?
Required to avoid compiler warning about missing prototype.
It seems not to be project style to have prototypes in the middle of the
file...

OK, will fix.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Haribabu Kommi

kommi.haribabu@gmail.com

about 9 years ago

In reply to: Peter Eisentraut (#6)

Re: move collation import to backend

On Thu, Dec 1, 2016 at 12:18 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

+
+Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
Uh?
Required to avoid compiler warning about missing prototype.
It seems not to be project style to have prototypes in the middle of the
file...
OK, will fix.

Moved to next CF with "waiting on author" status.

Regards,
Hari Babu
Fujitsu Australia

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

about 9 years ago

In reply to: Peter Eisentraut (#6)

1 attachment(s)

Re: move collation import to backend

On 11/30/16 8:18 AM, Peter Eisentraut wrote:

It seems not to be project style to have prototypes in the middle of the
file...

OK, will fix.

Updated patch with that fix.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-0001-Add-function-to-import-operation-system-collation.patch (text/x-patch)

From 0c17610b698cc335bc0aed1a66d151e96f618537 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Wed, 30 Nov 2016 12:00:00 -0500
Subject: [PATCH v3] Add function to import operation system collations

Move this logic out of initdb into a user-callable function.  This
simplifies the code and makes it possible to update the standard
collations later on if additional operating system collations appear.
---
 src/backend/catalog/pg_collation.c    |  18 +++-
 src/backend/commands/collationcmds.c  | 149 +++++++++++++++++++++++++++++-
 src/bin/initdb/initdb.c               | 164 +---------------------------------
 src/include/catalog/pg_collation_fn.h |   3 +-
 src/include/catalog/pg_proc.h         |   3 +
 src/include/commands/collationcmds.h  |   2 +
 6 files changed, 172 insertions(+), 167 deletions(-)

diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index f37cf37c4a..cda64c44a1 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype)
+				const char *collcollate, const char *collctype,
+				bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 							  PointerGetDatum(collname),
 							  Int32GetDatum(collencoding),
 							  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+				 errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 						collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+							collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9bba748708..eafc0a99fa 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 							 GetUserId(),
 							 GetDatabaseEncoding(),
 							 collcollate,
-							 collctype);
+							 collctype,
+							 false);
+
+	if (!newoid)
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,146 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
 				 errmsg("collation \"%s\" already exists in schema \"%s\"",
 						collname, get_namespace_name(nspOid))));
 }
+
+
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+	bool		if_not_exists = PG_GETARG_BOOL(0);
+	Oid         nspid = PG_GETARG_OID(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to import system collations"))));
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a_handle == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not execute command \"%s\": %m",
+						"locale -a")));
+
+	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
+	{
+		int			i;
+		size_t		len;
+		int			enc;
+		bool		skip;
+		char		alias[NAMEDATALEN];
+
+		len = strlen(localebuf);
+
+		if (len == 0 || localebuf[len - 1] != '\n')
+		{
+			elog(DEBUG1, "locale name too long, skipped: \"%s\"", localebuf);
+			continue;
+		}
+		localebuf[len - 1] = '\0';
+
+		/*
+		 * Some systems have locale names that don't consist entirely of ASCII
+		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
+		 * pretty silly, since we need the locale itself to interpret the
+		 * non-ASCII characters. We can't do much with those, so we filter
+		 * them out.
+		 */
+		skip = false;
+		for (i = 0; i < len; i++)
+		{
+			if (IS_HIGHBIT_SET(localebuf[i]))
+			{
+				skip = true;
+				break;
+			}
+		}
+		if (skip)
+		{
+			elog(DEBUG1, "locale name has non-ASCII characters, skipped: \"%s\"", localebuf);
+			continue;
+		}
+
+		enc = pg_get_encoding_from_locale(localebuf, false);
+		if (enc < 0)
+		{
+			/* error message printed by pg_get_encoding_from_locale() */
+			continue;
+		}
+		if (!PG_VALID_BE_ENCODING(enc))
+			continue;			/* ignore locales for client-only encodings */
+		if (enc == PG_SQL_ASCII)
+			continue;			/* C/POSIX are already in the catalog */
+
+		count++;
+
+		CollationCreate(localebuf, nspid, GetUserId(), enc,
+						localebuf, localebuf, if_not_exists);
+
+		CommandCounterIncrement();
+
+		/*
+		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
+		 * ease of use.  Note that collation names are unique per encoding
+		 * only, so this doesn't clash with "en_US" for LATIN1, say.
+		 *
+		 * This always runs in "if not exists" mode, to skip aliases that
+		 * conflict with an existing locale name for the same encoding.  For
+		 * example, "br_FR.iso88591" is normalized to "br_FR", both for
+		 * encoding LATIN1.  But the unnormalized locale "br_FR" already
+		 * exists for LATIN1.
+		 */
+		if (normalize_locale_name(alias, localebuf))
+		{
+			CollationCreate(alias, nspid, GetUserId(), enc,
+							localebuf, localebuf, true);
+			CommandCounterIncrement();
+		}
+	}
+
+	ClosePipeStream(locale_a_handle);
+
+	if (count == 0)
+		ereport(ERROR,
+				(errmsg("no usable system locales were found")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 24f9cc8eae..9c1a1fb938 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1608,42 +1608,6 @@ setup_description(FILE *cmdfd)
 	PG_CMD_PUTS("DROP TABLE tmp_pg_shdescription;\n\n");
 }
 
-#ifdef HAVE_LOCALE_T
-/*
- * "Normalize" a locale name, stripping off encoding tags such as
- * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
- * -> "br_FR@euro").  Return true if a new, different name was
- * generated.
- */
-static bool
-normalize_locale_name(char *new, const char *old)
-{
-	char	   *n = new;
-	const char *o = old;
-	bool		changed = false;
-
-	while (*o)
-	{
-		if (*o == '.')
-		{
-			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
-			o++;
-			while ((*o >= 'A' && *o <= 'Z')
-				   || (*o >= 'a' && *o <= 'z')
-				   || (*o >= '0' && *o <= '9')
-				   || (*o == '-'))
-				o++;
-			changed = true;
-		}
-		else
-			*n++ = *o++;
-	}
-	*n = '\0';
-
-	return changed;
-}
-#endif   /* HAVE_LOCALE_T */
-
 /*
  * populate pg_collation
  */
@@ -1651,134 +1615,10 @@ static void
 setup_collation(FILE *cmdfd)
 {
 #if defined(HAVE_LOCALE_T) && !defined(WIN32)
-	int			i;
-	FILE	   *locale_a_handle;
-	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
-	int			count = 0;
-
-	locale_a_handle = popen_check("locale -a", "r");
-	if (!locale_a_handle)
-		return;					/* complaint already printed */
-
-	PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
-				"	collname name, "
-				"	locale name, "
-				"	encoding int) WITHOUT OIDS;\n\n");
-
-	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
-	{
-		size_t		len;
-		int			enc;
-		bool		skip;
-		char	   *quoted_locale;
-		char		alias[NAMEDATALEN];
-
-		len = strlen(localebuf);
-
-		if (len == 0 || localebuf[len - 1] != '\n')
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name too long, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-		localebuf[len - 1] = '\0';
-
-		/*
-		 * Some systems have locale names that don't consist entirely of ASCII
-		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
-		 * pretty silly, since we need the locale itself to interpret the
-		 * non-ASCII characters. We can't do much with those, so we filter
-		 * them out.
-		 */
-		skip = false;
-		for (i = 0; i < len; i++)
-		{
-			if (IS_HIGHBIT_SET(localebuf[i]))
-			{
-				skip = true;
-				break;
-			}
-		}
-		if (skip)
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name has non-ASCII characters, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-
-		enc = pg_get_encoding_from_locale(localebuf, debug);
-		if (enc < 0)
-		{
-			/* error message printed by pg_get_encoding_from_locale() */
-			continue;
-		}
-		if (!PG_VALID_BE_ENCODING(enc))
-			continue;			/* ignore locales for client-only encodings */
-		if (enc == PG_SQL_ASCII)
-			continue;			/* C/POSIX are already in the catalog */
-
-		count++;
-
-		quoted_locale = escape_quotes(localebuf);
-
-		PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-					   quoted_locale, quoted_locale, enc);
-
-		/*
-		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
-		 * ease of use.  Note that collation names are unique per encoding
-		 * only, so this doesn't clash with "en_US" for LATIN1, say.
-		 */
-		if (normalize_locale_name(alias, localebuf))
-		{
-			char	   *quoted_alias = escape_quotes(alias);
-
-			PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-						   quoted_alias, quoted_locale, enc);
-			free(quoted_alias);
-		}
-		free(quoted_locale);
-	}
+	PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
 
 	/* Add an SQL-standard name */
-	PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
-
-	/*
-	 * When copying collations to the final location, eliminate aliases that
-	 * conflict with an existing locale name for the same encoding.  For
-	 * example, "br_FR.iso88591" is normalized to "br_FR", both for encoding
-	 * LATIN1.  But the unnormalized locale "br_FR" already exists for LATIN1.
-	 * Prefer the alias that matches the OS locale name, else the first locale
-	 * name by sort order (arbitrary choice to be deterministic).
-	 *
-	 * Also, eliminate any aliases that conflict with pg_collation's
-	 * hard-wired entries for "C" etc.
-	 */
-	PG_CMD_PUTS("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) "
-				" SELECT DISTINCT ON (collname, encoding)"
-				"   collname, "
-				"   (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog') AS collnamespace, "
-				"   (SELECT relowner FROM pg_class WHERE relname = 'pg_collation') AS collowner, "
-				"   encoding, locale, locale "
-				"  FROM tmp_pg_collation"
-				"  WHERE NOT EXISTS (SELECT 1 FROM pg_collation WHERE collname = tmp_pg_collation.collname)"
-	 "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n\n");
-
-	/*
-	 * Even though the table is temp, drop it explicitly so it doesn't get
-	 * copied into template0/postgres databases.
-	 */
-	PG_CMD_PUTS("DROP TABLE tmp_pg_collation;\n\n");
-
-	pclose(locale_a_handle);
-
-	if (count == 0 && !debug)
-	{
-		printf(_("No usable system locales were found.\n"));
-		printf(_("Use the option \"--debug\" to see details.\n"));
-	}
+	PG_CMD_PRINTF2("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, '%s'::regrole, %d, 'C', 'C');\n\n", escape_quotes(username), PG_UTF8);
 #endif   /* not HAVE_LOCALE_T  && not WIN32 */
 }
 
diff --git a/src/include/catalog/pg_collation_fn.h b/src/include/catalog/pg_collation_fn.h
index 574b288acc..ac1a81d620 100644
--- a/src/include/catalog/pg_collation_fn.h
+++ b/src/include/catalog/pg_collation_fn.h
@@ -17,7 +17,8 @@
 extern Oid CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype);
+				const char *collcollate, const char *collctype,
+				bool if_not_exists);
 extern void RemoveCollationById(Oid collationOid);
 
 #endif   /* PG_COLLATION_FN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index cd7b909812..b81ceca774 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5345,6 +5345,9 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089" _null_ _null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
+DESCR("import collations from operating system");
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
diff --git a/src/include/commands/collationcmds.h b/src/include/commands/collationcmds.h
index 073314e76d..f7c746bc17 100644
--- a/src/include/commands/collationcmds.h
+++ b/src/include/commands/collationcmds.h
@@ -21,4 +21,6 @@
 extern ObjectAddress DefineCollation(ParseState *pstate, List *names, List *parameters);
 extern void IsThereCollationInNamespace(const char *collname, Oid nspOid);
 
+extern Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
 #endif   /* COLLATIONCMDS_H */
-- 
2.11.0

Euler Taveira

euler@timbira.com.br

about 9 years ago

In reply to: Peter Eisentraut (#8)

2 attachment(s)

Re: move collation import to backend

On 18-12-2016 18:30, Peter Eisentraut wrote:

Updated patch with that fix.

Peter, I reviewed and improved your patch.

* I documented the new function. Since a collation is a database object, I
chose the "Database Object Management Functions" section.
* I've added if_not_exists handling to the any-encoding check because I got
'FATAL: collation "C" already exists' on Debian 8, although I didn't get it
on CentOS 7. The problem is that Debian has two locales for C (C and
C.UTF-8) and CentOS has just one (C).
* I've added OidIsValid to test the OID of the new collation row.
* I've changed the parameter order. Schema seems more important than
if_not_exists. Also, we generally leave boolean parameters like this for the
end of the list. I didn't make if_not_exists optional, but IMO it would be a
good idea (default = true).
* You removed some #if and #ifdef directives while moving things around. I
put them back.
* You didn't pgindent some lines of code, but I assume that was to keep the
patch footprint small.
* I didn't test on Windows.
* As a last comment, you set cost = 100, which seems reasonable because
it takes 411 ms to scan and load 923 collations in my slow VM. However,
successive executions take ~1200 ms.
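
For the Debian case above, the alias step is presumably what produces the clash: stripping the encoding tag from "C.UTF-8" leaves "C". The sketch below is a standalone adaptation of the patch's normalize_locale_name() so this can be checked outside the backend; the parameter names `dst` and `src` are mine, and the caller is assumed to pass an output buffer at least as large as the input.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Drop an encoding tag such as ".utf8" or ".UTF-8" from a locale name and
 * keep everything else, including modifiers like "@euro".  Returns true
 * if a tag was removed, i.e. if dst differs from src.
 */
static bool
normalize_locale_name(char *dst, const char *src)
{
    char       *n = dst;
    const char *o = src;
    bool        changed = false;

    assert(dst != NULL && src != NULL);

    while (*o)
    {
        if (*o == '.')
        {
            /* skip the dot and the tag characters that follow it */
            o++;
            while ((*o >= 'A' && *o <= 'Z')
                   || (*o >= 'a' && *o <= 'z')
                   || (*o >= '0' && *o <= '9')
                   || (*o == '-'))
                o++;
            changed = true;
        }
        else
            *n++ = *o++;
    }
    *n = '\0';

    return changed;
}
```

With this, "en_US.utf8" becomes "en_US" and "br_FR.iso885915@euro" becomes "br_FR@euro", as the function's comment says, and "C.UTF-8" becomes "C", which is the name of the hard-wired any-encoding entry.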

I'm attaching the complete patch and also a patch on top of your last patch.

--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Attachments:

collation-import-fixes.diff (text/x-patch)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 10e3186..1e52a48 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19190,6 +19190,38 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
     in the database's default tablespace, the tablespace can be specified as 0.
    </para>
 
+   <para>
+   Operating system collations are loaded with the
+   <function>pg_import_system_collations</> function, shown in <xref
+   linkend="functions-import-collation">.
+   </para>
+
+   <table id="functions-import-collation">
+    <title>Collation Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <indexterm><primary>pg_import_system_collations</primary></indexterm>
+        <literal><function>pg_import_system_collations(<parameter>schema</> <type>regnamespace</>, <parameter>if_not_exists</> <type>boolean</>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>Import operating system collations</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   <function>pg_import_system_collations</> loads collations that it finds on
+   the operating system into system catalog <literal>pg_collation</literal>,
+   skipping those that are already present.
+   </para>
+
   </sect2>
 
   <sect2 id="functions-admin-index">
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index 63c2eb9..694c0f6 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -98,10 +98,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 							  PointerGetDatum(collname),
 							  Int32GetDatum(-1),
 							  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("collation \"%s\" already exists, skipping",
+						collname)));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
 				 errmsg("collation \"%s\" already exists",
 						collname)));
+	}
 
 	/* open pg_collation */
 	rel = heap_open(CollationRelationId, RowExclusiveLock);
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index e108b50..cf3acea 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -139,7 +139,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 							 collctype,
 							 false);
 
-	if (!newoid)
+	if (!OidIsValid(newoid))
 		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
@@ -183,6 +183,7 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
 }
 
 
+#ifdef HAVE_LOCALE_T
 /*
  * "Normalize" a locale name, stripping off encoding tags such as
  * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
@@ -216,13 +217,15 @@ normalize_locale_name(char *new, const char *old)
 
 	return changed;
 }
+#endif	/* HAVE_LOCALE_T */
 
 
 Datum
 pg_import_system_collations(PG_FUNCTION_ARGS)
 {
-	bool		if_not_exists = PG_GETARG_BOOL(0);
-	Oid         nspid = PG_GETARG_OID(1);
+#if defined(HAVE_LOCALE_T) && !defined(WIN32)
+	Oid         nspid = PG_GETARG_OID(0);
+	bool		if_not_exists = PG_GETARG_BOOL(1);
 
 	FILE	   *locale_a_handle;
 	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
@@ -321,6 +324,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 	if (count == 0)
 		ereport(ERROR,
 				(errmsg("no usable system locales were found")));
+#endif	/* not HAVE_LOCALE_T && not WIN32 */
 
 	PG_RETURN_VOID();
 }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b0126a9..bb8637e 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5345,7 +5345,7 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
-DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089" _null_ _null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
+DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "4089 16" _null_ _null_ "{schema,if_not_exists}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
 
 /*

collation-import-complete.diff (text/x-patch)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 10e3186..1e52a48 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19190,6 +19190,38 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
     in the database's default tablespace, the tablespace can be specified as 0.
    </para>
 
+   <para>
+   Operating system collations are loaded with the
+   <function>pg_import_system_collations</> function, shown in <xref
+   linkend="functions-import-collation">.
+   </para>
+
+   <table id="functions-import-collation">
+    <title>Collation Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <indexterm><primary>pg_import_system_collations</primary></indexterm>
+        <literal><function>pg_import_system_collations(<parameter>schema</> <type>regnamespace</>, <parameter>if_not_exists</> <type>boolean</>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>Import operating system collations</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+   <function>pg_import_system_collations</> loads collations that it finds on
+   the operating system into system catalog <literal>pg_collation</literal>,
+   skipping those that are already present.
+   </para>
+
   </sect2>
 
   <sect2 id="functions-admin-index">
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index fa42ad5..694c0f6 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -41,7 +41,8 @@ Oid
 CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype)
+				const char *collcollate, const char *collctype,
+				bool if_not_exists)
 {
 	Relation	rel;
 	TupleDesc	tupDesc;
@@ -72,10 +73,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 							  PointerGetDatum(collname),
 							  Int32GetDatum(collencoding),
 							  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+				 errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
 						collname, pg_encoding_to_char(collencoding))));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("collation \"%s\" for encoding \"%s\" already exists",
+							collname, pg_encoding_to_char(collencoding))));
+	}
 
 	/*
 	 * Also forbid matching an any-encoding entry.  This test of course is not
@@ -86,10 +98,21 @@ CollationCreate(const char *collname, Oid collnamespace,
 							  PointerGetDatum(collname),
 							  Int32GetDatum(-1),
 							  ObjectIdGetDatum(collnamespace)))
-		ereport(ERROR,
+	{
+		if (if_not_exists)
+		{
+			ereport(NOTICE,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("collation \"%s\" already exists, skipping",
+						collname)));
+			return InvalidOid;
+		}
+		else
+			ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
 				 errmsg("collation \"%s\" already exists",
 						collname)));
+	}
 
 	/* open pg_collation */
 	rel = heap_open(CollationRelationId, RowExclusiveLock);
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index ccadfc2..cf3acea 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -136,7 +136,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters)
 							 GetUserId(),
 							 GetDatabaseEncoding(),
 							 collcollate,
-							 collctype);
+							 collctype,
+							 false);
+
+	if (!OidIsValid(newoid))
+		return InvalidObjectAddress;
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
@@ -177,3 +181,150 @@ IsThereCollationInNamespace(const char *collname, Oid nspOid)
 				 errmsg("collation \"%s\" already exists in schema \"%s\"",
 						collname, get_namespace_name(nspOid))));
 }
+
+
+#ifdef HAVE_LOCALE_T
+/*
+ * "Normalize" a locale name, stripping off encoding tags such as
+ * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
+ * -> "br_FR@euro").  Return true if a new, different name was
+ * generated.
+ */
+static bool
+normalize_locale_name(char *new, const char *old)
+{
+	char	   *n = new;
+	const char *o = old;
+	bool		changed = false;
+
+	while (*o)
+	{
+		if (*o == '.')
+		{
+			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
+			o++;
+			while ((*o >= 'A' && *o <= 'Z')
+				   || (*o >= 'a' && *o <= 'z')
+				   || (*o >= '0' && *o <= '9')
+				   || (*o == '-'))
+				o++;
+			changed = true;
+		}
+		else
+			*n++ = *o++;
+	}
+	*n = '\0';
+
+	return changed;
+}
+#endif	/* HAVE_LOCALE_T */
+
+
+Datum
+pg_import_system_collations(PG_FUNCTION_ARGS)
+{
+#if defined(HAVE_LOCALE_T) && !defined(WIN32)
+	Oid         nspid = PG_GETARG_OID(0);
+	bool		if_not_exists = PG_GETARG_BOOL(1);
+
+	FILE	   *locale_a_handle;
+	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
+	int			count = 0;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to import system collations"))));
+
+	locale_a_handle = OpenPipeStream("locale -a", "r");
+	if (locale_a_handle == NULL)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not execute command \"%s\": %m",
+						"locale -a")));
+
+	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
+	{
+		int			i;
+		size_t		len;
+		int			enc;
+		bool		skip;
+		char		alias[NAMEDATALEN];
+
+		len = strlen(localebuf);
+
+		if (len == 0 || localebuf[len - 1] != '\n')
+		{
+			elog(DEBUG1, "locale name too long, skipped: \"%s\"", localebuf);
+			continue;
+		}
+		localebuf[len - 1] = '\0';
+
+		/*
+		 * Some systems have locale names that don't consist entirely of ASCII
+		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
+		 * pretty silly, since we need the locale itself to interpret the
+		 * non-ASCII characters. We can't do much with those, so we filter
+		 * them out.
+		 */
+		skip = false;
+		for (i = 0; i < len; i++)
+		{
+			if (IS_HIGHBIT_SET(localebuf[i]))
+			{
+				skip = true;
+				break;
+			}
+		}
+		if (skip)
+		{
+			elog(DEBUG1, "locale name has non-ASCII characters, skipped: \"%s\"", localebuf);
+			continue;
+		}
+
+		enc = pg_get_encoding_from_locale(localebuf, false);
+		if (enc < 0)
+		{
+			/* error message printed by pg_get_encoding_from_locale() */
+			continue;
+		}
+		if (!PG_VALID_BE_ENCODING(enc))
+			continue;			/* ignore locales for client-only encodings */
+		if (enc == PG_SQL_ASCII)
+			continue;			/* C/POSIX are already in the catalog */
+
+		count++;
+
+		CollationCreate(localebuf, nspid, GetUserId(), enc,
+						localebuf, localebuf, if_not_exists);
+
+		CommandCounterIncrement();
+
+		/*
+		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
+		 * ease of use.  Note that collation names are unique per encoding
+		 * only, so this doesn't clash with "en_US" for LATIN1, say.
+		 *
+		 * This always runs in "if not exists" mode, to skip aliases that
+		 * conflict with an existing locale name for the same encoding.  For
+		 * example, "br_FR.iso88591" is normalized to "br_FR", both for
+		 * encoding LATIN1.  But the unnormalized locale "br_FR" already
+		 * exists for LATIN1.
+		 */
+		if (normalize_locale_name(alias, localebuf))
+		{
+			CollationCreate(alias, nspid, GetUserId(), enc,
+							localebuf, localebuf, true);
+			CommandCounterIncrement();
+		}
+	}
+
+	ClosePipeStream(locale_a_handle);
+
+	if (count == 0)
+		ereport(ERROR,
+				(errmsg("no usable system locales were found")));
+#endif	/* not HAVE_LOCALE_T && not WIN32 */
+
+	PG_RETURN_VOID();
+}
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1e7d677..c378595 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1608,42 +1608,6 @@ setup_description(FILE *cmdfd)
 	PG_CMD_PUTS("DROP TABLE tmp_pg_shdescription;\n\n");
 }
 
-#ifdef HAVE_LOCALE_T
-/*
- * "Normalize" a locale name, stripping off encoding tags such as
- * ".utf8" (e.g., "en_US.utf8" -> "en_US", but "br_FR.iso885915@euro"
- * -> "br_FR@euro").  Return true if a new, different name was
- * generated.
- */
-static bool
-normalize_locale_name(char *new, const char *old)
-{
-	char	   *n = new;
-	const char *o = old;
-	bool		changed = false;
-
-	while (*o)
-	{
-		if (*o == '.')
-		{
-			/* skip over encoding tag such as ".utf8" or ".UTF-8" */
-			o++;
-			while ((*o >= 'A' && *o <= 'Z')
-				   || (*o >= 'a' && *o <= 'z')
-				   || (*o >= '0' && *o <= '9')
-				   || (*o == '-'))
-				o++;
-			changed = true;
-		}
-		else
-			*n++ = *o++;
-	}
-	*n = '\0';
-
-	return changed;
-}
-#endif   /* HAVE_LOCALE_T */
-
 /*
  * populate pg_collation
  */
@@ -1651,134 +1615,10 @@ static void
 setup_collation(FILE *cmdfd)
 {
 #if defined(HAVE_LOCALE_T) && !defined(WIN32)
-	int			i;
-	FILE	   *locale_a_handle;
-	char		localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
-	int			count = 0;
-
-	locale_a_handle = popen_check("locale -a", "r");
-	if (!locale_a_handle)
-		return;					/* complaint already printed */
-
-	PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_collation ( "
-				"	collname name, "
-				"	locale name, "
-				"	encoding int) WITHOUT OIDS;\n\n");
-
-	while (fgets(localebuf, sizeof(localebuf), locale_a_handle))
-	{
-		size_t		len;
-		int			enc;
-		bool		skip;
-		char	   *quoted_locale;
-		char		alias[NAMEDATALEN];
-
-		len = strlen(localebuf);
-
-		if (len == 0 || localebuf[len - 1] != '\n')
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name too long, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-		localebuf[len - 1] = '\0';
-
-		/*
-		 * Some systems have locale names that don't consist entirely of ASCII
-		 * letters (such as "bokm&aring;l" or "fran&ccedil;ais").  This is
-		 * pretty silly, since we need the locale itself to interpret the
-		 * non-ASCII characters. We can't do much with those, so we filter
-		 * them out.
-		 */
-		skip = false;
-		for (i = 0; i < len; i++)
-		{
-			if (IS_HIGHBIT_SET(localebuf[i]))
-			{
-				skip = true;
-				break;
-			}
-		}
-		if (skip)
-		{
-			if (debug)
-				fprintf(stderr, _("%s: locale name has non-ASCII characters, skipped: \"%s\"\n"),
-						progname, localebuf);
-			continue;
-		}
-
-		enc = pg_get_encoding_from_locale(localebuf, debug);
-		if (enc < 0)
-		{
-			/* error message printed by pg_get_encoding_from_locale() */
-			continue;
-		}
-		if (!PG_VALID_BE_ENCODING(enc))
-			continue;			/* ignore locales for client-only encodings */
-		if (enc == PG_SQL_ASCII)
-			continue;			/* C/POSIX are already in the catalog */
-
-		count++;
-
-		quoted_locale = escape_quotes(localebuf);
-
-		PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-					   quoted_locale, quoted_locale, enc);
-
-		/*
-		 * Generate aliases such as "en_US" in addition to "en_US.utf8" for
-		 * ease of use.  Note that collation names are unique per encoding
-		 * only, so this doesn't clash with "en_US" for LATIN1, say.
-		 */
-		if (normalize_locale_name(alias, localebuf))
-		{
-			char	   *quoted_alias = escape_quotes(alias);
-
-			PG_CMD_PRINTF3("INSERT INTO tmp_pg_collation VALUES (E'%s', E'%s', %d);\n\n",
-						   quoted_alias, quoted_locale, enc);
-			free(quoted_alias);
-		}
-		free(quoted_locale);
-	}
+	PG_CMD_PUTS("SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');\n\n");
 
 	/* Add an SQL-standard name */
-	PG_CMD_PRINTF1("INSERT INTO tmp_pg_collation VALUES ('ucs_basic', 'C', %d);\n\n", PG_UTF8);
-
-	/*
-	 * When copying collations to the final location, eliminate aliases that
-	 * conflict with an existing locale name for the same encoding.  For
-	 * example, "br_FR.iso88591" is normalized to "br_FR", both for encoding
-	 * LATIN1.  But the unnormalized locale "br_FR" already exists for LATIN1.
-	 * Prefer the alias that matches the OS locale name, else the first locale
-	 * name by sort order (arbitrary choice to be deterministic).
-	 *
-	 * Also, eliminate any aliases that conflict with pg_collation's
-	 * hard-wired entries for "C" etc.
-	 */
-	PG_CMD_PUTS("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) "
-				" SELECT DISTINCT ON (collname, encoding)"
-				"   collname, "
-				"   (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog') AS collnamespace, "
-				"   (SELECT relowner FROM pg_class WHERE relname = 'pg_collation') AS collowner, "
-				"   encoding, locale, locale "
-				"  FROM tmp_pg_collation"
-				"  WHERE NOT EXISTS (SELECT 1 FROM pg_collation WHERE collname = tmp_pg_collation.collname)"
-	 "  ORDER BY collname, encoding, (collname = locale) DESC, locale;\n\n");
-
-	/*
-	 * Even though the table is temp, drop it explicitly so it doesn't get
-	 * copied into template0/postgres databases.
-	 */
-	PG_CMD_PUTS("DROP TABLE tmp_pg_collation;\n\n");
-
-	pclose(locale_a_handle);
-
-	if (count == 0 && !debug)
-	{
-		printf(_("No usable system locales were found.\n"));
-		printf(_("Use the option \"--debug\" to see details.\n"));
-	}
+	PG_CMD_PRINTF2("INSERT INTO pg_collation (collname, collnamespace, collowner, collencoding, collcollate, collctype) VALUES ('ucs_basic', 'pg_catalog'::regnamespace, '%s'::regrole, %d, 'C', 'C');\n\n", escape_quotes(username), PG_UTF8);
 #endif   /* not HAVE_LOCALE_T  && not WIN32 */
 }
 
diff --git a/src/include/catalog/pg_collation_fn.h b/src/include/catalog/pg_collation_fn.h
index 1ea757f..482ba79 100644
--- a/src/include/catalog/pg_collation_fn.h
+++ b/src/include/catalog/pg_collation_fn.h
@@ -17,7 +17,8 @@
 extern Oid CollationCreate(const char *collname, Oid collnamespace,
 				Oid collowner,
 				int32 collencoding,
-				const char *collcollate, const char *collctype);
+				const char *collcollate, const char *collctype,
+				bool if_not_exists);
 extern void RemoveCollationById(Oid collationOid);
 
 #endif   /* PG_COLLATION_FN_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 37e022d..bb8637e 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5345,6 +5345,9 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "4089 16" _null_ _null_ "{schema,if_not_exists}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
+DESCR("import collations from operating system");
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
diff --git a/src/include/commands/collationcmds.h b/src/include/commands/collationcmds.h
index 699ce2f..28ab07f 100644
--- a/src/include/commands/collationcmds.h
+++ b/src/include/commands/collationcmds.h
@@ -21,4 +21,6 @@
 extern ObjectAddress DefineCollation(ParseState *pstate, List *names, List *parameters);
 extern void IsThereCollationInNamespace(const char *collname, Oid nspOid);
 
+extern Datum pg_import_system_collations(PG_FUNCTION_ARGS);
+
 #endif   /* COLLATIONCMDS_H */

#10

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Euler Taveira (#9)

Re: move collation import to backend

On 1/9/17 10:04 PM, Euler Taveira wrote:

On 18-12-2016 18:30, Peter Eisentraut wrote:

Updated patch with that fix.

Peter, I reviewed and improved your patch.

* I document the new function. Since collation is a database object, I
chose "Database Object Management Functions" section.

* I've added a check to any-encoding database because I got 'FATAL:
collation "C" already exists' on Debian 8, although I didn't get it on
CentOS 7. The problem is that Debian has two locales for C (C and
C.UTF-8) and CentOS has just one (C).

* I've added OidIsValid to test the new collation row.

* I've changed the parameter order. Schema seems more important than
if_not_exists. Also, we generally leave those boolean parameters for the
end of the list. I didn't make if_not_exists optional, but IMO it would be
a good idea (default = true).

I put them that way because in an SQL command the "IF NOT EXISTS" comes
before the schema, but I can see how it is weird that way in a function.

* You removed some #if and #ifdef while moving things around. I put them back.
* You didn't pgindent some lines of code, but I'm sure that was to keep the
patch footprint small.

I had left the #if in initdb, but I think your changes are better.

I'm attaching the complete patch and also a patch on top of your last patch.

Thanks. If there are no more comments, I will proceed with that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Jeff Janes

jeff.janes@gmail.com

almost 9 years ago

In reply to: Peter Eisentraut (#10)

Re: move collation import to backend

On Tue, Jan 17, 2017 at 9:05 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 1/9/17 10:04 PM, Euler Taveira wrote:

On 18-12-2016 18:30, Peter Eisentraut wrote:

Updated patch with that fix.

Peter, I reviewed and improved your patch.

* I document the new function. Since collation is a database object, I
chose "Database Object Management Functions" section.

OK

* I've added a check to any-encoding database because I got 'FATAL:
collation "C" already exists' on Debian 8, although I didn't get it on
CentOS 7. The problem is that Debian has two locales for C (C and
C.UTF-8) and CentOS has just one (C).

OK

* I've added OidIsValid to test the new collation row.

OK

* I've changed the parameter order. Schema seems more important than
if_not_exists. Also, we generally leave those boolean parameters for the
end of the list. I didn't make if_not_exists optional, but IMO it would be
a good idea (default = true).

I put them that way because in an SQL command the "IF NOT EXISTS" comes
before the schema, but I can see how it is weird that way in a function.

* You removed some #if and #ifdef while moving things around. I put them back.
* You didn't pgindent some lines of code, but I'm sure that was to keep the
patch footprint small.

I had left the #if in initdb, but I think your changes are better.

I'm attaching the complete patch and also a patch on top of your last patch.

Thanks. If there are no more comments, I will proceed with that.

With this commit, I'm getting 'make check' fail at initdb with the error:

2017-01-18 07:47:50.565 PST [43691] FATAL: collation "aa_ER@saaho" for
encoding "UTF8" already exists
2017-01-18 07:47:50.565 PST [43691] STATEMENT: SELECT
pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');

My system:

CentOS release 6.8 (Final)
gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)

./configure > /dev/null # no options

$ locale -a|fgrep aa_ER
aa_ER
aa_ER.utf8
aa_ER.utf8@saaho
aa_ER@saaho

I have no idea what @ means in a locale, I'm just relaying the information.

Cheers,

Jeff
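
One plausible reading of the failure, given the `locale -a` listing in the message above: normalize_locale_name() in the patch turns "aa_ER.utf8@saaho" into the alias "aa_ER@saaho", which gets inserted first; when the real locale "aa_ER@saaho" arrives on the next line, it is created with if_not_exists => false and the insertion fails. The standalone C sketch below reproduces only the name handling. normalize_name() copies the skipping rule from the patch; first_alias_collision() is a hypothetical helper, and the sketch assumes all listed locales map to the same encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Strip ".encoding" tags, same rule as normalize_locale_name() in the patch. */
static bool
normalize_name(char *dst, const char *src)
{
    bool changed = false;

    while (*src)
    {
        if (*src == '.')
        {
            /* skip over an encoding tag such as ".utf8" or ".UTF-8" */
            src++;
            while ((*src >= 'A' && *src <= 'Z') ||
                   (*src >= 'a' && *src <= 'z') ||
                   (*src >= '0' && *src <= '9') ||
                   *src == '-')
                src++;
            changed = true;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return changed;
}

/*
 * Hypothetical helper (not part of the patch): return the index of the
 * first real locale name that equals an alias generated from an EARLIER
 * line of the listing, or -1 if there is none.  Such a name would hit
 * "already exists" when inserted without "if not exists".
 */
static int
first_alias_collision(const char *const *names, size_t n)
{
    char alias[64];

    for (size_t i = 0; i < n; i++)
    {
        assert(strlen(names[i]) < sizeof(alias));
        if (!normalize_name(alias, names[i]))
            continue;           /* no encoding tag, so no alias */
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(alias, names[j]) == 0)
                return (int) j;
    }
    return -1;
}
```

Called on the four names in the order Jeff's system prints them, first_alias_collision() returns 3, the index of "aa_ER@saaho", which matches the collation named in the FATAL message.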

#12

Tom Lane

tgl@sss.pgh.pa.us

almost 9 years ago

In reply to: Jeff Janes (#11)

Re: move collation import to backend

Jeff Janes <jeff.janes@gmail.com> writes:

With this commit, I'm getting 'make check' fail at initdb with the error:

2017-01-18 07:47:50.565 PST [43691] FATAL: collation "aa_ER@saaho" for
encoding "UTF8" already exists

Yeah, so are large chunks of the buildfarm. Having now read the patch,
I see that the problem is that it simply ignored the de-duplication
logic that existed in initdb's implementation. That was put there
on the basis of bitter experience, as I recall.

The new code seems to think it's sufficient to do an "if not exists"
insertion when generating abbreviated names, but that's wrong, and
even if it avoided outright failures, it would be nondeterministic
(I doubt "locale -a" is guaranteed to emit locale names in any
particular order).

I think this needs to be reverted pending redesign of the de-duplication
coding.

regards, tom lane
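
To make the determinism concern concrete, here is a minimal sketch of an order-independent de-duplication, modeled loosely on the rule in the removed initdb comment (prefer the name that matches the OS locale, else the first locale by sort order). It is not the fix that was later committed. Catalog, Entry, catalog_add() and import_locales() are hypothetical names, the in-memory array stands in for pg_collation, and a single encoding is assumed throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 64
#define NAME_LEN 64

typedef struct { char name[NAME_LEN]; char locale[NAME_LEN]; } Entry;
typedef struct { Entry items[MAX_NAMES]; size_t count; } Catalog;

/* Strip ".encoding" tags, same rule as normalize_locale_name() in the patch. */
static bool
normalize_name(char *dst, const char *src)
{
    bool changed = false;

    while (*src)
    {
        if (*src == '.')
        {
            src++;
            while ((*src >= 'A' && *src <= 'Z') || (*src >= 'a' && *src <= 'z') ||
                   (*src >= '0' && *src <= '9') || *src == '-')
                src++;
            changed = true;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return changed;
}

/* Return the index of name in the catalog, or -1 if absent. */
static int
catalog_find(const Catalog *cat, const char *name)
{
    for (size_t i = 0; i < cat->count; i++)
        if (strcmp(cat->items[i].name, name) == 0)
            return (int) i;
    return -1;
}

/* "If not exists" insertion: returns false when nothing was added. */
static bool
catalog_add(Catalog *cat, const char *name, const char *locale)
{
    if (strlen(name) >= NAME_LEN || strlen(locale) >= NAME_LEN ||
        cat->count >= MAX_NAMES || catalog_find(cat, name) >= 0)
        return false;
    snprintf(cat->items[cat->count].name, NAME_LEN, "%s", name);
    snprintf(cat->items[cat->count].locale, NAME_LEN, "%s", locale);
    cat->count++;
    return true;
}

/* Order aliases by name, then by source locale, so ties break the same way. */
static int
cmp_entry(const void *a, const void *b)
{
    const Entry *x = a, *y = b;
    int c = strcmp(x->name, y->name);

    return c != 0 ? c : strcmp(x->locale, y->locale);
}

static void
import_locales(Catalog *cat, const char *const *names, size_t n)
{
    Entry  aliases[MAX_NAMES];
    size_t nalias = 0;

    assert(n <= MAX_NAMES);

    /* Pass 1: real OS locale names always take precedence. */
    for (size_t i = 0; i < n; i++)
        catalog_add(cat, names[i], names[i]);

    /* Pass 2: collect aliases, sort them, then add those still unused. */
    for (size_t i = 0; i < n; i++)
    {
        if (strlen(names[i]) >= NAME_LEN)
            continue;
        if (normalize_name(aliases[nalias].name, names[i]))
        {
            snprintf(aliases[nalias].locale, NAME_LEN, "%s", names[i]);
            nalias++;
        }
    }
    qsort(aliases, nalias, sizeof(Entry), cmp_entry);
    for (size_t i = 0; i < nalias; i++)
        catalog_add(cat, aliases[i].name, aliases[i].locale);
}
```

Because the real names are inserted before any alias and the aliases are sorted before insertion, the resulting mapping does not depend on the order in which "locale -a" emits its lines.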
