Move definition of standard collations from initdb to pg_collation.dat
While working on [0], I was wondering why the collations ucs_basic and
unicode are not in pg_collation.dat. I traced this back through
history, and I think this was just lost in a game of telephone.
The initial commit for pg_collation.h (414c5a2ea6) has only the default
collation in pg_collation.h (pre .dat), with initdb handling everything
else. Over time, additional collations "C" and "POSIX" were moved to
pg_collation.h, and other logic was moved from initdb to
pg_import_system_collations(). But ucs_basic was untouched. Commit
0b13b2a771 rearranged the relative order of operations in initdb and
added the current comment "We don't want to pin these", but looking at
the email[1], I think this was more a guess about the previous intent.
I suggest we fix this now; see attached patch.
[0]: /messages/by-id/1293e382-2093-a2bf-a397-c04e8f83d3c2@enterprisedb.com
[1]: /messages/by-id/28195.1498172402@sss.pgh.pa.us
Attachments:
0001-Move-definition-of-standard-collations-from-initdb-t.patch (text/plain; charset=UTF-8)
From 0d2c6b92a3340833f13bab395e0556ce1f045226 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 28 Mar 2023 12:04:34 +0200
Subject: [PATCH] Move definition of standard collations from initdb to
pg_collation.dat
---
src/bin/initdb/initdb.c | 15 +--------------
src/include/catalog/pg_collation.dat | 7 +++++++
2 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index bae97539fc..9ccbf998ec 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1695,20 +1695,7 @@ setup_description(FILE *cmdfd)
static void
setup_collation(FILE *cmdfd)
{
- /*
- * Add SQL-standard names. We don't want to pin these, so they don't go
- * in pg_collation.dat. But add them before reading system collations, so
- * that they win if libc defines a locale with the same name.
- */
- PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)"
- "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n",
- BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU);
-
- PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
- "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
- BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);
-
- /* Now import all collations we can find in the operating system */
+ /* Import all collations we can find in the operating system */
PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n");
}
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index f4bda1c769..14df398ad2 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -23,5 +23,12 @@
descr => 'standard POSIX collation',
collname => 'POSIX', collprovider => 'c', collencoding => '-1',
collcollate => 'POSIX', collctype => 'POSIX' },
+{ oid => '962',
+ descr => 'sorts using the Unicode Collation Algorithm with default settings',
+ collname => 'unicode', collprovider => 'i', collencoding => '-1',
+ colliculocale => 'und' },
+{ oid => '963', descr => 'sorts by Unicode code point',
+ collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
+ collcollate => 'C', collctype => 'C' },
]
--
2.40.0
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> While working on [0], I was wondering why the collations ucs_basic and
> unicode are not in pg_collation.dat. I traced this back through
> history, and I think this was just lost in a game of telephone.
> The initial commit for pg_collation.h (414c5a2ea6) has only the default
> collation in pg_collation.h (pre .dat), with initdb handling everything
> else. Over time, additional collations "C" and "POSIX" were moved to
> pg_collation.h, and other logic was moved from initdb to
> pg_import_system_collations(). But ucs_basic was untouched. Commit
> 0b13b2a771 rearranged the relative order of operations in initdb and
> added the current comment "We don't want to pin these", but looking at
> the email[1], I think this was more a guess about the previous intent.
Yeah, I was just loath to change the previous behavior in that
patch. I can't see any strong reason not to pin these entries.
> I suggest we fix this now; see attached patch.
While we're here, do we want to adopt some other spelling of "the
root locale" than "und", in view of recent discoveries about the
instability of that on old ICU versions?
regards, tom lane
On 28.03.23 13:33, Tom Lane wrote:
> While we're here, do we want to adopt some other spelling of "the
> root locale" than "und", in view of recent discoveries about the
> instability of that on old ICU versions?
That issue was fixed by 3b50275b12, so we can keep using the "und" spelling.
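
For readers following the "pin" discussion in this thread, here is a rough sketch of why moving the two entries into pg_collation.dat with fixed OIDs (962 and 963 in the patch) would make them pinned. It assumes that pinning is decided by OID range, and that the boundary values match those defined in src/include/access/transam.h in recent PostgreSQL releases. The constants are copied here as assumptions, and the helper names are hypothetical. The real server-side check has additional special cases that this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/*
 * Assumed OID range boundaries, modeled on src/include/access/transam.h.
 * They are restated here for illustration only; check the header of the
 * release you are working with.
 */
#define FIRST_GENBKI_OBJECT_ID   10000	/* below this: OIDs assigned by hand in .dat files */
#define FIRST_UNPINNED_OBJECT_ID 12000	/* from here on: created by initdb after bootstrap */
#define FIRST_NORMAL_OBJECT_ID   16384	/* from here on: ordinary user-created objects */

/*
 * Hypothetical helper: true if the OID falls in the range that is treated
 * as pinned. This is a simplification that looks only at the OID range.
 */
bool
oid_in_pinned_range(Oid oid)
{
	return oid != 0 && oid < FIRST_UNPINNED_OBJECT_ID;
}

/*
 * Hypothetical helper: true if the OID is in the range reserved for
 * manually assigned OIDs, such as the ones the patch adds to
 * pg_collation.dat.
 */
bool
oid_is_hand_assigned(Oid oid)
{
	return oid != 0 && oid < FIRST_GENBKI_OBJECT_ID;
}
```

Under these assumptions, oid_in_pinned_range(962) and oid_in_pinned_range(963) are both true, whereas an OID handed out by pg_nextoid() during initdb's post-bootstrap phase would start at the unpinned boundary and so would not be pinned. That matches the behavior change discussed above: the old INSERT statements produced unpinned rows, while the .dat entries produce pinned ones.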