pgsql: Introduce "builtin" collation provider.

Started by Jeff Davis almost 2 years ago, 4 messages
#1Jeff Davis
jdavis@postgresql.org

Introduce "builtin" collation provider.

New provider for collations, like "libc" or "icu", but without any
external dependency.

Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.
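For readers unfamiliar with what "an internal implementation" means for
the "C" locale: ordering is by raw byte value, with no locale tables
involved. A minimal sketch of that idea follows. The helper name
c_locale_compare is hypothetical and is not a function in the
PostgreSQL source tree; it only illustrates byte-wise comparison with
memcmp().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): compare two byte
 * strings by raw byte value, as a "C" collation orders them.
 * Returns a negative value, zero, or a positive value.
 */
static int
c_locale_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = (alen < blen) ? alen : blen;
    int         result = memcmp(a, b, minlen);

    /* If one string is a prefix of the other, the shorter sorts first. */
    if (result == 0 && alen != blen)
        result = (alen < blen) ? -1 : 1;

    return result;
}
```

Under this ordering an uppercase ASCII letter such as "B" sorts before a
lowercase "a", because only the byte values are compared.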

The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.

Offering a new builtin provider clarifies that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.

Discussion: /messages/by-id/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: /messages/by-id/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: /messages/by-id/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/2d819a08a1cbc11364e36f816b02e33e8dcc030b

Modified Files
--------------
doc/src/sgml/charset.sgml | 90 ++++++++++++++++++-----
doc/src/sgml/ref/create_collation.sgml | 11 ++-
doc/src/sgml/ref/create_database.sgml | 7 +-
doc/src/sgml/ref/createdb.sgml | 2 +-
doc/src/sgml/ref/initdb.sgml | 17 ++++-
src/backend/catalog/pg_collation.c | 5 +-
src/backend/commands/collationcmds.c | 74 +++++++++++++++----
src/backend/commands/dbcommands.c | 129 +++++++++++++++++++++++++--------
src/backend/utils/adt/formatting.c | 6 ++
src/backend/utils/adt/pg_locale.c | 123 ++++++++++++++++++++++++++-----
src/backend/utils/init/postinit.c | 20 ++++-
src/bin/initdb/initdb.c | 53 ++++++++++----
src/bin/initdb/t/001_initdb.pl | 40 +++++++++-
src/bin/pg_dump/pg_dump.c | 23 +++++-
src/bin/pg_upgrade/t/002_pg_upgrade.pl | 81 ++++++++++++++++-----
src/bin/psql/describe.c | 4 +-
src/bin/scripts/createdb.c | 19 ++++-
src/bin/scripts/t/020_createdb.pl | 60 +++++++++++++++
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_collation.dat | 6 +-
src/include/catalog/pg_collation.h | 3 +
src/include/utils/pg_locale.h | 5 ++
src/test/icu/t/010_database.pl | 22 +++---
src/test/regress/expected/collate.out | 19 ++++-
src/test/regress/sql/collate.sql | 8 ++
25 files changed, 671 insertions(+), 158 deletions(-)

#2Peter Eisentraut
peter@eisentraut.org
In reply to: Jeff Davis (#1)
Re: pgsql: Introduce "builtin" collation provider.

On 14.03.24 07:39, Jeff Davis wrote:

Introduce "builtin" collation provider.

Jeff,

I think I found a small bug in this commit.

The new code in dbcommands.c createdb() reads like this:

+   /* validate provider-specific parameters */
+   if (dblocprovider != COLLPROVIDER_BUILTIN)
+   {
+       if (dbuiltinlocale)
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                    errmsg("BUILTIN_LOCALE cannot be specified unless locale provider is builtin")));
+   }
+   else if (dblocprovider != COLLPROVIDER_ICU)
+   {
+       if (diculocale)
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                    errmsg("ICU locale cannot be specified unless locale provider is ICU")));
+
+       if (dbicurules)
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                    errmsg("ICU rules cannot be specified unless locale provider is ICU")));
+   }

But if dblocprovider is COLLPROVIDER_LIBC, then the first "if" is true
and the second one won't be checked. I think the correct code structure
would be to make both of these checks separate if statements.
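To illustrate the structure I have in mind, here is a standalone sketch
with two independent "if" statements. It is not the actual patch:
validate_provider_options is a hypothetical stand-in that returns the
message instead of calling ereport(), and the provider codes are defined
locally so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Provider codes, defined locally for this sketch. */
#define COLLPROVIDER_BUILTIN 'b'
#define COLLPROVIDER_ICU     'i'
#define COLLPROVIDER_LIBC    'c'

/*
 * Hypothetical stand-in for the ereport(ERROR, ...) calls in createdb():
 * returns the message that would be raised, or NULL if the combination
 * of options is acceptable.
 */
static const char *
validate_provider_options(char dblocprovider,
                          const char *dbuiltinlocale,
                          const char *diculocale,
                          const char *dbicurules)
{
    /* Check 1: builtin-only options. */
    if (dblocprovider != COLLPROVIDER_BUILTIN)
    {
        if (dbuiltinlocale)
            return "BUILTIN_LOCALE cannot be specified unless locale provider is builtin";
    }

    /* Check 2: ICU-only options, tested independently of check 1. */
    if (dblocprovider != COLLPROVIDER_ICU)
    {
        if (diculocale)
            return "ICU locale cannot be specified unless locale provider is ICU";

        if (dbicurules)
            return "ICU rules cannot be specified unless locale provider is ICU";
    }

    return NULL;
}
```

With this shape, validate_provider_options(COLLPROVIDER_LIBC, NULL,
"en-US", NULL) reports the ICU locale error, whereas with the "else if"
form the libc case never reaches the ICU checks.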

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Jeff Davis (#1)
Re: pgsql: Introduce "builtin" collation provider.

On 2024-03-14 Th 02:39, Jeff Davis wrote:

Introduce "builtin" collation provider.

The new value "b" for pg_collation.collprovider doesn't seem to be
documented. Is that deliberate?
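For reference, this is the set of single-character codes as I understand
them, written as a small standalone sketch. The function name
provider_code_to_name is hypothetical and the macros are redefined
locally here, so treat this as an illustration rather than a copy of the
header.

```c
#include <stddef.h>
#include <string.h>

/* Single-character codes stored in pg_collation.collprovider. */
#define COLLPROVIDER_DEFAULT 'd'
#define COLLPROVIDER_BUILTIN 'b'
#define COLLPROVIDER_ICU     'i'
#define COLLPROVIDER_LIBC    'c'

/*
 * Hypothetical helper: map a collprovider code to a readable name,
 * or NULL for a code this sketch does not know about.
 */
static const char *
provider_code_to_name(char code)
{
    switch (code)
    {
        case COLLPROVIDER_DEFAULT:
            return "default";
        case COLLPROVIDER_BUILTIN:
            return "builtin";
        case COLLPROVIDER_ICU:
            return "icu";
        case COLLPROVIDER_LIBC:
            return "libc";
        default:
            return NULL;
    }
}
```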

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#4Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#2)
Re: pgsql: Introduce "builtin" collation provider.

On Tue, 2024-04-23 at 11:23 +0200, Peter Eisentraut wrote:

I think I found a small bug in this commit.

Good catch, thank you.

Committed a fix.

Regards,
Jeff Davis