[17] CREATE COLLATION default provider

Started by Jeff Davisover 2 years ago6 messages
#1Jeff Davis
pgsql@j-davis.com
1 attachment(s)

Currently, CREATE COLLATION always defaults the provider to libc.

The attached patch causes it to default to libc if LC_COLLATE/LC_CTYPE
are specified, otherwise default to the current database default
collation's provider.

That way, the provider choice at initdb time then becomes the default
for "CREATE DATABASE ... TEMPLATE template0", which then becomes the
default provider for "CREATE COLLATION (LOCALE='...')".

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v11-0001-CREATE-COLLATION-default-provider.patchtext/x-patch; charset=UTF-8; name=v11-0001-CREATE-COLLATION-default-provider.patchDownload
From 329e32bfe5e1883a2cfd6e224c1d512b67256870 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 09:53:02 -0700
Subject: [PATCH v11] CREATE COLLATION default provider.

If the LC_COLLATE or LC_CTYPE are specified for a new collation, the
default provider is libc. Otherwise, the default provider is the same
as the provider for the database default collation.

Previously, the default provider was always libc.
---
 contrib/citext/expected/create_index_acl.out     |  2 +-
 contrib/citext/sql/create_index_acl.sql          |  2 +-
 doc/src/sgml/ref/create_collation.sgml           | 14 ++++++++++----
 src/backend/commands/collationcmds.c             |  7 ++++++-
 src/test/regress/expected/collate.linux.utf8.out | 10 +++++-----
 src/test/regress/sql/collate.linux.utf8.sql      | 10 +++++-----
 6 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/contrib/citext/expected/create_index_acl.out b/contrib/citext/expected/create_index_acl.out
index 33be13a92d..765867d36d 100644
--- a/contrib/citext/expected/create_index_acl.out
+++ b/contrib/citext/expected/create_index_acl.out
@@ -43,7 +43,7 @@ REVOKE ALL ON FUNCTION s.index_row_if FROM PUBLIC;
 -- Even for an empty table, CREATE INDEX checks ii_Predicate permissions.
 GRANT EXECUTE ON FUNCTION s.index_row_if TO regress_minimal;
 -- Non-extension, non-function objects.
-CREATE COLLATION s.coll (LOCALE="C");
+CREATE COLLATION s.coll (PROVIDER=libc, LOCALE="C");
 CREATE TABLE s.x (y s.citext);
 ALTER TABLE s.x OWNER TO regress_minimal;
 -- Empty-table DefineIndex()
diff --git a/contrib/citext/sql/create_index_acl.sql b/contrib/citext/sql/create_index_acl.sql
index 10b5225569..e338ac8799 100644
--- a/contrib/citext/sql/create_index_acl.sql
+++ b/contrib/citext/sql/create_index_acl.sql
@@ -45,7 +45,7 @@ REVOKE ALL ON FUNCTION s.index_row_if FROM PUBLIC;
 -- Even for an empty table, CREATE INDEX checks ii_Predicate permissions.
 GRANT EXECUTE ON FUNCTION s.index_row_if TO regress_minimal;
 -- Non-extension, non-function objects.
-CREATE COLLATION s.coll (LOCALE="C");
+CREATE COLLATION s.coll (PROVIDER=libc, LOCALE="C");
 CREATE TABLE s.x (y s.citext);
 ALTER TABLE s.x OWNER TO regress_minimal;
 -- Empty-table DefineIndex()
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..fd6c679694 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,10 +121,16 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
       <para>
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
-       linkend="locale-providers"/> for details.
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>. See
+       <xref linkend="locale-providers"/> for details.
+      </para>
+      <para>
+       If <replaceable>lc_colllate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the default is
+       <literal>libc</literal>; otherwise, the default is the same as the
+       provider for the database default collation (see <xref
+       linkend="sql-createdatabase"/>).
       </para>
      </listitem>
     </varlistentry>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..20f976a3f8 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -226,7 +226,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = default_locale.provider;
+		}
 
 		if (localeEl)
 		{
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 01664f7c1b..588198d13e 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1026,7 +1026,7 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
@@ -1034,7 +1034,7 @@ CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 ERROR:  collation "test0" already exists
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
 NOTICE:  collation "test0" already exists, skipping
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 NOTICE:  collation "test0" for encoding "UTF8" already exists, skipping
 do $$
 BEGIN
@@ -1046,7 +1046,7 @@ END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
 ERROR:  parameter "lc_ctype" must be specified
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 ERROR:  could not create locale "nonsense": No such file or directory
 DETAIL:  The operating system could not find any locale data for the locale name "nonsense".
 CREATE COLLATION test4 FROM nonsense;
@@ -1166,8 +1166,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 
 -- nondeterministic collations
 -- (not supported with libc provider)
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 ERROR:  nondeterministic collations not supported with this provider
 -- cleanup
 SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.linux.utf8.sql b/src/test/regress/sql/collate.linux.utf8.sql
index 132d13af0a..2d031293d1 100644
--- a/src/test/regress/sql/collate.linux.utf8.sql
+++ b/src/test/regress/sql/collate.linux.utf8.sql
@@ -358,13 +358,13 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
@@ -374,7 +374,7 @@ BEGIN
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
@@ -455,8 +455,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 -- nondeterministic collations
 -- (not supported with libc provider)
 
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 
 
 -- cleanup
-- 
2.34.1

#2Gurjeet Singh
gurjeet@singh.im
In reply to: Jeff Davis (#1)
Re: [17] CREATE COLLATION default provider

On Wed, Jun 14, 2023 at 9:48 PM Jeff Davis <pgsql@j-davis.com> wrote:

Currently, CREATE COLLATION always defaults the provider to libc.

The attached patch causes it to default to libc if LC_COLLATE/LC_CTYPE
are specified, otherwise default to the current database default
collation's provider.

+            if (lccollateEl || lcctypeEl)
+                collprovider = COLLPROVIDER_LIBC;
+            else
+                collprovider = default_locale.provider;

The docs for the CREATE COLLATION option 'locale' say: "This is a
shortcut for setting LC_COLLATE and LC_CTYPE at once."

So it's not intuitive why the check does not include a test for the
presence of 'localeEl', as well? If we consider the presence of
LC_COLLATE _or_ LC_CTYPE options to be a determining factor for some
decision, then the presence of LOCALE option should also lead to the
same outcome.

Otherwise the patch looks good.

v11-0001-CREATE-COLLATION-default-provider.patch

I believe v11 is a typo, and you really meant v1.

Best regards,
Gurjeet
http://Gurje.et

#3Jeff Davis
pgsql@j-davis.com
In reply to: Gurjeet Singh (#2)
Re: [17] CREATE COLLATION default provider

On Sat, 2023-06-17 at 09:09 -0700, Gurjeet Singh wrote:

The docs for the CREATE COLLATION option 'locale' say: "This is a
shortcut for setting LC_COLLATE and LC_CTYPE at once."

So it's not intuitive why the check does not include a test for the
presence of 'localeEl', as well? If we consider the presence of
LC_COLLATE _or_ LC_CTYPE options to be a determining factor for some
decision, then the presence of LOCALE option should also lead to the
same outcome.

The docs say: "If provider is libc, this is a shortcut...". The point
is that LC_COLLATE and LC_CTYPE act as a signal that what the user
really wants is a libc collation. LOCALE works for either, so we need a
default.

That being said, I'm now having second thoughts about where that
default should come from. While getting the default from datlocprovider
is convenient, I'm not sure that the datlocprovider provides a good
signal. A lot of users will have datlocprovider=c and datcollate=C,
which really means they want the built-in memcmp behavior, and to me
that doesn't signal that they want CREATE COLLATION to use libc for a
non-C locale.

A GUC might be a better default, and we could have CREATE COLLATION
default to ICU if the server is built with ICU and if PROVIDER,
LC_COLLATE and LC_CTYPE are unspecified.

Regards,
Jeff Davis

#4Gurjeet Singh
gurjeet@singh.im
In reply to: Jeff Davis (#3)
Re: [17] CREATE COLLATION default provider

On Fri, Jul 7, 2023 at 9:33 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Sat, 2023-06-17 at 09:09 -0700, Gurjeet Singh wrote:

The docs for the CREATE COLLATION option 'locale' say: "This is a
shortcut for setting LC_COLLATE and LC_CTYPE at once."

So it's not intuitive why the check does not include a test for the
presence of 'localeEl', as well? If we consider the presence of
LC_COLLATE _or_ LC_CTYPE options to be a determining factor for some
decision, then the presence of LOCALE option should also lead to the
same outcome.

The docs say: "If provider is libc, this is a shortcut...". The point
is that LC_COLLATE and LC_CTYPE act as a signal that what the user
really wants is a libc collation. LOCALE works for either, so we need a
default.

Sorry about the noise, I was consulting current/v15 docs online. Now
that v16 docs are online, I can see that the option in fact says this
is the case only if libc is the provider.

(note to self: for reviewing patches to master, consult devel docs [1]https://www.postgresql.org/docs/devel/ online)

[1]: https://www.postgresql.org/docs/devel/

Best regards,
Gurjeet
http://Gurje.et

#5Peter Eisentraut
peter@eisentraut.org
In reply to: Jeff Davis (#1)
Re: [17] CREATE COLLATION default provider

On 15.06.23 06:47, Jeff Davis wrote:

Currently, CREATE COLLATION always defaults the provider to libc.

The attached patch causes it to default to libc if LC_COLLATE/LC_CTYPE
are specified, otherwise default to the current database default
collation's provider.

That way, the provider choice at initdb time then becomes the default
for "CREATE DATABASE ... TEMPLATE template0", which then becomes the
default provider for "CREATE COLLATION (LOCALE='...')".

I like the general idea. If the user has selected ICU overall, it could
be sensible that certain commands default to ICU.

I wonder, however, how useful this would be in practice. In most
interesting cases, you need to know what the provider is to be able to
spell out the locale name appropriately. The cases where some overlap
exists, like the common "ll_CC", are already preloaded, so won't
actually need to be specified explicitly in many cases.

Also, I think the default should only flow one way, top-down: The
default provider of CREATE COLLATION is datlocprovider. There shouldn't
be a second, botton-up way, based on the other specified CREATE
COLLATION parameters. That's just too much logical zig-zag, IMO.
Otherwise, if you extend this locally, why not also look at if
"deterministic" or "rules" was specified, etc.

#6Jeff Davis
pgsql@j-davis.com
In reply to: Peter Eisentraut (#5)
Re: [17] CREATE COLLATION default provider

On Thu, 2024-01-18 at 11:15 +0100, Peter Eisentraut wrote:

I wonder, however, how useful this would be in practice.  In most
interesting cases, you need to know what the provider is to be able
to
spell out the locale name appropriately.  The cases where some
overlap
exists, like the common "ll_CC", are already preloaded, so won't
actually need to be specified explicitly in many cases.

Good point.

Also, I think the default should only flow one way, top-down:  The
default provider of CREATE COLLATION is datlocprovider.  There
shouldn't
be a second, botton-up way, based on the other specified CREATE
COLLATION parameters.  That's just too much logical zig-zag, IMO.

Also a good point. I am withdrawing this patch from the CF, and we can
reconsider the idea later perhaps in some other form.

Regards,
Jeff Davis