Order changes in PG16 since ICU introduction

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Regina Obe (#1)

Re: Order changes in PG16 since ICU introduction

"Regina Obe" <lr@pcorp.us> writes:

A couple of days ago, our PostGIS PG16 bots started failing with order
changes in text.
We have our tests set to locale=c

It seems since April 20th, our tests that rely on sorting characters
changed.
As noted in this ticket:

https://trac.osgeo.org/postgis/ticket/5375

I'm assuming it's the result of the ICU change:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=fcb21b3acdcb9a60313325618fd7080aa36f1626

I suspect all our bots are compiling with icu enabled. But I haven't
confirmed.

If they actually are using locale C, I would say this is a bug.
That should designate memcmp sorting and nothing else.
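
For readers following along, here is a minimal sketch of what "memcmp sorting" means for the strings in this thread. It is not PostgreSQL code: the helper name c_locale_less_than is hypothetical, and it only assumes that a C-locale comparison orders strings by their encoded bytes.

```python
def c_locale_less_than(left: str, right: str) -> bool:
    """Compare two strings by their UTF-8 bytes, as a memcmp()-style
    "C" collation would (hypothetical helper, not a PostgreSQL API).

    Python compares bytes objects lexicographically by unsigned byte
    value, and a shorter string that is a prefix of the other sorts first.
    """
    return left.encode("utf-8") < right.encode("utf-8")


# '+' is byte 0x2B and '-' is byte 0x2D, so '+' sorts first byte-wise.
print(c_locale_less_than("+", "-"))
# The two strings share the prefix "RM(25)/nodes|" and then differ at '+' vs '-'.
print(c_locale_less_than("RM(25)/nodes|+|21|1", "RM(25)/nodes|-|21|"))
```

Both comparisons come out true under byte ordering, which is the result the PostGIS tests expect from a database that really uses the C locale.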

regards, tom lane

pgsql@j-davis.com

over 2 years ago

In reply to: Regina Obe (#1)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 11:27 -0400, Regina Obe wrote:

A couple of days ago, our PostGIS PG16 bots started failing with order
changes in text.
We have our tests set to locale=c

Are you sure it's still using the C locale? The results seem to be
explainable if the locale switched from "C" to "en-US-x-icu":

The results of the following are the same in v15 and v16:

select 'RM(25)/nodes|+|21|1' collate "C" < 'RM(25)/nodes|-|21|' collate "C"; -- true

select 'RM(25)/nodes|+|21|1' collate "en-US-x-icu" < 'RM(25)/nodes|-|21|' collate "en-US-x-icu"; -- false

I suspect when the initdb and configure defaults changed from libc to
ICU, then your locale changed.

Regards,
Jeff Davis

Sandro Santilli

strk@kbt.io

over 2 years ago

In reply to: Tom Lane (#2)

Re: Order changes in PG16 since ICU introduction

On Fri, Apr 21, 2023 at 11:48:51AM -0400, Tom Lane wrote:

"Regina Obe" <lr@pcorp.us> writes:

https://trac.osgeo.org/postgis/ticket/5375

If they actually are using locale C, I would say this is a bug.
That should designate memcmp sorting and nothing else.

Sounds like a bug to me. This is happening with a PostgreSQL cluster
created and served by a build of commit c04c6c5d6f :

=# select version();
PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
=# show lc_collate;
C
=# select '+' < '-';
f
=# select '+' < '-' collate "C";
t

I don't know if it should matter but also:

=# show lc_messages;
C

--strk;

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Sandro Santilli (#4)

Re: Order changes in PG16 since ICU introduction

On 21.04.23 19:09, Sandro Santilli wrote:

On Fri, Apr 21, 2023 at 11:48:51AM -0400, Tom Lane wrote:

"Regina Obe" <lr@pcorp.us> writes:

https://trac.osgeo.org/postgis/ticket/5375

If they actually are using locale C, I would say this is a bug.
That should designate memcmp sorting and nothing else.

Sounds like a bug to me. This is happening with a PostgreSQL cluster
created and served by a build of commit c04c6c5d6f :

=# select version();
PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
=# show lc_collate;
C
=# select '+' < '-';
f

If the database is created with locale provider ICU, then lc_collate
does not apply here, so the result might be correct (depending on what
locale you have set).


=# select '+' < '-' collate "C";
t

pgsql@j-davis.com

over 2 years ago

In reply to: Sandro Santilli (#4)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 19:09 +0200, Sandro Santilli wrote:

=# select version();
PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
=# show lc_collate;
C
=# select '+' < '-';
f

What is the result of:

select datlocprovider, datcollate, daticulocale
from pg_database where datname=current_database();

=# select '+' < '-' collate "C";
t

Regards,
Jeff Davis

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Peter Eisentraut (#5)

Re: Order changes in PG16 since ICU introduction

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

If the database is created with locale provider ICU, then lc_collate
does not apply here, so the result might be correct (depending on what
locale you have set).

FWIW, an installation created under LANG=C defaults to ICU locale
en-US-u-va-posix for me (see psql \l), and that still sorts as
expected on my RHEL8 box. We've not seen buildfarm problems either.

I am wondering however whether this doesn't mean that all our carefully
coded fast paths for C locale just went down the drain. Does the ICU
code have any of that? Has any performance testing been done to see
what impact this change had on C-locale installations? (The current
code coverage report for pg_locale.c is not encouraging.)

regards, tom lane

lr@pcorp.us

over 2 years ago

In reply to: Tom Lane (#7)

RE: Order changes in PG16 since ICU introduction

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

If the database is created with locale provider ICU, then lc_collate
does not apply here, so the result might be correct (depending on what
locale you have set).

FWIW, an installation created under LANG=C defaults to ICU locale
en-US-u-va-posix for me (see psql \l), and that still sorts as expected on my
RHEL8 box. We've not seen buildfarm problems either.

I am wondering however whether this doesn't mean that all our carefully
coded fast paths for C locale just went down the drain. Does the ICU code
have any of that? Has any performance testing been done to see what impact
this change had on C-locale installations? (The current code coverage report
for pg_locale.c is not encouraging.)

regards, tom lane

Just another metric.

On my mingw64 setup, I built a test database on PG16 (built with icu
support) and PG15 (no icu support)

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C'
LC_CTYPE = 'C';

I think the above is similar to the setup we use when testing.

On PG15

SELECT '+' < '-' ; returns true

On PG 16 returns false

For PG 16, to strk's point, to get true you have to do:
SELECT '+' COLLATE "C" < '-' COLLATE "C";

I would expect that, since I'm initializing my db with collate C, they would
both behave the same.

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Regina Obe (#8)

Re: Order changes in PG16 since ICU introduction

"Regina Obe" <lr@pcorp.us> writes:

On my mingw64 setup, I built a test database on PG16 (built with icu
support) and PG15 (no icu support)

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C'
LC_CTYPE = 'C';

As has been pointed out already, setting LC_COLLATE/LC_CTYPE is
meaningless when the locale provider is ICU. You need to look
at what ICU locale is being chosen, or force it with LOCALE = 'C'.

regards, tom lane

#10

lr@pcorp.us

over 2 years ago

In reply to: Tom Lane (#9)

RE: Order changes in PG16 since ICU introduction

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C'
LC_CTYPE = 'C';

As has been pointed out already, setting LC_COLLATE/LC_CTYPE is
meaningless when the locale provider is ICU. You need to look at what ICU
locale is being chosen, or force it with LOCALE = 'C'.

regards, tom lane

Okay, got it. I was on IRC with RhodiumToad and he suggested:

CREATE DATABASE test2 TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C'
LC_CTYPE = 'C' ICU_LOCALE='C';

Which gives expected result:
SELECT '+' < '-' ; -- true

but gives me a notice:
NOTICE: using standard form "en-US-u-va-posix" for locale "C"

#11

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Regina Obe (#10)

Re: Order changes in PG16 since ICU introduction

"Regina Obe" <lr@pcorp.us> writes:

Okay got it was on IRC with RhodiumToad and he suggested:
CREATE DATABASE test2 TEMPLATE=template0 ENCODING = 'UTF8' LC_COLLATE = 'C'
LC_CTYPE = 'C' ICU_LOCALE='C';

Which gives expected result:
SELECT '+' < '-' ; -- true

but gives me a notice:
NOTICE: using standard form "en-US-u-va-posix" for locale "C"

Yeah. My recommendation is just LOCALE:

regression=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';
CREATE DATABASE
regression=# CREATE DATABASE test2 TEMPLATE=template0 ENCODING = 'UTF8' ICU_LOCALE = 'C';
NOTICE: using standard form "en-US-u-va-posix" for locale "C"
CREATE DATABASE

I think it's probably intentional that ICU_LOCALE is stricter
about being given a real ICU locale name, but I didn't write
any of that code.

regards, tom lane

#12

andrew@tao11.riddles.org.uk

over 2 years ago

In reply to: Peter Eisentraut (#5)

Re: Order changes in PG16 since ICU introduction

"Peter" == Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Peter> If the database is created with locale provider ICU, then
Peter> lc_collate does not apply here,

Having lc_collate return a value which is silently being ignored seems
to me rather hugely confusing.

Also, somewhere along the line someone broke initdb --no-locale, which
should result in C locale being the default everywhere, but when I just
tested it it picked 'en' for an ICU locale, which is not the right
thing.

--
Andrew (irc:RhodiumToad)

#13

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Andrew Gierth (#12)

Re: Order changes in PG16 since ICU introduction

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Peter" == Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
Peter> If the database is created with locale provider ICU, then
Peter> lc_collate does not apply here,

Having lc_collate return a value which is silently being ignored seems
to me rather hugely confusing.

It's not *completely* ignored --- there are bits of code that are not
yet ICU-ified and will still use the libc facilities. So we can't
get rid of those options yet, even in an ICU-based database.

Also, somewhere along the line someone broke initdb --no-locale, which
should result in C locale being the default everywhere, but when I just
tested it it picked 'en' for an ICU locale, which is not the right
thing.

Confirmed:

$ LANG=en_US.utf8 initdb --no-locale
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

That needs to be fixed: --no-locale should prevent any consideration
of initdb's LANG/LC_foo environment.

regards, tom lane

#14

lr@pcorp.us

over 2 years ago

In reply to: Tom Lane (#11)

RE: Order changes in PG16 since ICU introduction

Yeah. My recommendation is just LOCALE:

regression=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';
CREATE DATABASE
regression=# CREATE DATABASE test2 TEMPLATE=template0 ENCODING = 'UTF8' ICU_LOCALE = 'C';
NOTICE: using standard form "en-US-u-va-posix" for locale "C"
CREATE DATABASE

I think it's probably intentional that ICU_LOCALE is stricter about being
given a real ICU locale name, but I didn't write any of that code.

regards, tom lane

CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';

Doesn't seem to work at least not under mingw64 anyway.

SELECT '+' < '-' ;

Returns false

#15

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Regina Obe (#14)

Re: Order changes in PG16 since ICU introduction

"Regina Obe" <lr@pcorp.us> writes:

CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';
Doesn't seem to work at least not under mingw64 anyway.

Hmm, doesn't work for me either:

$ LANG=en_US.utf8 initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

Using default ICU locale "en_US".
Using language tag "en-US" for ICU locale "en_US".
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: en-US
LC_COLLATE: en_US.utf8
LC_CTYPE: en_US.utf8
LC_MESSAGES: en_US.utf8
LC_MONETARY: en_US.utf8
LC_NUMERIC: en_US.utf8
LC_TIME: en_US.utf8
...
$ psql postgres
psql (16devel)
Type "help" for help.

postgres=# SELECT '+' < '-' ;
?column?
----------
f
(1 row)

(as expected, so far)

postgres=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';
CREATE DATABASE
postgres=# \c test1
You are now connected to database "test1" as user "postgres".
test1=# SELECT '+' < '-' ;
?column?
----------
f
(1 row)

(wrong!)

Looks like the "pick en-US even when told not to" problem exists here too.

regards, tom lane

#16

Sandro Santilli

strk@kbt.io

over 2 years ago

In reply to: Peter Eisentraut (#5)

Re: Order changes in PG16 since ICU introduction

On Fri, Apr 21, 2023 at 07:14:13PM +0200, Peter Eisentraut wrote:

On 21.04.23 19:09, Sandro Santilli wrote:

On Fri, Apr 21, 2023 at 11:48:51AM -0400, Tom Lane wrote:

"Regina Obe" <lr@pcorp.us> writes:

https://trac.osgeo.org/postgis/ticket/5375

If they actually are using locale C, I would say this is a bug.
That should designate memcmp sorting and nothing else.

Sounds like a bug to me. This is happening with a PostgreSQL cluster
created and served by a build of commit c04c6c5d6f :

=# select version();
PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
=# show lc_collate;
C
=# select '+' < '-';
f

If the database is created with locale provider ICU, then lc_collate does
not apply here, so the result might be correct (depending on what locale you
have set).

The database is created by a perl script which starts like this:

$ENV{"LC_ALL"} = "C";
$ENV{"LANG"} = "C";

And then runs:

createdb --encoding=UTF-8 --template=template0 --lc-collate=C

Should we tweak anything else to make the results predictable ?

--strk;

#17

andrew@tao11.riddles.org.uk

over 2 years ago

In reply to: Tom Lane (#13)

Re: Order changes in PG16 since ICU introduction

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Also, somewhere along the line someone broke initdb --no-locale,
which should result in C locale being the default everywhere, but
when I just tested it it picked 'en' for an ICU locale, which is not
the right thing.

Tom> Confirmed:

Tom> $ LANG=en_US.utf8 initdb --no-locale
Tom> The files belonging to this database system will be owned by user "postgres".
Tom> This user must also own the server process.

Tom> Using default ICU locale "en_US".
Tom> Using language tag "en-US" for ICU locale "en_US".
Tom> The database cluster will be initialized with this locale configuration:
Tom> provider: icu
Tom> ICU locale: en-US
Tom> LC_COLLATE: C
Tom> LC_CTYPE: C
Tom> ...

Tom> That needs to be fixed: --no-locale should prevent any
Tom> consideration of initdb's LANG/LC_foo environment.

Would it also not make sense to also take into account any --locale and
--lc-* options before choosing an ICU default locale? Right now if you
do, say, initdb --locale=fr_FR you get an ICU locale based on the
environment but lc_* settings based on the option, which seems maximally
confusing.

Also, what happens now to lc_collate_is_c() when the provider is ICU? Am
I missing something, or is it never true now, even if you specified C /
POSIX / en-US-u-va-posix as the ICU locale? This seems like it could be
an important pessimization.

Also also, we now have the problem that it is much harder to create a
'C' collation database within an existing cluster (e.g. for testing)
without knowing whether the default provider is ICU. In the past one
would have done:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';

but now that creates a database that uses the same ICU locale as
template0 by default. If instead one tries:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' ICU_LOCALE='C';

then one gets an error if the default locale provider is _not_ ICU. The
only option now seems to be:

CREATE DATABASE test TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C' LOCALE_PROVIDER = 'libc';

which of course doesn't work in older pg versions.
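
One way a test harness might cope with this is to pick the statement based on the server version. The following is only a sketch: the function name create_c_database_sql is hypothetical, and it assumes that CREATE DATABASE accepts LOCALE_PROVIDER from v15 on (consistent with the remarks in this thread that the v16 syntax matches v15) and that the caller obtains the integer version from SHOW server_version_num.

```python
def create_c_database_sql(dbname: str, server_version_num: int) -> str:
    """Build a CREATE DATABASE statement for a memcmp()-ordered test database.

    server_version_num is the integer server version, for example
    150002 or 160000 (hypothetical helper, not part of any PostgreSQL tool).
    """
    # Quote the identifier so that unusual database names stay valid SQL.
    quoted = '"' + dbname.replace('"', '""') + '"'
    clauses = [
        f"CREATE DATABASE {quoted}",
        "TEMPLATE = template0",
        "ENCODING = 'UTF8'",
        "LC_COLLATE = 'C'",
        "LC_CTYPE = 'C'",
    ]
    # Assumption: LOCALE_PROVIDER is only understood by v15 and later, so
    # older servers get the plain libc-style statement without it.
    if server_version_num >= 150000:
        clauses.append("LOCALE_PROVIDER = 'libc'")
    return " ".join(clauses) + ";"


print(create_c_database_sql("test", 160000))
print(create_c_database_sql("test", 140005))
```

The sketch sticks to LC_COLLATE/LC_CTYPE rather than LOCALE so that the same clauses can be sent to older servers; whether that is the right trade-off for a given test suite is a judgment call.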

--
Andrew.

#18

Sandro Santilli

strk@kbt.io

over 2 years ago

In reply to: Jeff Davis (#6)

Re: Order changes in PG16 since ICU introduction

On Fri, Apr 21, 2023 at 10:27:49AM -0700, Jeff Davis wrote:

On Fri, 2023-04-21 at 19:09 +0200, Sandro Santilli wrote:

=# select version();
PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
=# show lc_collate;
C
=# select '+' < '-';
f

What is the result of:

select datlocprovider, datcollate, daticulocale
from pg_database where datname=current_database();

datlocprovider | i
datcollate | C
daticulocale | en-US
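
To make the reading of that row explicit, here is a small sketch of how the three columns could be interpreted. The function name effective_default_collation is hypothetical, and the mapping assumes only the two provider codes relevant to this thread: 'i' for ICU and 'c' for libc.

```python
from typing import Optional


def effective_default_collation(datlocprovider: str,
                                datcollate: str,
                                daticulocale: Optional[str]) -> str:
    """Report which setting governs the database's default sort order.

    Hypothetical helper: it interprets one pg_database row, assuming the
    provider codes 'i' (ICU) and 'c' (libc).
    """
    if datlocprovider == "i":
        # With the ICU provider, datcollate does not decide the default order.
        return f"icu:{daticulocale}"
    if datlocprovider == "c":
        return f"libc:{datcollate}"
    raise ValueError(f"unexpected datlocprovider: {datlocprovider!r}")


# The row reported above: provider 'i', datcollate 'C', daticulocale 'en-US'.
print(effective_default_collation("i", "C", "en-US"))
```

For this row the sketch reports the ICU locale en-US, even though lc_collate shows C.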

--strk;

#19

pgsql@j-davis.com

over 2 years ago

In reply to: Tom Lane (#15)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 14:23 -0400, Tom Lane wrote:

postgres=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8' LOCALE = 'C';

...

test1 | postgres | UTF8 | icu | C | C | en-US | |
(4 rows)

Looks like the "pick en-US even when told not to" problem exists here
too.

Both provider (ICU) and the icu locale (en-US) are inherited from
template0. The LOCALE parameter to CREATE DATABASE doesn't affect
either of those things, because there's a separate parameter
ICU_LOCALE.

This happens the same way in v15, and although it matches the
documentation technically, it is not a great user experience.

I have a couple ideas:

1. Introduce a "none" provider to separate the concept of C/POSIX
locales from the libc provider. It's not really using a provider
anyway, it's just using memcmp(), and I think it causes confusion to
combine them. Saying "LOCALE_PROVIDER=none" is less error-prone than
"LOCALE_PROVIDER=libc LOCALE='C'".

2. Change the CREATE DATABASE syntax to catch these errors better at
the possible expense of backwards compatibility.

I am also having second thoughts about accepting "C" or "POSIX" as an
ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not
terribly useful (why not just use memcmp()?), it's not fast in my
measurements (en-US is faster), so maybe it's better to just throw an
error and tell the user to use C (or provider=none as I suggest
above)?

Obviously the user could manually type "en-US-u-va-posix" if that's the
locale they want. Throwing an error would be a backwards-compatibility
issue, but in v15 an ICU locale of "C" just gives the root locale
anyway, which is probably not what they want.

Regards,
Jeff Davis

#20

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Jeff Davis (#19)

Re: Order changes in PG16 since ICU introduction

Jeff Davis <pgsql@j-davis.com> writes:

I have a couple ideas:

1. Introduce a "none" provider to separate the concept of C/POSIX
locales from the libc provider. It's not really using a provider
anyway, it's just using memcmp(), and I think it causes confusion to
combine them. Saying "LOCALE_PROVIDER=none" is less error-prone than
"LOCALE_PROVIDER=libc LOCALE='C'".

I think I might like this idea, except for one thing: you're imagining
that the locale doesn't control anything except string comparisons.
What about to_upper/to_lower, character classifications in regexes, etc?
(I'm not sure whether those operations can get redirected to ICU today
or whether they still always go to libc, but we'll surely want to fix
it eventually if the latter is still true.)

In any case, that seems somewhat orthogonal to what we're on about here,
which is making the behavior of CREATE DATABASE less surprising and more
backwards-compatible. I'm not sure that provider=none can help with that.
Aside from the user-surprise issues discussed up to now, pg_dump scripts
emitted by pre-v15 pg_dump are not going to contain LOCALE_PROVIDER
clauses in CREATE DATABASE, and people are going to be very unhappy
if that means they suddenly get totally different locale semantics
after restoring into a new DB. I think we need some plan for mapping
libc-style locale specs into ICU locales so that we can make that
more nearly transparent.

2. Change the CREATE DATABASE syntax to catch these errors better at
the possible expense of backwards compatibility.

That is the exact opposite of what I think we need. Backwards
compatibility isn't optional.

Maybe this means we are not ready to do ICU-by-default in v16.
It certainly feels like there might be more here than we want to
start designing post-feature-freeze.

regards, tom lane

#21

pgsql@j-davis.com

over 2 years ago

In reply to: Sandro Santilli (#16)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 21:14 +0200, Sandro Santilli wrote:

And then runs:

createdb --encoding=UTF-8 --template=template0 --lc-collate=C

Should we tweak anything else to make the results predictable ?

You can specify --locale-provider=libc

Regards,
Jeff Davis

#22

pgsql@j-davis.com

over 2 years ago

In reply to: Tom Lane (#7)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 13:28 -0400, Tom Lane wrote:

I am wondering however whether this doesn't mean that all our carefully
coded fast paths for C locale just went down the drain.

The code still exists. You can test it by using the built-in collation
"C" which is correctly specified with collprovider=libc and
collcollate=C.

For my test dataset, ICU 72, glibc 2.35:

-- ~07s
explain analyze select t from a order by t collate "C";

-- ~15s
explain analyze select t from a order by t collate "en-US-x-icu";

-- ~21s
explain analyze select t from a order by t collate "en-US-u-va-posix-x-icu";

-- ~34s
explain analyze select t from a order by t collate "en_US";

I believe the confusion in this thread comes from:

* The syntax of CREATE DATABASE (the same as v15 but still confusing)
* The fact that you need provider=libc to get memcmp() behavior (same
as v15 but still confusing)

Regards,
Jeff Davis

#23

Robert Haas

robertmhaas@gmail.com

over 2 years ago

In reply to: Jeff Davis (#19)

Re: Order changes in PG16 since ICU introduction

On Fri, Apr 21, 2023 at 3:25 PM Jeff Davis <pgsql@j-davis.com> wrote:

I am also having second thoughts about accepting "C" or "POSIX" as an
ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not
terribly useful (why not just use memcmp()?), it's not fast in my
measurements (en-US is faster), so maybe it's better to just throw an
error and tell the user to use C (or provider=none as I suggest
above)?

I mean, to renew a complaint I've made previously, how the heck is
anyone supposed to understand what's going on here?

We have no meaningful documentation of how to select an ICU locale
that works for you. We have a couple of examples and a suggestion that
you should use BCP 47. But when I asked before for documentation
references, the ones you provided were not clear, basically
incomprehensible. In follow-up discussion, you admitted you'd had to
consult the source code to figure certain things out.

And the fact that "C" or "POSIX" gets transformed into
"en-US-u-va-posix" is also completely undocumented. That string appears
twice in the code, but zero times in the documentation. There's code
to do it, but users shouldn't have to read code, and it wouldn't help
much if they did, because the code comments don't really explain the
rationale behind this choice either.

I find the fact that people are having trouble here completely
predictable. Of course if people ask for "C" and the system tells them
that it's using "en-US-u-va-posix" instead they're going to be
confused and ask questions, exactly as is happening here. glibc
collations aren't particularly well-documented either, but people have
some experience with them, and they can get a list of values that have a
chance of working from /usr/share/locale, and they know what "C"
means. Nobody knows what "en-US-u-va-posix" is. It's not even
Googleable, really, whereas "C locale" is.

My opinion is that the switch to using ICU by default is ill-advised
and should be reverted. The compatibility break isn't worth whatever
advantages ICU may have, the documentation to allow people to
transition to ICU with reasonable effort doesn't exist, and the fact
that within weeks of feature freeze people who know a lot about
PostgreSQL are struggling to get the behavior they want is a really
bad sign.

--
Robert Haas
EDB: http://www.enterprisedb.com

#24

pgsql@j-davis.com

over 2 years ago

In reply to: Andrew Gierth (#12)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 19:00 +0100, Andrew Gierth wrote:

Also, somewhere along the line someone broke initdb --no-locale,
which should result in C locale being the default everywhere, but
when I just tested it it picked 'en' for an ICU locale, which is not
the right thing.

Fixed, thank you.

Regards,
Jeff Davis

#25

pgsql@j-davis.com

over 2 years ago

In reply to: Tom Lane (#20)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 16:00 -0400, Tom Lane wrote:

Maybe this means we are not ready to do ICU-by-default in v16.
It certainly feels like there might be more here than we want to
start designing post-feature-freeze.

I don't see how punting to the next release helps. If the CREATE
DATABASE syntax (and similar issues for createdb and initdb) in v15 is
just too confusing, and we can't find a remedy for v16, then we
probably won't find a remedy for v17 either.

Regards,
Jeff Davis

#26

andrew@tao11.riddles.org.uk

over 2 years ago

In reply to: Jeff Davis (#24)

Re: Order changes in PG16 since ICU introduction

"Jeff" == Jeff Davis <pgsql@j-davis.com> writes:

Also, somewhere along the line someone broke initdb --no-locale,
which should result in C locale being the default everywhere, but
when I just tested it it picked 'en' for an ICU locale, which is not
the right thing.

Jeff> Fixed, thank you.

Is that the right fix, though? (It forces --locale-provider=libc for the
cluster default, which might not be desirable?)

--
Andrew.

#27

pgsql@j-davis.com

over 2 years ago

In reply to: Andrew Gierth (#26)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 22:08 +0100, Andrew Gierth wrote:

Is that the right fix, though? (It forces --locale-provider=libc for
the cluster default, which might not be desirable?)

For the "no locale" behavior (memcmp()-based) the provider needs to be
libc. Do you see an alternative?

Regards,
Jeff Davis

#28

andrew@tao11.riddles.org.uk

over 2 years ago

In reply to: Jeff Davis (#27)

Re: Order changes in PG16 since ICU introduction

"Jeff" == Jeff Davis <pgsql@j-davis.com> writes:

Is that the right fix, though? (It forces --locale-provider=libc for
the cluster default, which might not be desirable?)

Jeff> For the "no locale" behavior (memcmp()-based) the provider needs
Jeff> to be libc. Do you see an alternative?

Can lc_collate_is_c() be taught to check whether an ICU locale is using
POSIX collation?

There's now another bug in that --no-locale no longer does the same
thing as --locale=C (which is its long-established documented behavior).
How should these various options interact? This all seems not well
thought out from a usability perspective, and I think a proper fix
should involve a bit more serious consideration.

--
Andrew.

#29

pgsql@j-davis.com

over 2 years ago

In reply to: Robert Haas (#23)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 16:33 -0400, Robert Haas wrote:

My opinion is that the switch to using ICU by default is ill-advised
and should be reverted.

Most of the complaints seem to be complaints about v15 as well, and
while those complaints may be a reason to not make ICU the default,
they are also an argument that we should continue to learn and try to
fix those issues because they exist in an already-released version.
Leaving it the default for now will help us fix those issues rather
than hide them.

It's still early, so we have plenty of time to revert the initdb
default if we need to.

Regards,
Jeff Davis

#30

lr@pcorp.us

over 2 years ago

In reply to: Jeff Davis (#29)

RE: Order changes in PG16 since ICU introduction

My opinion is that the switch to using ICU by default is ill-advised
and should be reverted.

Most of the complaints seem to be complaints about v15 as well, and while
those complaints may be a reason to not make ICU the default, they are also
an argument that we should continue to learn and try to fix those issues
because they exist in an already-released version.
Leaving it the default for now will help us fix those issues rather than hide
them.

It's still early, so we have plenty of time to revert the initdb default if we need
to.

Regards,
Jeff Davis

I'm fine with that. Sounds like it wouldn't be too hard to just pull it out at the end.

Before this, I didn't even know ICU existed in PG15. My first realization that ICU was even a thing was when my PG16 refused to compile without adding my ICU path to my pkg-config or putting in --without-icu.

So yah I suspect leaving it in a little bit longer will uncover some more issues and won't harm too much.

Thanks,
Regina

#31

pgsql@j-davis.com

over 2 years ago

In reply to: Robert Haas (#23)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 16:33 -0400, Robert Haas wrote:

And the fact that "C" or "POSIX" gets transformed into
"en-US-u-va-posix"

I already expressed, on reflection, that we should probably just not do
that. So I think we're in agreement on this point; patch attached.

Regards,
Jeff Davis

Attachments:

0001-ICU-do-not-convert-locale-C-to-en-US-u-va-posix.patch (text/x-patch; charset=UTF-8)

From 3d2791af0a236cbc7ce7f29d988e8ac7fd3fd389 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Apr 2023 14:03:57 -0700
Subject: [PATCH] ICU: do not convert locale 'C' to 'en-US-u-va-posix'.

The conversion was intended to be for convenience, but it's more
likely to be confusing than useful.

The user can still directly specify 'en-US-u-va-posix' if desired.

Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 19 +------------------
 src/bin/initdb/initdb.c                       | 17 +----------------
 .../regress/expected/collate.icu.utf8.out     |  8 ++++++++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 ++++
 4 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 51df570ce9..58c4c426bc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2782,26 +2782,10 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		if (elevel > 0)
-			ereport(elevel,
-					(errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2889,8 +2873,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..4086834458 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2238,24 +2238,10 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		pg_fatal("could not get language from locale \"%s\": %s",
-				 loc_str, u_errorName(status));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2327,8 +2313,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..99f12d2e73 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1020,6 +1020,7 @@ CREATE ROLE regress_test_role;
 CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
@@ -1034,17 +1035,24 @@ BEGIN
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
+RESET icu_validation_level;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+WARNING:  ICU locale "c" has unknown language "c"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..d9778faacc 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -358,6 +358,7 @@ CREATE SCHEMA test_schema;
 
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 
 do $$
 BEGIN
@@ -373,13 +374,16 @@ BEGIN
 END
 $$;
 
+RESET icu_validation_level;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
-- 
2.34.1

#32

Robert Haas

robertmhaas@gmail.com

over 2 years ago

In reply to: Jeff Davis (#29)

Re: Order changes in PG16 since ICU introduction

On Fri, Apr 21, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:

Most of the complaints seem to be complaints about v15 as well, and
while those complaints may be a reason to not make ICU the default,
they are also an argument that we should continue to learn and try to
fix those issues because they exist in an already-released version.
Leaving it the default for now will help us fix those issues rather
than hide them.

It's still early, so we have plenty of time to revert the initdb
default if we need to.

That's fair enough, but I really think it's important that some energy
get invested in providing adequate documentation for this stuff. Just
patching the code is not enough.

--
Robert Haas
EDB: http://www.enterprisedb.com

#33

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Peter Eisentraut (#5)

Re: Order changes in PG16 since ICU introduction

On 21.04.23 19:14, Peter Eisentraut wrote:

On 21.04.23 19:09, Sandro Santilli wrote:

On Fri, Apr 21, 2023 at 11:48:51AM -0400, Tom Lane wrote:

"Regina Obe" <lr@pcorp.us> writes:

https://trac.osgeo.org/postgis/ticket/5375

If they actually are using locale C, I would say this is a bug.
That should designate memcmp sorting and nothing else.

Sounds like a bug to me. This is happening with a PostgreSQL cluster
created and served by a build of commit c04c6c5d6f :

   =# select version();
   PostgreSQL 16devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit
   =# show lc_collate;
   C
   =# select '+' < '-';
   f

If the database is created with locale provider ICU, then lc_collate
does not apply here, so the result might be correct (depending on what
locale you have set).

The GUC settings lc_collate and lc_ctype are from a time when those
locale settings were cluster-global. When we made those locale settings
per-database (PG 8.4), we kept them as read-only. As of PG 15, you can
use ICU as the per-database locale provider, so what is being attempted
in the above example is already meaningless before PG 16, since you need
to look into pg_database to find out what is really happening.

I think we should just remove the GUC parameters lc_collate and lc_ctype.

#34

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#31)

Re: Order changes in PG16 since ICU introduction

On 22.04.23 01:00, Jeff Davis wrote:

On Fri, 2023-04-21 at 16:33 -0400, Robert Haas wrote:

And the fact that "C" or "POSIX" gets transformed into "en-US-u-va-posix"

I already expressed, on reflection, that we should probably just not do
that. So I think we're in agreement on this point; patch attached.

This makes sense to me. This way, if someone specifies 'C' locale
together with ICU provider they get an error. They can then choose to
use the libc provider, to get the performance path, or stick with ICU by
using the native spelling of the locale.

#35

pgsql@j-davis.com

over 2 years ago

In reply to: Tom Lane (#20)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 16:00 -0400, Tom Lane wrote:

I think I might like this idea, except for one thing: you're imagining
that the locale doesn't control anything except string comparisons.
What about to_upper/to_lower, character classifications in regexes,
etc?

If provider='libc' and LC_CTYPE='C', str_toupper/str_tolower are
handled with asc_tolower/asc_toupper. The regex character
classification is done with pg_char_properties. In these cases neither
ICU nor libc is used; it's just code in postgres.

libc is special in that you can set LC_COLLATE and LC_CTYPE separately,
so that different locales are used for sorting and character
classification. It can be useful to set LC_COLLATE to C for
performance reasons, while setting LC_CTYPE to a useful locale. We
don't allow ICU to set collation and ctype separately (it would be
possible to allow it, but I don't think there's a huge demand and it's
arguably inconsistent to set them differently).

(I'm not sure whether those operations can get redirected to ICU today
or whether they still always go to libc, but we'll surely want to fix
it eventually if the latter is still true.)

Those operations do get redirected to ICU today. There are extensions
that call locale-sensitive libc functions directly, and obviously those
won't use ICU.

Aside from the user-surprise issues discussed up to now, pg_dump scripts
emitted by pre-v15 pg_dump are not going to contain LOCALE_PROVIDER
clauses in CREATE DATABASE, and people are going to be very unhappy
if that means they suddenly get totally different locale semantics
after restoring into a new DB.

Agreed.

I think we need some plan for mapping
libc-style locale specs into ICU locales so that we can make that
more nearly transparent.

ICU does a reasonable job mapping libc-like locale names to ICU
locales, e.g. en_US to en-US, etc. The ordering semantics aren't
guaranteed to be the same, of course (because the libc-locales are
platform-dependent), but it's at least conceptually the same locale.

Maybe this means we are not ready to do ICU-by-default in v16.
It certainly feels like there might be more here than we want to
start designing post-feature-freeze.

This thread is already on the Open Items list. As long as it's not too
disruptive to others I'll leave it as-is for now to see how this sorts
out. Right now it's not clear to me how much of this is a v15 issue vs
a v16 issue.

Regards,
Jeff Davis

#36

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#35)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

(I'm not sure whether those operations can get redirected to ICU today
or whether they still always go to libc, but we'll surely want to fix
it eventually if the latter is still true.)

Those operations do get redirected to ICU today.

FTR the full text search parser still uses the libc functions
is[w]space/alpha/digit... that depend on lc_ctype, whether the db
collation provider is ICU or not.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#37

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Daniel Verite (#36)

Re: Order changes in PG16 since ICU introduction

"Daniel Verite" <daniel@manitou-mail.org> writes:

FTR the full text search parser still uses the libc functions
is[w]space/alpha/digit... that depend on lc_ctype, whether the db
collation provider is ICU or not.

Yeah, those aren't even connected up to the collation-selection
mechanisms; lots of work to do there. I wonder if they could be
made to use regc_pg_locale.c instead of duplicating logic.

regards, tom lane

#38

pgsql@j-davis.com

over 2 years ago

In reply to: Andrew Gierth (#28)

3 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 22:35 +0100, Andrew Gierth wrote:

Can lc_collate_is_c() be taught to check whether an ICU locale is using
POSIX collation?

Attached are a few small patches:

0001: don't convert C to en-US-u-va-posix
0002: handle locale C the same regardless of the provider, as you
suggest above
0003: make LOCALE (or --locale) apply to everything including ICU

As far as I can tell, any libc locale has a reasonable match in ICU, so
setting LOCALE to either C or a libc locale name should be fine. Some
locales are only valid in ICU, e.g. '@colStrength=primary', or a
language tag representation, so if you do something like:

create database foo locale 'en_US@colStrength=primary'
template template0;

You'll get a decent error like:

ERROR: invalid LC_COLLATE locale name: "en_US@colStrength=primary"
HINT: If the locale name is specific to ICU, use ICU_LOCALE.

Overall, I think it works out nicely. Let me know if there are still
some confusing cases. I tried a few variations and this one seemed the
best, but I may have missed something.

Regards,
Jeff Davis

Attachments:

v2-0001-ICU-do-not-convert-locale-C-to-en-US-u-va-posix.patch (text/x-patch; charset=UTF-8)

From c768e040dc92b033e4eb0e69f08b59d8d1ffe1e4 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Apr 2023 14:03:57 -0700
Subject: [PATCH v2 1/3] ICU: do not convert locale 'C' to 'en-US-u-va-posix'.

The conversion was intended to be for convenience, but it's more
likely to be confusing than useful.

The user can still directly specify 'en-US-u-va-posix' if desired.

Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 19 +------------------
 src/bin/initdb/initdb.c                       | 17 +----------------
 .../regress/expected/collate.icu.utf8.out     |  8 ++++++++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 ++++
 4 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 51df570ce9..58c4c426bc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2782,26 +2782,10 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		if (elevel > 0)
-			ereport(elevel,
-					(errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2889,8 +2873,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..4086834458 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2238,24 +2238,10 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		pg_fatal("could not get language from locale \"%s\": %s",
-				 loc_str, u_errorName(status));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2327,8 +2313,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..99f12d2e73 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1020,6 +1020,7 @@ CREATE ROLE regress_test_role;
 CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
@@ -1034,17 +1035,24 @@ BEGIN
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
+RESET icu_validation_level;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+WARNING:  ICU locale "c" has unknown language "c"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..d9778faacc 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -358,6 +358,7 @@ CREATE SCHEMA test_schema;
 
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 
 do $$
 BEGIN
@@ -373,13 +374,16 @@ BEGIN
 END
 $$;
 
+RESET icu_validation_level;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
-- 
2.34.1

v2-0002-ICU-support-locale-C-with-the-same-behavior-as-li.patch (text/x-patch; charset=UTF-8)

From 1302a4b65e4e12753ae15e732dab059afe69dbd9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v2 2/3] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv.fsf@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 +++++----
 src/backend/utils/adt/pg_locale.c             | 86 ++++++++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 12 +--
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +-
 5 files changed, 131 insertions(+), 59 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..7e69a889fb 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (pg_strcasecmp(colliculocale, "C") == 0 ||
+				pg_strcasecmp(colliculocale, "POSIX") == 0)
 			{
-				char *langtag = icu_language_tag(colliculocale,
-												 icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char *langtag = icu_language_tag(colliculocale,
+													 icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..8ef33871f0 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (pg_strcasecmp(dbiculocale, "C") == 0 ||
+			pg_strcasecmp(dbiculocale, "POSIX") == 0)
 		{
-			char *langtag = icu_language_tag(dbiculocale,
-											 icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char *langtag = icu_language_tag(dbiculocale,
+												 icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 58c4c426bc..06e7530247 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,14 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (pg_strcasecmp(iculocstr, "C") == 0 ||
+		pg_strcasecmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1686,10 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (pg_strcasecmp("C", collcollate) == 0 ||
+		pg_strcasecmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1708,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2495,14 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (pg_strcasecmp(loc_str, "C") == 0 ||
+		pg_strcasecmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 99f12d2e73..53ab496bfe 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1042,21 +1042,21 @@ ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
-ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
-WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-WARNING:  ICU locale "c" has unknown language "c"
-HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 RESET icu_validation_level;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index d9778faacc..63d5352ee6 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,14 +379,17 @@ RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1

Attachment: v2-0003-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 1c30ea67e48bab60b7e96847ff3c24880e954471 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v2 3/3] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 ++--
 doc/src/sgml/ref/createdb.sgml                |  5 ++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++--
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++---
 src/bin/initdb/initdb.c                       | 31 ++++++++++++-------
 src/bin/scripts/createdb.c                    | 13 +++-----
 src/bin/scripts/t/020_createdb.pl             |  4 +--
 src/test/icu/t/010_database.pl                | 23 +++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 22 ++++++-------
 10 files changed, 77 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 7e69a889fb..e481f20dc8 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 8ef33871f0..b447dc55f3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4086834458..1ef028617e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2157,7 +2157,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2391,7 +2395,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2407,6 +2411,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	/*
@@ -2443,14 +2449,18 @@ setlocales(void)
 			printf(_("Using default ICU locale \"%s\".\n"), icu_locale);
 		}
 
-		/* canonicalize to a language tag */
-		langtag = icu_language_tag(icu_locale);
-		printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
-			   langtag, icu_locale);
-		pg_free(icu_locale);
-		icu_locale = langtag;
-
-		icu_validate_locale(icu_locale);
+		if (pg_strcasecmp(icu_locale, "C") != 0 &&
+			pg_strcasecmp(icu_locale, "POSIX") != 0)
+		{
+			/* canonicalize to a language tag */
+			langtag = icu_language_tag(icu_locale);
+			printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
+				   langtag, icu_locale);
+			pg_free(icu_locale);
+			icu_locale = langtag;
+
+			icu_validate_locale(icu_locale);
+		}
 
 		/*
 		 * In supported builds, the ICU locale ID will be opened during
@@ -3282,7 +3292,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..9ca86a3e53 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
@@ -219,6 +211,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..3db9fe931f 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -126,7 +126,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -134,7 +134,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 53ab496bfe..ecceb6d10c 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1202,9 +1202,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1212,7 +1212,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1229,12 +1229,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1243,7 +1243,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1271,13 +1271,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1327,9 +1327,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

#39

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#38)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

Attached are a few small patches:

0001: don't convert C to en-US-u-va-posix
0002: handle locale C the same regardless of the provider, as you
suggest above
0003: make LOCALE (or --locale) apply to everything including ICU

Testing this briefly, I noticed two regressions:

1) All pg_collation.collversion values are empty due to a trivial bug in 0002:

@@ -1650,6 +1686,10 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (pg_strcasecmp("C", collcollate) ||
+		pg_strcasecmp("POSIX", collcollate))
+		return NULL;
+

This should be pg_strcasecmp(...) == 0

2) The following works with HEAD (default provider=icu) but errors out with
the patches:

postgres=# create database lat9 locale 'fr_FR@euro' encoding LATIN9 template 'template0';
ERROR: could not convert locale name "fr_FR@euro" to language tag: U_ILLEGAL_ARGUMENT_ERROR

fr_FR@euro is a libc locale name:

$ locale -a|grep fr_FR
fr_FR
fr_FR@euro
fr_FR.iso88591
fr_FR.iso885915@euro
fr_FR.utf8

I understand that fr_FR@euro is taken as an ICU locale name, with the idea
that, since the locale syntax is more or less compatible between both
providers, this should work smoothly. 0003 seems to go further in the
interpretation and fails on it.
TBH the assumption that it's OK to feed libc locale names to ICU feels quite
uncomfortable.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#40

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#39)

4 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote:

This should be pg_strcasecmp(...) == 0

Good catch, thank you! Fixed in updated patches.

postgres=# create database lat9 locale 'fr_FR@euro' encoding LATIN9 template 'template0';
ERROR: could not convert locale name "fr_FR@euro" to language tag: U_ILLEGAL_ARGUMENT_ERROR

ICU 63 and earlier convert it without error to the language tag
'fr-FR-u-cu-eur', which is correct. ICU 64 removed support for transforming
some locale variants, because apparently they think those variants are
obsolete:

https://unicode-org.atlassian.net/browse/ICU-22268
https://unicode-org.atlassian.net/browse/ICU-20187

(Aside: how obsolete are those variants?)

It's frustrating that they'd remove such transformations from the
canonicalization process.

Fortunately, it looks like it's easy enough to do the transformation
ourselves. The only problematic format is '...@VARIANT'. The other
format 'fr_FR_EURO' doesn't seem to be a valid glibc locale name[1], and
Windows seems to use BCP 47[2].

And there don't seem to be a lot of variants to handle. ICU 63 only
handles 3 variants, so that's what my patch does. Any unknown variant
between 5 and 8 characters won't throw an error. There could be more
problem cases, but I'm not sure how much of a practical problem they
are.
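
As a rough illustration of the kind of transformation meant here (not the code from the attached patch), the sketch below rewrites a trailing libc-style "@euro" variant into an ICU keyword before the name would be handed to ICU. The helper name rewrite_libc_variant() is hypothetical, and the keyword spelling "currency=EUR" is an assumption about the ICU form corresponding to the 'fr-FR-u-cu-eur' tag mentioned above. Only this one variant is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>    /* strcasecmp(); stand-in for pg_strcasecmp() */

/*
 * Hypothetical helper: copy src into dst, rewriting a trailing "@euro"
 * variant (any case) as the ICU keyword "@currency=EUR" (assumed spelling).
 * All other names are copied unchanged.
 * Returns false if dst is too small to hold the result.
 */
bool
rewrite_libc_variant(const char *src, char *dst, size_t dstlen)
{
    const char *at;
    int         n;

    assert(src != NULL && dst != NULL);

    at = strchr(src, '@');
    if (at != NULL && strcasecmp(at + 1, "euro") == 0)
        n = snprintf(dst, dstlen, "%.*s@currency=EUR",
                     (int) (at - src), src);
    else
        n = snprintf(dst, dstlen, "%s", src);

    return n >= 0 && (size_t) n < dstlen;
}
```

Names that already use ICU keyword syntax, such as 'de@collation=phonebook', pass through untouched, so only the libc-specific spelling is affected.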

If we try to keep the meaning of LOCALE to only LC_COLLATE and
LC_CTYPE, that will continue to be confusing for anyone that uses
provider=icu.

Regards,
Jeff Davis

[1]: https://www.gnu.org/software/libc/manual/html_node/Locale-Names.html
[2]: https://learn.microsoft.com/en-us/windows/win32/intl/locale-names

Attachments:

v3-0001-ICU-do-not-convert-locale-C-to-en-US-u-va-posix.patch (text/x-patch; charset=UTF-8)

From 6c0251c584edea64148604da52c8e55e43fe36e6 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Apr 2023 14:03:57 -0700
Subject: [PATCH v3 1/4] ICU: do not convert locale 'C' to 'en-US-u-va-posix'.

The conversion was intended to be for convenience, but it's more
likely to be confusing than useful.

The user can still directly specify 'en-US-u-va-posix' if desired.

Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 19 +------------------
 src/bin/initdb/initdb.c                       | 17 +----------------
 .../regress/expected/collate.icu.utf8.out     |  8 ++++++++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 ++++
 4 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 51df570ce9..58c4c426bc 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2782,26 +2782,10 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		if (elevel > 0)
-			ereport(elevel,
-					(errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2889,8 +2873,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..4086834458 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2238,24 +2238,10 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		pg_fatal("could not get language from locale \"%s\": %s",
-				 loc_str, u_errorName(status));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2327,8 +2313,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..99f12d2e73 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1020,6 +1020,7 @@ CREATE ROLE regress_test_role;
 CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
@@ -1034,17 +1035,24 @@ BEGIN
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
+RESET icu_validation_level;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+WARNING:  ICU locale "c" has unknown language "c"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..d9778faacc 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -358,6 +358,7 @@ CREATE SCHEMA test_schema;
 
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 
 do $$
 BEGIN
@@ -373,13 +374,16 @@ BEGIN
 END
 $$;
 
+RESET icu_validation_level;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
-- 
2.34.1

v3-0002-ICU-support-locale-C-with-the-same-behavior-as-li.patch (text/x-patch; charset=UTF-8)

From 22a8ba5748953fbc577f7aeb8d8d85d185364fb7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v3 2/4] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv.fsf@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 +++++----
 src/backend/utils/adt/pg_locale.c             | 86 ++++++++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 12 +--
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +-
 5 files changed, 131 insertions(+), 59 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..7e69a889fb 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (pg_strcasecmp(colliculocale, "C") == 0 ||
+				pg_strcasecmp(colliculocale, "POSIX") == 0)
 			{
-				char *langtag = icu_language_tag(colliculocale,
-												 icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char *langtag = icu_language_tag(colliculocale,
+													 icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..8ef33871f0 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (pg_strcasecmp(dbiculocale, "C") == 0 ||
+			pg_strcasecmp(dbiculocale, "POSIX") == 0)
 		{
-			char *langtag = icu_language_tag(dbiculocale,
-											 icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char *langtag = icu_language_tag(dbiculocale,
+												 icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 58c4c426bc..3e19b21122 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,14 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (pg_strcasecmp(iculocstr, "C") == 0 ||
+		pg_strcasecmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1686,10 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (pg_strcasecmp("C", collcollate) == 0 ||
+		pg_strcasecmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1708,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2495,14 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (pg_strcasecmp(loc_str, "C") == 0 ||
+		pg_strcasecmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 99f12d2e73..53ab496bfe 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1042,21 +1042,21 @@ ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
-ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
-WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-WARNING:  ICU locale "c" has unknown language "c"
-HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 RESET icu_validation_level;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index d9778faacc..63d5352ee6 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,14 +379,17 @@ RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1
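
For readers skimming the patch above: most call sites it touches repeat the same case-insensitive test for the special names "C" and "POSIX" before deciding whether to hand the string to ICU. Below is a minimal standalone sketch of that test, assuming plain ASCII input. The names ascii_strcasecmp and locale_is_c_or_posix are hypothetical and are not in the patch; the first only stands in for pg_strcasecmp() so the example compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_strcasecmp(): ASCII-only, case-insensitive
 * comparison. Returns 0 when the strings match ignoring case.
 */
static int
ascii_strcasecmp(const char *a, const char *b)
{
	for (;;)
	{
		unsigned char ca = (unsigned char) *a++;
		unsigned char cb = (unsigned char) *b++;

		if (ca >= 'A' && ca <= 'Z')
			ca = (unsigned char) (ca - 'A' + 'a');
		if (cb >= 'A' && cb <= 'Z')
			cb = (unsigned char) (cb - 'A' + 'a');
		if (ca != cb)
			return (int) ca - (int) cb;
		if (ca == '\0')
			return 0;
	}
}

/*
 * Hypothetical helper: true if the locale name is one of the special
 * values that the patch keeps away from the provider.
 */
static bool
locale_is_c_or_posix(const char *locale)
{
	assert(locale != NULL);

	return ascii_strcasecmp(locale, "C") == 0 ||
		ascii_strcasecmp(locale, "POSIX") == 0;
}
```

Note that only an exact (case-insensitive) match counts here, so a name such as "C.UTF-8" is not treated as special by this sketch.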

Attachment: v3-0003-ICU-fix-up-old-libc-style-locale-strings.patch (text/x-patch; charset=UTF-8)

From b33dc56960378a1047ccf9c0387a1fe333912140 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v3 3/4] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'de__PHONEBOOK' or
'fr_FR@EURO'. Older ICU versions did this automatically, but ICU
version 64 removed that support.
---
 src/backend/utils/adt/pg_locale.c             | 59 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 63 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 3e19b21122..9f2c139b0b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2812,6 +2812,60 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	pfree(lower_str);
 }
 
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2828,6 +2882,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
@@ -2844,7 +2899,7 @@ icu_language_tag(const char *loc_str, int elevel)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2864,6 +2919,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4086834458..600c8d93f3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2229,6 +2229,64 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2238,6 +2296,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
@@ -2254,7 +2313,7 @@ icu_language_tag(const char *loc_str)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2273,6 +2332,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 53ab496bfe..5f5b61d036 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1048,15 +1048,26 @@ CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W');
 ERROR:  RULES not supported for C or POSIX locale
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 63d5352ee6..e4bbd2c009 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -382,14 +382,21 @@ CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
 CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
 CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1
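
The variant fix-up in the patch above can be tried outside PostgreSQL. The following is a rough sketch of the same idea, not the patch code itself: it uses malloc() instead of palloc()/pg_malloc(), returns NULL on allocation failure, and replaces pg_strcasecmp() with a hypothetical ASCII-only helper (ascii_strcasecmp). The function name fix_variant is also made up for this example. The mapping table is copied from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Old-style '@VARIANT' suffixes and their keyword form, as in the patch. */
static const char *const variant_map[][2] = {
	{"@EURO", "@currency=EUR"},
	{"@PINYIN", "@collation=pinyin"},
	{"@STROKE", "@collation=stroke"},
};

/* Hypothetical ASCII-only stand-in for pg_strcasecmp(). */
static int
ascii_strcasecmp(const char *a, const char *b)
{
	for (;;)
	{
		unsigned char ca = (unsigned char) *a++;
		unsigned char cb = (unsigned char) *b++;

		if (ca >= 'A' && ca <= 'Z')
			ca = (unsigned char) (ca - 'A' + 'a');
		if (cb >= 'A' && cb <= 'Z')
			cb = (unsigned char) (cb - 'A' + 'a');
		if (ca != cb)
			return (int) ca - (int) cb;
		if (ca == '\0')
			return 0;
	}
}

/*
 * Return a newly malloc'd copy of loc, with a trailing '@VARIANT' replaced
 * by '@keyword=value' when the variant is in the map. The caller frees the
 * result. Returns NULL only if malloc() fails.
 */
static char *
fix_variant(const char *loc)
{
	const char *at;
	char	   *result;
	size_t		len;

	assert(loc != NULL);

	at = strrchr(loc, '@');
	if (at != NULL)
	{
		size_t		prefix_len = (size_t) (at - loc);	/* bytes before '@' */
		size_t		i;

		for (i = 0; i < sizeof(variant_map) / sizeof(variant_map[0]); i++)
		{
			if (ascii_strcasecmp(at, variant_map[i][0]) == 0)
			{
				size_t		repl_len = strlen(variant_map[i][1]);

				result = malloc(prefix_len + repl_len + 1);
				if (result == NULL)
					return NULL;
				memcpy(result, loc, prefix_len);
				/* copy the replacement including its terminating NUL */
				memcpy(result + prefix_len, variant_map[i][1], repl_len + 1);
				return result;
			}
		}
	}

	/* No known variant: return an unchanged copy. */
	len = strlen(loc) + 1;
	result = malloc(len);
	if (result != NULL)
		memcpy(result, loc, len);
	return result;
}
```

With this sketch, fix_variant("fr_FR@EURO") yields "fr_FR@currency=EUR", while an unknown variant such as "@ASDF" comes back unchanged and is left for the language-tag conversion to reject.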

Attachment: v3-0004-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 066eac039f86d95ad853aa1e5de2b34fdf688f2e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v3 4/4] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 ++--
 doc/src/sgml/ref/createdb.sgml                |  5 ++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++--
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++---
 src/bin/initdb/initdb.c                       | 31 ++++++++++++-------
 src/bin/scripts/createdb.c                    | 13 +++-----
 src/bin/scripts/t/020_createdb.pl             |  4 +--
 src/test/icu/t/010_database.pl                | 23 +++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 22 ++++++-------
 10 files changed, 77 insertions(+), 51 deletions(-)
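
As a reading aid for the commit message above, here is a small sketch of the precedence it describes for CREATE DATABASE: an explicit ICU_LOCALE wins, otherwise LOCALE is used when the provider is ICU, otherwise the template database's ICU locale is inherited. This is only an illustration under those assumptions; struct db_locale_opts and effective_icu_locale are hypothetical names, not part of the patch, and error handling (for example ICU_LOCALE given with a libc provider) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the options relevant to the ICU locale choice. */
struct db_locale_opts
{
	bool		provider_is_icu;		/* effective locale provider is ICU */
	const char *icu_locale;				/* explicit ICU_LOCALE, or NULL */
	const char *locale;					/* explicit LOCALE, or NULL */
	const char *template_icu_locale;	/* from the template db, may be NULL */
};

/*
 * Pick the ICU locale string for a new database. Returns NULL when no ICU
 * locale applies or none is available.
 */
static const char *
effective_icu_locale(const struct db_locale_opts *opts)
{
	assert(opts != NULL);

	if (opts->icu_locale != NULL)
		return opts->icu_locale;		/* explicit ICU_LOCALE takes priority */
	if (!opts->provider_is_icu)
		return NULL;					/* LOCALE does not imply ICU_LOCALE */
	if (opts->locale != NULL)
		return opts->locale;			/* LOCALE now also sets ICU_LOCALE */
	return opts->template_icu_locale;	/* otherwise inherit from template */
}
```

Under this sketch, LOCALE 'C' with the ICU provider resolves to "C", which the earlier patch in the series treats as the special memcmp-ordered locale rather than passing it to ICU.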

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 7e69a889fb..e481f20dc8 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 8ef33871f0..b447dc55f3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 600c8d93f3..7e316c8ba9 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2157,7 +2157,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2452,7 +2456,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2468,6 +2472,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	/*
@@ -2504,14 +2510,18 @@ setlocales(void)
 			printf(_("Using default ICU locale \"%s\".\n"), icu_locale);
 		}
 
-		/* canonicalize to a language tag */
-		langtag = icu_language_tag(icu_locale);
-		printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
-			   langtag, icu_locale);
-		pg_free(icu_locale);
-		icu_locale = langtag;
-
-		icu_validate_locale(icu_locale);
+		if (pg_strcasecmp(icu_locale, "C") != 0 &&
+			pg_strcasecmp(icu_locale, "POSIX") != 0)
+		{
+			/* canonicalize to a language tag */
+			langtag = icu_language_tag(icu_locale);
+			printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
+				   langtag, icu_locale);
+			pg_free(icu_locale);
+			icu_locale = langtag;
+
+			icu_validate_locale(icu_locale);
+		}
 
 		/*
 		 * In supported builds, the ICU locale ID will be opened during
@@ -3343,7 +3353,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..9ca86a3e53 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
@@ -219,6 +211,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..3db9fe931f 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -126,7 +126,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -134,7 +134,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 5f5b61d036..566e91d2d9 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1213,9 +1213,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1223,7 +1223,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1240,12 +1240,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1254,7 +1254,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1282,13 +1282,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1338,9 +1338,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

#41

pgsql@j-davis.com

over 2 years ago

In reply to: Jeff Davis (#40)

4 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-28 at 14:35 -0700, Jeff Davis wrote:

On Thu, 2023-04-27 at 14:23 +0200, Daniel Verite wrote:

This should be pg_strcasecmp(...) == 0

Good catch, thank you! Fixed in updated patches.

Rebased patches.

=== 0001: do not convert C to en-US-u-va-posix

I plan to commit this soon. If someone specifies "C", they are probably
expecting memcmp()-like behavior, or some kind of error/warning that it
can't be provided.

Removing this transformation means that if you specify iculocale=C,
you'll get an error or warning (depending on icu_validation_level),
because C is not a recognized ICU locale. Depending on how some of the
other issues in this thread are sorted out, we may want to relax the
validation.
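
As an illustration only (not part of the patches), the memcmp()-like behavior that "C" is expected to give can be sketched in standalone C. The helper name c_locale_cmp is hypothetical, and this is a simplified model rather than PostgreSQL's actual comparison code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: compare two NUL-terminated strings the way a
 * memcmp()-based "C" collation would, byte by byte, with the shorter
 * string sorting first when one string is a prefix of the other.
 */
int
c_locale_cmp(const char *a, const char *b)
{
    size_t      alen;
    size_t      blen;
    int         cmp;

    assert(a != NULL && b != NULL);

    alen = strlen(a);
    blen = strlen(b);

    /* compare only the bytes both strings have */
    cmp = memcmp(a, b, alen < blen ? alen : blen);
    if (cmp != 0)
        return cmp;

    /* common prefix is equal: the shorter string sorts first */
    return (alen > blen) - (alen < blen);
}
```

With this comparison, 'RM(25)/nodes|+|21|1' sorts before 'RM(25)/nodes|-|21|' because '+' (0x2B) is a smaller byte than '-' (0x2D), which matches the collate "C" result reported earlier in the thread.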

=== 0002: fix @euro, etc. in ICU >= 64

I'd like to commit this soon too, but I'll wait for someone to take a
look. It makes it more reliable to map libc names to ICU locale names
regardless of the ICU version.

It doesn't solve the problem for locales like "de__PHONEBOOK", but
those don't seem to be a libc format (I think just an old ICU format),
so I don't see a big reason to carry it forward. It might be another
reason to turn down the validation level to WARNING, though.
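
The mapping idea in 0002 can be tried outside the server with a standalone sketch like the one below. It follows the approach of the attached patch but uses only the C standard library; the names fix_variant and ci_equal are illustrative, and ci_equal is a simplified stand-in for pg_strcasecmp(...) == 0.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* libc-style variant -> ICU keyword form, same entries as in the patch */
static const char *const variant_map[][2] = {
    {"@EURO", "@currency=EUR"},
    {"@PINYIN", "@collation=pinyin"},
    {"@STROKE", "@collation=stroke"},
};

/* ASCII case-insensitive equality (stand-in for pg_strcasecmp() == 0) */
int
ci_equal(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/*
 * Return a malloc'd copy of loc in which a trailing "@VARIANT" is replaced
 * by its keyword form if the variant is in the map; otherwise return a
 * plain copy. Returns NULL on out-of-memory. The caller frees the result.
 */
char *
fix_variant(const char *loc)
{
    const char *at;
    char       *result;
    size_t      i;

    assert(loc != NULL);

    at = strrchr(loc, '@');
    if (at != NULL)
    {
        size_t      prefix_len = (size_t) (at - loc);   /* bytes before '@' */

        for (i = 0; i < sizeof(variant_map) / sizeof(variant_map[0]); i++)
        {
            const char *repl = variant_map[i][1];
            size_t      repl_len;

            if (!ci_equal(at, variant_map[i][0]))
                continue;

            repl_len = strlen(repl);
            result = malloc(prefix_len + repl_len + 1);
            if (result == NULL)
                return NULL;
            memcpy(result, loc, prefix_len);
            memcpy(result + prefix_len, repl, repl_len + 1);    /* with NUL */
            return result;
        }
    }

    /* no '@', or an unknown variant: return the input unchanged */
    result = malloc(strlen(loc) + 1);
    if (result != NULL)
        strcpy(result, loc);
    return result;
}
```

Here "fr_FR@euro" becomes "fr_FR@currency=EUR", while "de__PHONEBOOK" contains no '@' and comes back unchanged, which is the case 0002 does not try to handle.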

=== 0003: support C memcmp() behavior with ICU provider

The current patch 0003 has a problem: in previous postgres
versions (going all the way back), we allowed "C" as a valid ICU
locale, which would actually be passed to ICU as a locale name. But ICU
didn't recognize it, and it would end up opening the root locale. So we
can't simply redefine "C" to mean "memcmp", because that would change
the sort order underneath existing indexes and potentially break them.
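
To make the index concern concrete, here is a small standalone sketch. A sorted array stands in for an index, and an ASCII case-insensitive comparison stands in for a linguistic order; that stand-in is an assumption for illustration and is not ICU's root collation. All names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*str_cmp_fn) (const char *, const char *);

/* Byte order, which is what a memcmp()-based "C" locale gives. */
int
byte_order_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Stand-in for a linguistic order: ASCII case-insensitive comparison. */
int
caseless_cmp(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int         ca = tolower((unsigned char) *a);
        int         cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca > cb) - (ca < cb);
        if (ca == 0)
            return 0;
    }
}

/*
 * Binary search over keys[0..n-1]. Returns the index of key, or -1.
 * The answer is only reliable if keys is sorted according to cmp.
 */
int
sorted_lookup(const char *const *keys, size_t n, const char *key,
              str_cmp_fn cmp)
{
    size_t      lo = 0;
    size_t      hi = n;

    assert(keys != NULL && key != NULL && cmp != NULL);

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;
        int         c = cmp(key, keys[mid]);

        if (c == 0)
            return (int) mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* An "index" that was built under the case-insensitive order. */
const char *const index_keys[] = {"apple", "Banana", "cherry", "Date"};
```

Looking up "Date" with caseless_cmp finds it at position 3. Looking it up with byte_order_cmp returns -1 even though the key is present, because 'D' (0x44) sorts before 'c' (0x63) in byte order and the search descends into the wrong half. Silently changing what "C" means for an existing collation would risk the same kind of mismatch.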

I see the following potential solutions:

1. Represent the memcmp behavior with iculocale=NULL, or some other
catalog hack, so that we can distinguish between a locale "C" upgraded
from a previous version (which should pass "C" to ICU and get the root
locale), and a new collation defined with locale "C" (which should have
memcmp behavior). The catalog representation for locale information is
already complex, so I'm not excited about this option, but it will
work.

2. When provider=icu and locale=C, magically transform that into
provider=libc to get memcmp-like behavior for new collations but
preserve the existing behavior for upgraded collations. Not especially
clean, but if we issue a NOTICE perhaps that would avoid confusion.

3. Like #2, except create a new provider type "none" which may be
slightly less confusing.
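
All three options have to make the same distinction, which can be written down as a sketch. Everything below is hypothetical: the enum, the function name, and especially the upgraded_from_old_version flag, which stands for whatever catalog representation ends up recording the difference (iculocale=NULL in option 1, a changed provider in options 2 and 3). Only "C" is modeled, and the match is exact for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum
{
    COMPARE_MEMCMP,             /* byte-wise comparison, no provider used */
    COMPARE_ICU_ROOT,           /* "C" handed to ICU, which opens the root locale */
    COMPARE_ICU_LOCALE          /* any other locale string, handled by ICU */
} compare_behavior;

/*
 * Decide how a collation with provider=icu should compare strings.
 * upgraded_from_old_version is an assumed flag: true if the collation was
 * created by a release that passed "C" through to ICU.
 */
compare_behavior
icu_compare_behavior(const char *locale, bool upgraded_from_old_version)
{
    assert(locale != NULL);

    if (strcmp(locale, "C") != 0)
        return COMPARE_ICU_LOCALE;

    /* existing objects keep their old order so indexes stay valid */
    if (upgraded_from_old_version)
        return COMPARE_ICU_ROOT;

    /* newly created objects get what "C" is expected to mean */
    return COMPARE_MEMCMP;
}
```

The options differ only in where that flag lives, not in the decision itself.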

=== 0004: make LOCALE apply to ICU for CREATE DATABASE

To understand this patch it helps to understand the confusing situation
with CREATE DATABASE in version 15:

The keywords LC_CTYPE and LC_COLLATE set the server environment
LC_CTYPE/LC_COLLATE for that database and can be specified regardless
of the provider. LOCALE can be specified along with (or instead of)
LC_CTYPE and LC_COLLATE, in which case whichever of LC_CTYPE or
LC_COLLATE is unspecified defaults to the setting of LOCALE. Iff the
provider is libc, LC_CTYPE and LC_COLLATE also act as the database
default collation's locale. If the provider is icu, then none of
LOCALE, LC_CTYPE, or LC_COLLATE affect the database default collation's
locale at all; that's controlled by ICU_LOCALE (which may be omitted if
the template's daticulocale is non-NULL).
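
The version 15 rules described above can be modeled as a small standalone function. This is a simplified sketch, not the code in dbcommands.c: the struct and function names are hypothetical, NULL means "not specified", and the fallback to the template's LC_COLLATE/LC_CTYPE when nothing is given is an assumption about behavior the paragraph above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Options given to CREATE DATABASE; NULL means "not specified". */
typedef struct
{
    char        provider;       /* 'c' = libc, 'i' = icu */
    const char *locale;         /* LOCALE */
    const char *lc_collate;     /* LC_COLLATE */
    const char *lc_ctype;       /* LC_CTYPE */
    const char *icu_locale;     /* ICU_LOCALE */
} createdb_options;

/* Relevant settings of the template database. */
typedef struct
{
    const char *lc_collate;
    const char *lc_ctype;
    const char *icu_locale;     /* daticulocale, may be NULL */
} template_settings;

typedef struct
{
    const char *env_lc_collate; /* server environment LC_COLLATE */
    const char *env_lc_ctype;   /* server environment LC_CTYPE */
    /* locale of the default collation; NULL means "must be specified" */
    const char *default_collation_locale;
} resolved_settings;

const char *
first_specified(const char *a, const char *b, const char *c)
{
    if (a != NULL)
        return a;
    if (b != NULL)
        return b;
    return c;
}

resolved_settings
resolve_createdb_v15(const createdb_options *opt,
                     const template_settings *tmpl)
{
    resolved_settings r;

    assert(opt != NULL && tmpl != NULL);

    /* LC_COLLATE/LC_CTYPE win; LOCALE fills in whichever is missing */
    r.env_lc_collate = first_specified(opt->lc_collate, opt->locale,
                                       tmpl->lc_collate);
    r.env_lc_ctype = first_specified(opt->lc_ctype, opt->locale,
                                     tmpl->lc_ctype);

    if (opt->provider == 'i')
        /* ICU: only ICU_LOCALE (or the template) matters */
        r.default_collation_locale =
            opt->icu_locale != NULL ? opt->icu_locale : tmpl->icu_locale;
    else
        /* libc: the collation follows LC_COLLATE (and likewise LC_CTYPE) */
        r.default_collation_locale = r.env_lc_collate;

    return r;
}
```

In this model, LOCALE 'C' with the icu provider and a template whose daticulocale is "en-US" yields environment settings of "C" but a default collation locale of "en-US", which is the confusing part that 0004 is meant to address.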

The idea of patch 0004 is to address the last part, which is probably
the most confusing aspect. But for that to work smoothly, we need
something like 0003 so that LOCALE=C gives the same semantics
regardless of the provider.

Regards,
Jeff Davis

Attachments:

v4-0001-ICU-do-not-convert-locale-C-to-en-US-u-va-posix.patch (text/x-patch; charset=UTF-8)

From ddda683963959a175dff17ab0e3d8519641498b9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Apr 2023 14:03:57 -0700
Subject: [PATCH v4 1/4] ICU: do not convert locale 'C' to 'en-US-u-va-posix'.

The conversion was intended to be for convenience, but it's more
likely to be confusing than useful.

The user can still directly specify 'en-US-u-va-posix' if desired.

Discussion: https://postgr.es/m/f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 19 +------------------
 src/bin/initdb/initdb.c                       | 17 +----------------
 .../regress/expected/collate.icu.utf8.out     |  8 ++++++++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 ++++
 4 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f0b6567da1..51b4221a39 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2782,26 +2782,10 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		if (elevel > 0)
-			ereport(elevel,
-					(errmsg("could not get language from locale \"%s\": %s",
-							loc_str, u_errorName(status))));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2889,8 +2873,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..4086834458 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2238,24 +2238,10 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
-	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
 
-	status = U_ZERO_ERROR;
-	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
-	{
-		pg_fatal("could not get language from locale \"%s\": %s",
-				 loc_str, u_errorName(status));
-		return NULL;
-	}
-
-	/* C/POSIX locales aren't handled by uloc_getLanguageTag() */
-	if (strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
-		return pstrdup("en-US-u-va-posix");
-
 	/*
 	 * A BCP47 language tag doesn't have a clearly-defined upper limit
 	 * (cf. RFC5646 section 4.4). Additionally, in older ICU versions,
@@ -2327,8 +2313,7 @@ icu_validate_locale(const char *loc_str)
 
 	/* check for special language name */
 	if (strcmp(lang, "") == 0 ||
-		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0 ||
-		strcmp(lang, "c") == 0 || strcmp(lang, "posix") == 0)
+		strcmp(lang, "root") == 0 || strcmp(lang, "und") == 0)
 		found = true;
 
 	/* search for matching language within ICU */
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..99f12d2e73 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1020,6 +1020,7 @@ CREATE ROLE regress_test_role;
 CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
@@ -1034,17 +1035,24 @@ BEGIN
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
+RESET icu_validation_level;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+WARNING:  ICU locale "c" has unknown language "c"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..d9778faacc 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -358,6 +358,7 @@ CREATE SCHEMA test_schema;
 
 -- We need to do this this way to cope with varying names for encodings:
 SET client_min_messages TO WARNING;
+SET icu_validation_level = disabled;
 
 do $$
 BEGIN
@@ -373,13 +374,16 @@ BEGIN
 END
 $$;
 
+RESET icu_validation_level;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 RESET icu_validation_level;
 
-- 
2.34.1

v4-0002-ICU-fix-up-old-libc-style-locale-strings.patch (text/x-patch; charset=UTF-8)

From 3db6abd0fe56e9c4b6653e04e28f6f77381c2fc8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v4 2/4] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'fr_FR@euro'. Older ICU versions did
this automatically, but ICU version 64 removed that support.

Discussion: https://postgr.es/m/654a49f7ff7461bcf47be4181430678d45f93858.camel%40j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 63 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 51b4221a39..0e7343b28b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2766,6 +2766,60 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	pfree(lower_str);
 }
 
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2782,6 +2836,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
@@ -2798,7 +2853,7 @@ icu_language_tag(const char *loc_str, int elevel)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2818,6 +2873,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4086834458..600c8d93f3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2229,6 +2229,64 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2238,6 +2296,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
 	const bool	 strict = true;
@@ -2254,7 +2313,7 @@ icu_language_tag(const char *loc_str)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2273,6 +2332,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 99f12d2e73..d520674edf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1046,6 +1046,8 @@ CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
@@ -1056,7 +1058,16 @@ HINT:  To disable ICU locale validation, set parameter icu_validation_level to D
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index d9778faacc..ab9a8484b9 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -381,12 +381,19 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1

v4-0003-ICU-support-locale-C-with-the-same-behavior-as-li.patch (text/x-patch; charset=UTF-8)

From 05597bea2f48cb1ef78a745401bcabdd29245b84 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v4 3/4] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv.fsf@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 +++++----
 src/backend/utils/adt/pg_locale.c             | 86 ++++++++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 12 +--
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +-
 5 files changed, 131 insertions(+), 59 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..7e69a889fb 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (pg_strcasecmp(colliculocale, "C") == 0 ||
+				pg_strcasecmp(colliculocale, "POSIX") == 0)
 			{
-				char *langtag = icu_language_tag(colliculocale,
-												 icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char *langtag = icu_language_tag(colliculocale,
+													 icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..8ef33871f0 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (pg_strcasecmp(dbiculocale, "C") == 0 ||
+			pg_strcasecmp(dbiculocale, "POSIX") == 0)
 		{
-			char *langtag = icu_language_tag(dbiculocale,
-											 icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char *langtag = icu_language_tag(dbiculocale,
+												 icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 0e7343b28b..76ca42441d 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,14 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (pg_strcasecmp(iculocstr, "C") == 0 ||
+		pg_strcasecmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1686,10 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (pg_strcasecmp("C", collcollate) == 0 ||
+		pg_strcasecmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1708,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2495,14 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (pg_strcasecmp(loc_str, "C") == 0 ||
+		pg_strcasecmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d520674edf..f217658151 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1042,8 +1042,10 @@ ERROR:  parameter "locale" must be specified
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
-ERROR:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
@@ -1051,16 +1053,14 @@ ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMEN
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
-WARNING:  could not convert locale name "c" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-WARNING:  ICU locale "c" has unknown language "c"
-HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
 NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index ab9a8484b9..e4bbd2c009 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,16 +379,19 @@ RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-CREATE COLLATION testx (provider = icu, locale = 'c'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'c', rules = '&V << w <<< W'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
-CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
+
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
-- 
2.34.1
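
A note on the check that recurs in the patch above: make_icu_collator(), get_collation_actual_version(), and pg_ucol_open() all compare the locale string case-insensitively against "C" and "POSIX". The sketch below restates that test as a standalone helper. The names ascii_strcasecmp() and locale_is_c_or_posix() are hypothetical and do not appear in the patch; the comparison routine is a simplified stand-in for pg_strcasecmp(), not its actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Lower-case a single ASCII letter; every other byte is returned unchanged. */
static int
ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/*
 * Hypothetical stand-in for pg_strcasecmp(): ASCII-only, case-insensitive
 * string comparison. Returns 0 when the strings match.
 */
static int
ascii_strcasecmp(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        int ca = ascii_lower((unsigned char) *a);
        int cb = ascii_lower((unsigned char) *b);

        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    return ascii_lower((unsigned char) *a) - ascii_lower((unsigned char) *b);
}

/*
 * Hypothetical helper: true if the locale name is exactly "C" or "POSIX"
 * in any letter case. Names such as "C.UTF-8" deliberately do not match.
 */
static bool
locale_is_c_or_posix(const char *locale)
{
    if (locale == NULL)
        return false;
    return ascii_strcasecmp(locale, "C") == 0 ||
           ascii_strcasecmp(locale, "POSIX") == 0;
}
```

With such a helper, a caller would skip opening an ICU collator whenever locale_is_c_or_posix() returns true, which is the behavior the patch gives make_icu_collator().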

Attachment: v4-0004-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 310bdcd136e44bfca1eea4da5181886eac02d52d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v4 4/4] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 ++--
 doc/src/sgml/ref/createdb.sgml                |  5 ++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++--
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++---
 src/bin/initdb/initdb.c                       | 31 ++++++++++++-------
 src/bin/scripts/createdb.c                    | 13 +++-----
 src/bin/scripts/t/020_createdb.pl             |  4 +--
 src/test/icu/t/010_database.pl                | 23 +++++++++-----
 .../regress/expected/collate.icu.utf8.out     | 28 ++++++++---------
 10 files changed, 80 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 7e69a889fb..e481f20dc8 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 8ef33871f0..b447dc55f3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 600c8d93f3..7e316c8ba9 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2157,7 +2157,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2452,7 +2456,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2468,6 +2472,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	/*
@@ -2504,14 +2510,18 @@ setlocales(void)
 			printf(_("Using default ICU locale \"%s\".\n"), icu_locale);
 		}
 
-		/* canonicalize to a language tag */
-		langtag = icu_language_tag(icu_locale);
-		printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
-			   langtag, icu_locale);
-		pg_free(icu_locale);
-		icu_locale = langtag;
-
-		icu_validate_locale(icu_locale);
+		if (pg_strcasecmp(icu_locale, "C") != 0 &&
+			pg_strcasecmp(icu_locale, "POSIX") != 0)
+		{
+			/* canonicalize to a language tag */
+			langtag = icu_language_tag(icu_locale);
+			printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
+				   langtag, icu_locale);
+			pg_free(icu_locale);
+			icu_locale = langtag;
+
+			icu_validate_locale(icu_locale);
+		}
 
 		/*
 		 * In supported builds, the ICU locale ID will be opened during
@@ -3343,7 +3353,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..9ca86a3e53 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
@@ -219,6 +211,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..3db9fe931f 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -126,7 +126,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -134,7 +134,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index f217658151..566e91d2d9 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1063,11 +1063,11 @@ CREATE COLLATION testx (provider = icu, locale = 'c'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'posix'); DROP COLLATION testx;
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
 CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
 CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
@@ -1213,9 +1213,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1223,7 +1223,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1240,12 +1240,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1254,7 +1254,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1282,13 +1282,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1338,9 +1338,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1
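
The precedence that the dbcommands.c hunk above introduces for CREATE DATABASE can be restated as a small standalone function. This is only a sketch of the decision order as I read it from the patch: an explicit ICU_LOCALE wins, then LOCALE (when the provider is ICU), then the template database's ICU locale. The function name choose_icu_locale() and the PROVIDER_* macros are hypothetical and are not part of the patch.

```c
#include <stddef.h>

/* Hypothetical provider codes, standing in for COLLPROVIDER_LIBC / COLLPROVIDER_ICU. */
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/*
 * Hypothetical helper: pick the ICU locale for a new database.
 *
 *   icu_locale           value of ICU_LOCALE, or NULL if not given
 *   locale               value of LOCALE, or NULL if not given
 *   template_icu_locale  ICU locale of the template database, or NULL
 *
 * Returns NULL when no ICU locale applies.
 */
static const char *
choose_icu_locale(char provider, const char *icu_locale,
                  const char *locale, const char *template_icu_locale)
{
    /* An explicitly specified ICU_LOCALE is always kept as given. */
    if (icu_locale != NULL)
        return icu_locale;

    /* Without the ICU provider there is nothing to fill in. */
    if (provider != PROVIDER_ICU)
        return NULL;

    /* New in this patch: LOCALE also supplies the ICU locale. */
    if (locale != NULL)
        return locale;

    /* Otherwise inherit from the template database, as before. */
    return template_icu_locale;
}
```

Note that LOCALE still sets LC_COLLATE and LC_CTYPE, which are validated against libc. That is why the 010_database.pl test above expects LOCALE '@colStrength=primary' to fail with "invalid LC_COLLATE locale name" while LOCALE 'C' succeeds.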

#42

/messages/by-id/e861ac4fdae9f9f5ce2a938a37bcb5e083f0f489.camel@cybertec.at

pgsql@j-davis.com

over 2 years ago

In reply to: Robert Haas (#32)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-04-21 at 20:12 -0400, Robert Haas wrote:

On Fri, Apr 21, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:

Most of the complaints seem to be complaints about v15 as well, and
while those complaints may be a reason to not make ICU the default,
they are also an argument that we should continue to learn and try to
fix those issues because they exist in an already-released version.
Leaving it the default for now will help us fix those issues rather
than hide them.

It's still early, so we have plenty of time to revert the initdb
default if we need to.

That's fair enough, but I really think it's important that some energy
get invested in providing adequate documentation for this stuff. Just
patching the code is not enough.

Attached a significant documentation patch.

I tried to make it comprehensive without trying to be exhaustive, and I
separated the explanation of language tags from what collation settings
you can include in a language tag, so hopefully that's more clear.
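
As a rough illustration of the tag structure the patch documents (this sketch is not part of the patch, and parse_language_tag(), ParsedTag, and TagSetting are hypothetical names): a language tag such as en-US-u-kn-ks-level2 splits into a base part before "-u-" and a list of key/value collation settings after it. The sketch assumes keys are exactly two characters and values are three to eight characters, and that a key with no value means "true". It keeps only one value per key and rejects anything else, so it is far less general than what ICU accepts.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SETTINGS 8

typedef struct
{
    char key[3];    /* two-character setting key, e.g. "kn" */
    char value[9];  /* setting value, "true" if none was given */
} TagSetting;

typedef struct
{
    char       base[32];    /* part before "-u-", e.g. "en-US" */
    int        nsettings;   /* number of valid entries in settings[] */
    TagSetting settings[MAX_SETTINGS];
} ParsedTag;

/*
 * Hypothetical helper: split a language tag into its base and its
 * "-u-" collation settings. Returns false if the tag does not fit the
 * simplified shape described above.
 */
static bool
parse_language_tag(const char *tag, ParsedTag *out)
{
    const char *ext = strstr(tag, "-u-");
    size_t      baselen = ext ? (size_t) (ext - tag) : strlen(tag);
    char        buf[128];
    char       *tok;

    memset(out, 0, sizeof(*out));
    if (baselen == 0 || baselen >= sizeof(out->base))
        return false;
    memcpy(out->base, tag, baselen);    /* already NUL-terminated by memset */

    if (ext == NULL)
        return true;                    /* no collation settings present */

    if (strlen(ext + 3) >= sizeof(buf))
        return false;
    strcpy(buf, ext + 3);

    for (tok = strtok(buf, "-"); tok != NULL; tok = strtok(NULL, "-"))
    {
        size_t len = strlen(tok);

        if (len == 2)
        {
            /* A new key; its value defaults to "true" until one follows. */
            if (out->nsettings >= MAX_SETTINGS)
                return false;
            strcpy(out->settings[out->nsettings].key, tok);
            strcpy(out->settings[out->nsettings].value, "true");
            out->nsettings++;
        }
        else if (len >= 3 && len <= 8 && out->nsettings > 0)
        {
            /* A value for the most recent key. */
            strcpy(out->settings[out->nsettings - 1].value, tok);
        }
        else
            return false;
    }
    return true;
}
```

Parsing en-US-u-kn-ks-level2 this way yields the base en-US with kn set to true and ks set to level2, matching the example in the new "Language Tag" section.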

I added quite a few examples spread throughout the various sections,
and I preserved the existing examples at the end. I also left all of
the external links at the bottom for those interested enough to go
beyond what's there.

I didn't add additional documentation for ICU rules. There are so many
options for collations that it's hard for me to think of realistic
examples to specify the rules directly, unless someone wants to invent
a new language. Perhaps useful if working with an interesting text file
format with special treatment for delimiters?

I asked the question about rules here:

and got some limited response about addressing sort complaints. That
sounds reasonable, but a lot of that can also be handled just by
specifying the right collation settings. Someone who understands the
use case better could add some more documentation.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v1-0001-Doc-improvements-for-language-tags-and-custom-ICU.patch (text/x-patch; charset=UTF-8)

From b09515bfaf5e9de330138ec4a627d02a7947de1a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 27 Apr 2023 14:43:46 -0700
Subject: [PATCH v1] Doc improvements for language tags and custom ICU
 collations.

Separate the documentation for language tags from the documentation
for the available collation settings which can be included in a
language tag.

Include tables of the available options, more details about the
effects of each option, and additional examples.

Also include an explanation of the "levels" of textual features and
how they relate to collation.
---
 doc/src/sgml/charset.sgml | 656 +++++++++++++++++++++++++++++++-------
 1 file changed, 535 insertions(+), 121 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 6dd95b8966..be74064168 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -377,7 +377,125 @@ initdb --locale-provider=icu --icu-locale=en
     variants and customization options.
    </para>
   </sect2>
+  <sect2 id="icu-locales">
+   <title>ICU Locales</title>
+   <sect3 id="icu-locale-names">
+    <title>ICU Locale Names</title>
+    <para>
+     The ICU format for the locale name is a <link
+     linkend="icu-language-tag">Language Tag</link>.
+
+<programlisting>
+CREATE COLLATION mycollation1 (PROVIDER = icu, LOCALE = 'ja-JP');
+CREATE COLLATION mycollation2 (PROVIDER = icu, LOCALE = 'fr');
+</programlisting>
+    </para>
+   </sect3>
+   <sect3 id="icu-canonicalization">
+    <title>Locale Canonicalization and Validation</title>
+    <para>
+     When defining a new ICU collation object or database with ICU as the
+     provider, the given locale name is transformed ("canonicalized") into a
+     language tag if not already in that form. For instance,
+
+<screen>
+CREATE COLLATION mycollation3 (PROVIDER = icu, LOCALE = 'en-US-u-kn-true');
+NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
+CREATE COLLATION mycollation4 (PROVIDER = icu, LOCALE = 'de_DE.utf8');
+NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
+</screen>
+
+     If you see such a message, ensure that the <symbol>PROVIDER</symbol> and
+     <symbol>LOCALE</symbol> are as you expect, and consider specifying
+     the canonical language tag directly instead of relying on the
+     transformation.
+    </para>
+    <note>
+     <para>
+      ICU can transform most libc locale names, as well as some other formats,
+      into language tags for easier transition to ICU. If a libc locale name
+      is used in ICU, it may not have precisely the same behavior as in libc.
+     </para>
+    </note>
+    <para>
+     If there is some problem interpreting the locale name, or if it represents
+     a language or region that ICU does not recognize, a message will be reported:
 
+<screen>
+SET icu_validation_level = ERROR;
+CREATE COLLATION nonsense (PROVIDER = icu, LOCALE = 'nonsense');
+ERROR:  ICU locale "nonsense" has unknown language "nonsense"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+</screen>
+
+     <xref
+     linkend="guc-icu-validation-level"/> controls how the message is
+     reported. If set below <literal>ERROR</literal>, the collation will still
+     be created, but the behavior may not be what the user intended.
+    </para>
+   </sect3>
+   <sect3 id="icu-language-tag">
+    <title>Language Tag</title>
+    <para>
+     Basic language tags are simply
+     <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>,
+     or even just <replaceable>language</replaceable>. The
+     <replaceable>language</replaceable> is a language code
+     (e.g. <literal>fr</literal> for French or <literal>und</literal> for
+     "undefined"), and <replaceable>region</replaceable> is a region code
+     (e.g. <literal>CA</literal> for Canada). Examples:
+     <literal>ja-JP</literal>, <literal>de</literal>, or
+     <literal>fr-CA</literal>.
+    </para>
+    <para>
+     Collation settings may be included in the language tag to customize
+     collation behavior. ICU allows extensive customization, such as
+     sensitivity (or insensitivity) to accents, case, and punctuation;
+     treatment of digits within text; and many other options to satisfy a
+     variety of uses.
+    </para>
+    <para>
+     To include this additional collation information in a language tag,
+     append <literal>-u</literal>, followed by one or more
+     <literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
+     pairs, where <replaceable>key</replaceable> is the key for a collation
+     setting and <replaceable>value</replaceable> is a valid value for that
+     setting. For boolean settings, the
+     <literal>-</literal><replaceable>key</replaceable> may be specified
+     without a corresponding
+     <literal>-</literal><replaceable>value</replaceable>, which implies a
+     value of <literal>true</literal>.
+    </para>
+    <para>
+     For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
+     means the locale with the English language in the US region, with
+     collation settings <literal>kn</literal> set to <literal>true</literal>
+     and <literal>ks</literal> set to <literal>level2</literal>. Those
+     settings mean the collation will be case-insensitive and treat a sequence
+     of digits as a single number:
+
+<screen>
+CREATE COLLATION mycollation5 (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'en-US-u-kn-ks-level2');
+SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+
+SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+</screen>
+    </para>
+    <para>
+     See <xref linkend="icu-custom-collations"/> for details and additional
+     examples of using language tags with custom collation information for the
+     locale.
+    </para>
+   </sect3>
+  </sect2>
   <sect2 id="locale-problems">
    <title>Problems</title>
 
@@ -658,6 +776,13 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
     code byte values.
    </para>
 
+   <note>
+    <para>
+     The <literal>C</literal> and <literal>POSIX</literal> locales may behave
+     differently depending on the database encoding.
+    </para>
+   </note>
+
    <para>
     Additionally, two SQL standard collation names are available:
 
@@ -869,132 +994,23 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
    <sect4 id="collation-managing-create-icu">
     <title>ICU Collations</title>
 
-   <para>
-    ICU allows collations to be customized beyond the basic language+country
-    set that is preloaded by <command>initdb</command>.  Users are encouraged
-    to define their own collation objects that make use of these facilities to
-    suit the sorting behavior to their requirements.
-    See <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
-    and <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink> for
-    information on ICU locale naming.  The set of acceptable names and
-    attributes depends on the particular ICU version.
-   </para>
-
-   <para>
-    Here are some examples:
-
-    <variablelist>
-     <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</literal></term>
-      <listitem>
-       <para>German collation with phone book collation type</para>
-       <para>
-        The first example selects the ICU locale using a <quote>language
-        tag</quote> per BCP 47.  The second example uses the traditional
-        ICU-specific locale syntax.  The first style is preferred going
-        forward, and is used internally to store locales.
-       </para>
-       <para>
-        Note that you can name the collation objects in the SQL environment
-        anything you want.  In this example, we follow the naming style that
-        the predefined collations use, which in turn also follow BCP 47, but
-        that is not required for user-defined collations.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</literal></term>
-      <listitem>
-       <para>
-        Root collation with Emoji collation type, per Unicode Technical Standard #51
-       </para>
-       <para>
-        Observe how in the traditional ICU locale naming system, the root
-        locale is selected by an empty string.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
-      <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
-      <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');</literal></term>
-      <listitem>
-       <para>
-        Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</literal></term>
-      <listitem>
-       <para>
-        Sort upper-case letters before lower-case letters.  (The default is
-        lower-case letters first.)
-       </para>
-      </listitem>
-     </varlistentry>
-
-    <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');</literal></term>
-      <listitem>
-       <para>
-        Combines both of the above options.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kn-true">
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</literal></term>
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</literal></term>
-      <listitem>
-       <para>
-        Numeric ordering, sorts sequences of digits by their numeric value,
-        for example: <literal>A-21</literal> &lt; <literal>A-123</literal>
-        (also known as natural sort).
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
-
-    See <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
-    Technical Standard #35</ulink>
-    and <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink> for
-    details.  The list of possible collation types (<literal>co</literal>
-    subtag) can be found in
-    the <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
-    repository</ulink>.
-   </para>
+    <para>
+     ICU collations can be created like:
 
-   <para>
-    Note that while this system allows creating collations that <quote>ignore
-    case</quote> or <quote>ignore accents</quote> or similar (using the
-    <literal>ks</literal> key), in order for such collations to act in a
-    truly case- or accent-insensitive manner, they also need to be declared as not
-    <firstterm>deterministic</firstterm> in <command>CREATE COLLATION</command>;
-    see <xref linkend="collation-nondeterministic"/>.
-    Otherwise, any strings that compare equal according to the collation but
-    are not byte-wise equal will be sorted according to their byte values.
-   </para>
+<programlisting>
+CREATE COLLATION german (provider = icu, locale = 'de-DE');
+</programlisting>
 
-   <note>
+     ICU locales are specified as a <link linkend="icu-language-tag">Language
+     Tag</link>, but can also accept most libc-style locale names (which will
+     be transformed into language tags if possible).
+    </para>
     <para>
-     By design, ICU will accept almost any string as a locale name and match
-     it to the closest locale it can provide, using the fallback procedure
-     described in its documentation.  Thus, there will be no direct feedback
-     if a collation specification is composed using features that the given
-     ICU installation does not actually support.  It is therefore recommended
-     to create application-level test cases to check that the collation
-     definitions satisfy one's requirements.
+     New ICU collations can customize collation behavior extensively by
+     including collation attributes in the language tag. See <xref
+     linkend="icu-custom-collations"/> for details and examples.
     </para>
-   </note>
    </sect4>
-
    <sect4 id="collation-copy">
    <title>Copying Collations</title>
 
@@ -1072,6 +1088,404 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
     </tip>
    </sect3>
   </sect2>
+  <sect2 id="icu-custom-collations">
+   <title>ICU Custom Collations</title>
+
+   <para>
+    ICU allows extensive control over collation behavior by defining new
+    collations with collation settings as a part of the language tag. These
+    settings can modify the collation order to suit a variety of needs. For
+    instance:
+
+<programlisting>
+-- ignore differences in accents and case
+CREATE COLLATION ignore_accent_case (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'und-u-ks-level1');
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
+
+-- upper case letters sort before lower case.
+CREATE COLLATION upper_first (PROVIDER=icu, LOCALE = 'und-u-kf-upper');
+SELECT 'B' &lt; 'b' COLLATE upper_first; -- true
+
+-- treat digits numerically and ignore punctuation
+CREATE COLLATION num_ignore_punct (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'und-u-ka-shifted-kn');
+SELECT 'id-45' &lt; 'id-123' COLLATE num_ignore_punct; -- true
+SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
+</programlisting>
+
+    Many of the available options are described in <xref
+    linkend="icu-collation-settings"/>, or see <xref
+    linkend="icu-external-references"/> for more details.
+   </para>
+   <sect3 id="icu-collation-comparison-levels">
+    <title>ICU Comparison Levels</title>
+    <para>
+     Comparison of two strings (collation) in ICU is determined by a
+     multi-level process, where textual features are grouped into
+     "levels". Treatment of each level is controlled by the <link
+     linkend="icu-collation-settings-table">collation settings</link>. Higher
+     levels correspond to finer textual features.
+    </para>
+    <para>
+     <table id="icu-collation-levels">
+      <title>ICU Collation Levels</title>
+      <tgroup cols="8">
+       <thead>
+        <row>
+         <entry>Level</entry>
+         <entry>Description</entry>
+         <entry><literal>'f' = 'f'</literal></entry>
+         <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
+         <entry><literal>'x-y' = 'x_y'</literal></entry>
+         <entry><literal>'g' = 'G'</literal></entry>
+         <entry><literal>'n' = 'ñ'</literal></entry>
+         <entry><literal>'y' = 'z'</literal></entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry>level1</entry>
+         <entry>Base Character</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level2</entry>
+         <entry>Accents</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level3</entry>
+         <entry>Case/Variants</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level4</entry>
+         <entry>Punctuation</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>identic</entry>
+         <entry>All</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+
+     The above table shows which textual feature differences are
+     considered significant when determining equality at the given level. The
+     Unicode character <literal>U+2063</literal> is an invisible separator,
+     and as seen in the table, is ignored at all levels of comparison less
+     than <literal>identic</literal>.
+    </para>
+    <para>
+     Examples:
+
+<programlisting>
+CREATE COLLATION level3 (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-level3');
+CREATE COLLATION level4 (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-level4');
+CREATE COLLATION identic (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-identic');
+
+-- invisible separator ignored at all levels except identic
+SELECT 'ab' = U&amp;'a\2063b' COLLATE level4; -- true
+SELECT 'ab' = U&amp;'a\2063b' COLLATE identic; -- false
+
+-- punctuation ignored at level3 but not at level4
+SELECT 'x-y' = 'x_y' COLLATE level3; -- true
+SELECT 'x-y' = 'x_y' COLLATE level4; -- false
+</programlisting>
+
+    </para>
+    <note>
+     <para>
+      For many collation settings, you must create the collation with
+      <option>DETERMINISTIC</option> set to <literal>false</literal> for the
+      setting to have the desired effect. Additionally, some settings only
+      take effect when the key <literal>ka</literal> is set to
+      <literal>shifted</literal> (see <xref
+      linkend="icu-collation-settings-table"/>).
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="icu-collation-settings">
+    <title>Collation Settings for an ICU Locale</title>
+    <para>
+     <table id="icu-collation-settings-table">
+      <title>ICU Collation Settings</title>
+      <tgroup cols="4">
+       <thead>
+        <row>
+         <entry>Key</entry>
+         <entry>Values</entry>
+         <entry>Default</entry>
+         <entry>Description</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry><literal>co</literal></entry>
+         <entry><literal>emoji</literal>, <literal>phonebk</literal>, <literal>standard</literal>, <replaceable>...</replaceable></entry>
+         <entry><literal>standard</literal></entry>
+         <entry>
+          Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>ks</literal></entry>
+         <entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
+         <entry><literal>level3</literal></entry>
+         <entry>
+          Sensitivity when determining equality, with
+          <literal>level1</literal> the least sensitive and
+          <literal>identic</literal> the most sensitive. See <xref
+          linkend="icu-collation-levels"/> for details.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>ka</literal></entry>
+         <entry><literal>noignore</literal>, <literal>shifted</literal></entry>
+         <entry><literal>noignore</literal></entry>
+         <entry>
+          If set to <literal>shifted</literal>, causes some characters
+          (e.g. punctuation or space) to be ignored in comparison. Key
+          <literal>ks</literal> must be set to <literal>level3</literal> or
+          lower to take effect. Set key <literal>kv</literal> to control which
+          character classes are ignored.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kb</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          Backwards comparison for the level 2 differences. For example,
+          locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+          before <literal>'aé'</literal>.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kk</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Enable full normalization; may affect performance. Basic
+           normalization is performed even when set to
+           <literal>false</literal>.
+          </para>
+          <para>
+           Full normalization is important in some cases, such as when
+           multiple accents are applied to a single character (e.g. in
+           Vietnamese or Arabic). Locales for languages that require full
+           normalization typically enable it by default.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kc</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Separates case into a "level 2.5" that falls between accents and
+           other level 3 features.
+          </para>
+          <para>
+           If set to <literal>true</literal> and <literal>ks</literal> is set
+           to <literal>level1</literal>, will ignore accents but take case
+           into account.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kf</literal></entry>
+         <entry>
+          <literal>upper</literal>, <literal>lower</literal>,
+          <literal>false</literal>
+         </entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>upper</literal>, upper case sorts before lower
+          case. If set to <literal>lower</literal>, lower case sorts before
+          upper case. If set to <literal>false</literal>, it depends on the
+          locale.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kn</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>true</literal>, numbers within a string are
+          treated as a single numeric value rather than a sequence of
+          digits. For example, <literal>'id-45'</literal> sorts before
+          <literal>'id-123'</literal>.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kr</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>,
+          <literal>digit</literal>, <replaceable>script-id</replaceable>
+         </entry>
+         <entry></entry>
+         <entry>
+          <para>
+           Set to one or more of the valid values, or any BCP 47
+           <replaceable>script-id</replaceable>, e.g. <literal>latn</literal>
+           ("Latin") or <literal>grek</literal> ("Greek"). Multiple values are
+           separated by "<literal>-</literal>".
+          </para>
+          <para>
+           Redefines the ordering of classes of characters; those characters
+           belonging to a class earlier in the list sort before characters
+           belonging to a class later in the list. For instance, the value
+           <literal>digit-currency-space</literal> (as part of a language tag
+           like <literal>und-u-kr-digit-currency-space</literal>) sorts
+           digits before currency symbols, and currency symbols before spaces.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kv</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>
+         </entry>
+         <entry><literal>punct</literal></entry>
+         <entry>
+          Classes of characters ignored during comparison at level 3. Setting
+          to a later value includes earlier values;
+          e.g. <literal>symbol</literal> also includes
+          <literal>punct</literal> and <literal>space</literal> in the
+          characters to be ignored. Key <literal>ka</literal> must be set to
+          <literal>shifted</literal> and key <literal>ks</literal> must be set
+          to <literal>level3</literal> or lower to take effect.
+         </entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+      Defaults may depend on locale. The above table is not meant to be
+      complete. See <xref linkend="icu-external-references"/> for additional
+      options and details.
+    </para>
+   </sect3>
+   <sect3 id="icu-locale-examples">
+    <title>Examples</title>
+    <para>
+     <variablelist>
+      <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
+       <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
+       <listitem>
+        <para>German collation with phone book collation type</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
+       <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
+       <listitem>
+        <para>
+         Root collation with Emoji collation type, per Unicode Technical Standard #51
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
+       <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
+       <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
+       <listitem>
+        <para>
+         Sort upper-case letters before lower-case letters.  (The default is
+         lower-case letters first.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
+       <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Combines both of the above options.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </para>
+   </sect3>
+   <sect3 id="icu-external-references">
+    <title>External References for ICU</title>
+    <para>
+     This section (<xref linkend="icu-custom-collations"/>) is only a brief
+     overview of ICU behavior and language tags. Refer to the following
+     documents for technical details, additional options, and new behavior:
+    </para>
+    <itemizedlist>
+     <listitem>
+      <para>
+       <ulink
+           url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
+       Technical Standard #35</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
+       repository</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink>
+      </para>
+     </listitem>
+    </itemizedlist>
+   </sect3>
+  </sect2>
  </sect1>
 
  <sect1 id="multibyte">
-- 
2.34.1

#43

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=siskin&dt=2023-05-08%2020%3A09%3A26

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Jeff Davis (#41)

Re: Order changes in PG16 since ICU introduction

Jeff Davis <pgsql@j-davis.com> writes:

=== 0001: do not convert C to en-US-u-va-posix

I plan to commit this soon.

Several buildfarm animals have failed since this went in. The
only one showing enough info to diagnose is siskin [1]:

@@ -1043,16 +1043,15 @@
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'C'); -- fails
-ERROR:  could not convert locale name "C" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+NOTICE:  using standard form "en-US-u-va-posix" for locale "C"
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+ERROR:  collation "testx" already exists
 CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
-WARNING:  could not convert locale name "C" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-WARNING:  ICU locale "C" has unknown language "c"
-HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+NOTICE:  using standard form "en-US-u-va-posix" for locale "C"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.

I suppose this is environment-dependent. Sadly, the buildfarm
client does not show the prevailing LANG or LC_XXX settings.

regards, tom lane

#44

pgsql@j-davis.com

over 2 years ago

In reply to: Tom Lane (#43)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-05-08 at 17:47 -0400, Tom Lane wrote:

-ERROR:  could not convert locale name "C" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+NOTICE:  using standard form "en-US-u-va-posix" for locale "C"

...

I suppose this is environment-dependent. Sadly, the buildfarm
client does not show the prevailing LANG or LC_XXX settings.

Looks like it's failing-to-fail on some versions of ICU which
automatically perform that conversion.

The easiest thing to do is revert it for now, and after we sort out the
memcmp() path for the ICU provider, then I can commit it again (after
that point it would just be code cleanup and should have no functional
impact).

Regards,
Jeff Davis

#45

Alvaro Herrera

alvherre@alvh.no-ip.org

over 2 years ago

In reply to: Peter Eisentraut (#33)

Re: Order changes in PG16 since ICU introduction

On 2023-Apr-24, Peter Eisentraut wrote:

The GUC settings lc_collate and lc_ctype are from a time when those locale
settings were cluster-global. When we made those locale settings
per-database (PG 8.4), we kept them as read-only. As of PG 15, you can use
ICU as the per-database locale provider, so what is being attempted in the
above example is already meaningless before PG 16, since you need to look
into pg_database to find out what is really happening.

I think we should just remove the GUC parameters lc_collate and lc_ctype.

I agree with removing these in v16, since they are going to become more
meaningless and confusing.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#46

pgsql@j-davis.com

over 2 years ago

In reply to: Alvaro Herrera (#45)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-05-09 at 10:25 +0200, Alvaro Herrera wrote:

I agree with removing these in v16, since they are going to become more
meaningless and confusing.

Agreed, but it would be nice to have an alternative that does the right
thing.

It's awkward for a user to read pg_database.datlocprovider, then
depending on that, either look in datcollate or daticulocale. (It's
awkward in the code, too.)

Maybe some built-in function that returns a tuple of the default
provider, the locale, and the version? Or should we also output the
ctype somehow (which affects the results of upper()/lower())?
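
For illustration, the manual lookup described above can be sketched as a
single query. This is only a sketch, assuming the pg_database columns as
they exist in PG 15 and 16 (datlocprovider, datcollate, datctype,
daticulocale, datcollversion); the output column names are made up for
the example and do not correspond to any existing built-in function:

```sql
-- Report the effective default collation settings of the current database.
-- datlocprovider is 'i' for ICU and 'c' for libc.
SELECT datname,
       CASE WHEN datlocprovider = 'i' THEN 'icu'
            WHEN datlocprovider = 'c' THEN 'libc'
       END AS provider,                       -- hypothetical output name
       CASE WHEN datlocprovider = 'i' THEN daticulocale
            ELSE datcollate
       END AS collation_locale,               -- hypothetical output name
       datctype,                              -- still affects upper()/lower()
       datcollversion
FROM pg_database
WHERE datname = current_database();
```

Including datctype unconditionally reflects the open question above about
whether the ctype should be reported alongside the collation locale.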

Regards,
Jeff Davis

#47

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Alvaro Herrera (#45)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On 09.05.23 10:25, Alvaro Herrera wrote:

On 2023-Apr-24, Peter Eisentraut wrote:

The GUC settings lc_collate and lc_ctype are from a time when those locale
settings were cluster-global. When we made those locale settings
per-database (PG 8.4), we kept them as read-only. As of PG 15, you can use
ICU as the per-database locale provider, so what is being attempted in the
above example is already meaningless before PG 16, since you need to look
into pg_database to find out what is really happening.

I think we should just remove the GUC parameters lc_collate and lc_ctype.

I agree with removing these in v16, since they are going to become more
meaningless and confusing.

Here is my proposed patch for this.

Attachments:

0001-Remove-read-only-server-settings-lc_collate-and-lc_c.patch (text/plain; charset=UTF-8)

From b548a671ad02a5c851a4984db6e4535a0b70f881 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 11 May 2023 13:02:02 +0200
Subject: [PATCH] Remove read-only server settings lc_collate and lc_ctype

The GUC settings lc_collate and lc_ctype are from a time when those
locale settings were cluster-global.  When those locale settings were
made per-database (PG 8.4), the settings were kept as read-only.  As
of PG 15, you can use ICU as the per-database locale provider, so
examining these settings is already meaningless, since you need to
look into pg_database to find out what is really happening.

Discussion: https://www.postgresql.org/message-id/696054d1-bc88-b6ab-129a-18b8bce6a6f0@enterprisedb.com
---
 contrib/citext/expected/citext_utf8.out       |  4 +--
 contrib/citext/expected/citext_utf8_1.out     |  4 +--
 contrib/citext/sql/citext_utf8.sql            |  4 +--
 doc/src/sgml/config.sgml                      | 32 -------------------
 src/backend/utils/init/postinit.c             |  4 ---
 src/backend/utils/misc/guc_tables.c           | 26 ---------------
 .../regress/expected/collate.icu.utf8.out     |  4 +--
 .../regress/expected/collate.linux.utf8.out   |  6 ++--
 .../expected/collate.windows.win1252.out      |  6 ++--
 src/test/regress/sql/collate.icu.utf8.sql     |  4 +--
 src/test/regress/sql/collate.linux.utf8.sql   |  6 ++--
 .../regress/sql/collate.windows.win1252.sql   |  6 ++--
 12 files changed, 22 insertions(+), 84 deletions(-)

diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
index 77b4586d8f..6630e09a4d 100644
--- a/contrib/citext/expected/citext_utf8.out
+++ b/contrib/citext/expected/citext_utf8.out
@@ -8,8 +8,8 @@
  * to the "tr-TR-x-icu" collation where it will succeed.
  */
 SELECT getdatabaseencoding() <> 'UTF8' OR
-       current_setting('lc_ctype') = 'C' OR
-       (SELECT datlocprovider='i' FROM pg_database
+       (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
+        FROM pg_database
         WHERE datname=current_database())
        AS skip_test \gset
 \if :skip_test
diff --git a/contrib/citext/expected/citext_utf8_1.out b/contrib/citext/expected/citext_utf8_1.out
index d1e1fe1a9d..3caa7a00d4 100644
--- a/contrib/citext/expected/citext_utf8_1.out
+++ b/contrib/citext/expected/citext_utf8_1.out
@@ -8,8 +8,8 @@
  * to the "tr-TR-x-icu" collation where it will succeed.
  */
 SELECT getdatabaseencoding() <> 'UTF8' OR
-       current_setting('lc_ctype') = 'C' OR
-       (SELECT datlocprovider='i' FROM pg_database
+       (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
+        FROM pg_database
         WHERE datname=current_database())
        AS skip_test \gset
 \if :skip_test
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
index 8530c68dd7..1f51df134b 100644
--- a/contrib/citext/sql/citext_utf8.sql
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -9,8 +9,8 @@
  */
 
 SELECT getdatabaseencoding() <> 'UTF8' OR
-       current_setting('lc_ctype') = 'C' OR
-       (SELECT datlocprovider='i' FROM pg_database
+       (SELECT (datlocprovider = 'c' AND datctype = 'C') OR datlocprovider = 'i'
+        FROM pg_database
         WHERE datname=current_database())
        AS skip_test \gset
 \if :skip_test
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 909a3f28c7..3e9030e3d7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -10788,38 +10788,6 @@ <title>Preset Options</title>
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-lc-collate" xreflabel="lc_collate">
-      <term><varname>lc_collate</varname> (<type>string</type>)
-      <indexterm>
-       <primary><varname>lc_collate</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Reports the locale in which sorting of textual data is done.
-        See <xref linkend="locale"/> for more information.
-        This value is determined when a database is created.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="guc-lc-ctype" xreflabel="lc_ctype">
-      <term><varname>lc_ctype</varname> (<type>string</type>)
-      <indexterm>
-       <primary><varname>lc_ctype</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Reports the locale that determines character classifications.
-        See <xref linkend="locale"/> for more information.
-        This value is determined when a database is created.
-        Ordinarily this will be the same as <varname>lc_collate</varname>,
-        but for special applications it might be set differently.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-max-function-args" xreflabel="max_function_args">
       <term><varname>max_function_args</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 53420f4974..df81e35eb8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -483,10 +483,6 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 							 quote_identifier(name))));
 	}
 
-	/* Make the locale settings visible as GUC variables, too */
-	SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
-	SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
-
 	ReleaseSysCache(tup);
 }
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 5f90aecd47..23d4b38e72 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -563,8 +563,6 @@ static char *syslog_ident_str;
 static double phony_random_seed;
 static char *client_encoding_string;
 static char *datestyle_string;
-static char *locale_collate;
-static char *locale_ctype;
 static char *server_encoding_string;
 static char *server_version_string;
 static int	server_version_num;
@@ -4050,30 +4048,6 @@ struct config_string ConfigureNamesString[] =
 		NULL, NULL, NULL
 	},
 
-	/* See main.c about why defaults for LC_foo are not all alike */
-
-	{
-		{"lc_collate", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows the collation order locale."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&locale_collate,
-		"C",
-		NULL, NULL, NULL
-	},
-
-	{
-		{"lc_ctype", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows the character classification and case conversion locale."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&locale_ctype,
-		"C",
-		NULL, NULL, NULL
-	},
-
 	{
 		{"lc_messages", PGC_SUSET, CLIENT_CONN_LOCALE,
 			gettext_noop("Sets the language in which messages are displayed."),
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..21840815c9 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1023,7 +1023,7 @@ SET client_min_messages TO WARNING;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT daticulocale FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
@@ -1031,7 +1031,7 @@ ERROR:  collation "test0" already exists
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (provider = icu, locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT daticulocale FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 RESET client_min_messages;
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 6d34667ceb..01664f7c1b 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1027,7 +1027,7 @@ CREATE SCHEMA test_schema;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
@@ -1039,9 +1039,9 @@ NOTICE:  collation "test0" for encoding "UTF8" already exists, skipping
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
-          quote_literal(current_setting('lc_collate')) ||
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) ||
           ', lc_ctype = ' ||
-          quote_literal(current_setting('lc_ctype')) || ');';
+          quote_literal((SELECT datctype FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
diff --git a/src/test/regress/expected/collate.windows.win1252.out b/src/test/regress/expected/collate.windows.win1252.out
index 61b421161f..b7b93959de 100644
--- a/src/test/regress/expected/collate.windows.win1252.out
+++ b/src/test/regress/expected/collate.windows.win1252.out
@@ -863,7 +863,7 @@ CREATE SCHEMA test_schema;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
@@ -875,9 +875,9 @@ NOTICE:  collation "test0" for encoding "WIN1252" already exists, skipping
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
-          quote_literal(current_setting('lc_collate')) ||
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) ||
           ', lc_ctype = ' ||
-          quote_literal(current_setting('lc_ctype')) || ');';
+          quote_literal((SELECT datctype FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..c9c2ab8fa6 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -362,14 +362,14 @@ CREATE SCHEMA test_schema;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (provider = icu, locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT daticulocale FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (provider = icu, locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT daticulocale FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 
diff --git a/src/test/regress/sql/collate.linux.utf8.sql b/src/test/regress/sql/collate.linux.utf8.sql
index 2b787507c5..132d13af0a 100644
--- a/src/test/regress/sql/collate.linux.utf8.sql
+++ b/src/test/regress/sql/collate.linux.utf8.sql
@@ -359,7 +359,7 @@ CREATE SCHEMA test_schema;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
@@ -368,9 +368,9 @@ CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
-          quote_literal(current_setting('lc_collate')) ||
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) ||
           ', lc_ctype = ' ||
-          quote_literal(current_setting('lc_ctype')) || ');';
+          quote_literal((SELECT datctype FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
diff --git a/src/test/regress/sql/collate.windows.win1252.sql b/src/test/regress/sql/collate.windows.win1252.sql
index b5c45e1810..353d769a5b 100644
--- a/src/test/regress/sql/collate.windows.win1252.sql
+++ b/src/test/regress/sql/collate.windows.win1252.sql
@@ -310,7 +310,7 @@ CREATE SCHEMA test_schema;
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test0 (locale = ' ||
-          quote_literal(current_setting('lc_collate')) || ');';
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
@@ -319,9 +319,9 @@ CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
-          quote_literal(current_setting('lc_collate')) ||
+          quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) ||
           ', lc_ctype = ' ||
-          quote_literal(current_setting('lc_ctype')) || ');';
+          quote_literal((SELECT datctype FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
-- 
2.40.0

#48

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#46)

Re: Order changes in PG16 since ICU introduction

On 09.05.23 17:09, Jeff Davis wrote:

It's awkward for a user to read pg_database.datlocprovider, then
depending on that, either look in datcollate or daticulocale. (It's
awkward in the code, too.)

Maybe some built-in function that returns a tuple of the default
provider, the locale, and the version? Or should we also output the
ctype somehow (which affects the results of upper()/lower())?

There is also the deterministic flag and the icurules setting.
Depending on what level of detail you imagine the user needs, you really
do need to look at the whole picture, not some subset of it.

#49

Bug:
/messages/by-id/051c9395cf880307865ee8b17acdbf7f838c1e39.camel@j-davis.com

pgsql@j-davis.com

over 2 years ago

In reply to: Jeff Davis (#41)

7 attachment(s)

Re: Order changes in PG16 since ICU introduction

New patch series attached.

=== 0001: fix bug that allows creating hidden collations

=== 0002: handle some kinds of libc-style locale strings

ICU used to handle libc locale strings like 'fr_FR@euro', but doesn't
in later versions. Handle them in postgres for consistency.
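
For illustration, a minimal standalone sketch of the kind of rewrite 0002 performs. It is not the patch code: fix_variant() and ascii_equal_nocase() are made-up names, and malloc() stands in for the backend allocator. Only the three variant mappings are taken from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* The three libc-style variants mapped by patch 0002. */
static const char *const variant_map[][2] = {
	{"@EURO", "@currency=EUR"},
	{"@PINYIN", "@collation=pinyin"},
	{"@STROKE", "@collation=stroke"},
};

/* Case-insensitive equality; ASCII-only in the default "C" locale. */
static int
ascii_equal_nocase(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;			/* equal only if both strings ended */
}

/*
 * Hypothetical helper: return a malloc'd copy of loc in which a trailing
 * "@VARIANT" is rewritten to "@keyword=value". Unknown variants are copied
 * unchanged. The caller frees the result; NULL means out of memory.
 */
static char *
fix_variant(const char *loc)
{
	const char *at;
	char	   *result;
	size_t		i;

	assert(loc != NULL);
	at = strrchr(loc, '@');

	if (at != NULL)
	{
		size_t		prefix_len = (size_t) (at - loc);	/* bytes before '@' */

		for (i = 0; i < sizeof(variant_map) / sizeof(variant_map[0]); i++)
		{
			if (ascii_equal_nocase(at, variant_map[i][0]))
			{
				size_t		repl_len = strlen(variant_map[i][1]);

				result = malloc(prefix_len + repl_len + 1);
				if (result == NULL)
					return NULL;
				memcpy(result, loc, prefix_len);
				/* copy the replacement including its terminating NUL */
				memcpy(result + prefix_len, variant_map[i][1], repl_len + 1);
				return result;
			}
		}
	}

	result = malloc(strlen(loc) + 1);
	if (result != NULL)
		strcpy(result, loc);
	return result;
}
```

With this sketch, fix_variant("fr_FR@euro") yields "fr_FR@currency=EUR", while a name with no recognized variant comes back unchanged.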

=== 0003: reduce icu_validation_level to WARNING

Given that we've seen some inconsistency in which locale names are
accepted in different ICU versions, it seems best not to be too strict.
Peter Eisentraut suggested that it be set to ERROR originally, but a
WARNING should be sufficient to see problems without introducing risks
migrating to version 16.

I don't expect objections to 0003, so I may commit this soon, but I'll
give it a little time in case someone has an opinion.

=== 0004-0006:

To solve the issues that have come up in this thread, we need CREATE
DATABASE (and createdb and initdb) to use LOCALE to mean the collation
locale regardless of which provider is in use (which is what 0006
does).

0006 depends on ICU handling libc locale names. It already does a good
job for most libc locale names (though patch 0002 fixes a few cases
where it doesn't). There may be more cases, but for the most part libc
names are interpreted in a reasonable way. But one important case is
missing: ICU does not handle the "C" locale as we expect (that is,
using memcmp()).
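
As a reminder of what "C as in memcmp()" means, here is a small standalone sketch of a byte-wise comparison. The function name c_locale_cmp() is made up for illustration and this is not the backend's comparison code; it only shows the ordering rule.

```c
#include <assert.h>
#include <string.h>

/*
 * Compare two NUL-terminated strings byte by byte, as a memcmp-based "C"
 * collation would: the first differing byte decides, and on a common
 * prefix the shorter string sorts first.
 * Returns a negative, zero, or positive value.
 */
static int
c_locale_cmp(const char *a, const char *b)
{
	size_t		alen;
	size_t		blen;
	size_t		minlen;
	int			r;

	assert(a != NULL && b != NULL);
	alen = strlen(a);
	blen = strlen(b);
	minlen = (alen < blen) ? alen : blen;

	r = memcmp(a, b, minlen);
	if (r != 0)
		return r;

	/* common prefix: order by length */
	return (alen > blen) - (alen < blen);
}
```

For the pair quoted earlier in the thread, c_locale_cmp("RM(25)/nodes|+|21|1", "RM(25)/nodes|-|21|") is negative, because '+' (0x2B) sorts before '-' (0x2D) as a raw byte. The ICU root collation does not follow this byte order, which is why the two behaviors need to be distinguishable in the catalog.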

We've already allowed users to create ICU collations with the C locale
in the past, which uses the root collation (not memcmp()), and we need
to keep supporting that for upgraded clusters. So that leaves us with a
catalog representation problem. I mentioned upthread that we can solve
that by:

1. Using iculocale=NULL to mean "C-as-in-memcmp", or having some
other catalog hack (like another field). That's not desirable because
the catalog representation is already complex and it may be hard for
users to tell what's happening.

2. When provider=icu and locale=C, switch to provider=libc locale=C.
This is very messy, because currently the syntax allows specifying a
database with LOCALE_PROVIDER='icu' ICU_LOCALE='C' LC_COLLATE='en_US' --
if the provider gets changed to libc, what would we set datcollate
to? I don't think this is workable without some breakage. We can't
simply override datcollate to be C in that case, because there are some
things other than the default collation that might need it set to en_US
as the user specified.

3. Introduce collation provider "none", which is always memcmp-based
(patch 0004). It's equivalent to the libc locale=C, but it allows
specifying the LC_COLLATE and LC_CTYPE independently. A command like
CREATE DATABASE ... LOCALE_PROVIDER='icu' ICU_LOCALE='C'
LC_COLLATE='en_US' would get changed (with a NOTICE) to provider "none"
(patch 0005), so you'd have datlocprovider=none, datcollate=en_US. For
the database default collation, that would always use memcmp(), but the
server environment LC_COLLATE would be set to en_US as the user
specified.

For this patch series, I chose approach #3. I think it works out nicely
-- it provides a better place to document the "no locale" behavior
(including a warning that it depends on the database encoding), and I
think it's more clear to the user that locale=C is not actually using a
provider at all. It's more invasive, but feels like a better solution.
If others don't like it I can implement approach #1 instead.
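
To make approach #3 concrete, a minimal standalone sketch of the provider switch described above. This is not code from the patch series: the enum, the struct, and resolve_db_locale() are hypothetical names, and matching only the exact string "C" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the catalog's provider values. */
typedef enum
{
	PROVIDER_LIBC,
	PROVIDER_ICU,
	PROVIDER_NONE
} locale_provider;

typedef struct
{
	locale_provider provider;
	const char *icu_locale;		/* NULL unless provider is ICU */
	const char *lc_collate;		/* kept exactly as the user specified */
	bool		changed;		/* true if a NOTICE should be issued */
} db_locale;

/*
 * Decide what would be stored for a new database. When ICU is requested
 * with locale "C", switch to the memcmp-based "none" provider, but keep
 * the user's LC_COLLATE for the server environment.
 */
static db_locale
resolve_db_locale(locale_provider requested, const char *icu_locale,
				  const char *lc_collate)
{
	db_locale	r;

	assert(lc_collate != NULL);

	r.provider = requested;
	r.icu_locale = (requested == PROVIDER_ICU) ? icu_locale : NULL;
	r.lc_collate = lc_collate;
	r.changed = false;

	if (requested == PROVIDER_ICU && icu_locale != NULL &&
		strcmp(icu_locale, "C") == 0)
	{
		r.provider = PROVIDER_NONE;
		r.icu_locale = NULL;
		r.changed = true;
	}

	return r;
}
```

Calling resolve_db_locale(PROVIDER_ICU, "C", "en_US") in this sketch gives provider "none" with lc_collate still "en_US", matching the datlocprovider=none, datcollate=en_US outcome described for the CREATE DATABASE example.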

=== 0007: Add a GUC to control the default collation provider

Having a GUC would make it easier to migrate to ICU without surprises.
This only affects the default for CREATE COLLATION, not CREATE DATABASE
(and obviously not initdb).

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v5-0001-For-user-defined-collations-never-set-collencodin.patch (text/x-patch; charset=UTF-8)

From fc66f02976bb11b629bcf71346c2858eccbcf1a3 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 11 May 2023 10:36:04 -0700
Subject: [PATCH v5 1/7] For user-defined collations, never set
 collencoding=-1.

For new user-defined collations, always set collencoding to the
current database encoding so that it is never shadowed by a built-in
collation.

Built in collations that work with any encoding may have
collencoding=-1, and if a user defines a collation with the same name,
it will shadow the built-in collation.

Previously it was possible to create an ICU collation (which was
assigned collencoding=-1) that was shadowed by a built-in collation
and completely inaccessible.
---
 src/backend/commands/collationcmds.c          | 28 +++++++++++++------
 .../regress/expected/collate.icu.utf8.out     |  2 +-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..a53700256b 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -302,16 +302,29 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
 
+		/*
+		 * The collencoding is used to hide built-in collations that are
+		 * incompatible with the current database encoding, allowing users to
+		 * define a compatible collation with the same name if
+		 * desired. Built-in collations that work with any encoding have
+		 * collencoding=-1.
+		 *
+		 * A collation that's a match to the current database encoding will
+		 * shadow a collation with the same name and collencoding=-1. We never
+		 * want a user-created collation to be shadowed by a built-in
+		 * collation, so for user-created collations, always set collencoding
+		 * to the current database encoding.
+		 */
+		collencoding = GetDatabaseEncoding();
+
 		if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
 			/*
-			 * We could create ICU collations with collencoding == database
-			 * encoding, but it seems better to use -1 so that it matches the
-			 * way initdb would create ICU collations.  However, only allow
-			 * one to be created when the current database's encoding is
-			 * supported.  Otherwise the collation is useless, plus we get
-			 * surprising behaviors like not being able to drop the collation.
+			 * Only allow an ICU collation to be created when the current
+			 * database's encoding is supported.  Otherwise the collation is
+			 * useless, plus we get surprising behaviors like not being able
+			 * to drop the collation.
 			 *
 			 * Skip this test when !USE_ICU, because the error we want to
 			 * throw for that isn't thrown till later.
@@ -321,11 +334,10 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("current database's encoding is not supported with this provider")));
 #endif
-			collencoding = -1;
 		}
 		else
 		{
-			collencoding = GetDatabaseEncoding();
+			Assert(collprovider == COLLPROVIDER_LIBC);
 			check_encoding_locale_matches(collencoding, collcollate, collctype);
 		}
 	}
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..9c9e1e4f48 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1062,7 +1062,7 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
 
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
-ERROR:  collation "test11" already exists in schema "collate_tests"
+ERROR:  collation "test11" for encoding "UTF8" already exists in schema "collate_tests"
 ALTER COLLATION test1 RENAME TO test22; -- fail
 ERROR:  collation "test1" for encoding "UTF8" does not exist
 ALTER COLLATION test11 OWNER TO regress_test_role;
-- 
2.34.1

v5-0002-ICU-fix-up-old-libc-style-locale-strings.patch (text/x-patch; charset=UTF-8)

From 25824dc213272c739eecd16b17a3458fc5f81339 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v5 2/7] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'fr_FR@euro'. Older ICU versions did
this automatically, but ICU version 64 removed that support.

Discussion: https://postgr.es/m/654a49f7ff7461bcf47be4181430678d45f93858.camel%40j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 63 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f0b6567da1..e7b166461b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2766,6 +2766,60 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	pfree(lower_str);
 }
 
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2782,6 +2836,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2814,7 +2869,7 @@ icu_language_tag(const char *loc_str, int elevel)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2834,6 +2889,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..2b5cc30955 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2229,6 +2229,64 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2238,6 +2296,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2268,7 +2327,7 @@ icu_language_tag(const char *loc_str)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2287,6 +2346,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 9c9e1e4f48..e0f11e3cd4 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1042,13 +1042,24 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..8d5423bc17 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -378,11 +378,18 @@ RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1

v5-0003-Reduce-icu_validation_level-default-to-WARNING.patch (text/x-patch; charset=UTF-8)

From cd839f069cc09a71788bafa28730e4caf8f9d768 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 10 May 2023 10:47:16 -0700
Subject: [PATCH v5 3/7] Reduce icu_validation_level default to WARNING.

---
 doc/src/sgml/config.sgml                       | 2 +-
 src/backend/utils/adt/pg_locale.c              | 2 +-
 src/backend/utils/misc/guc_tables.c            | 2 +-
 src/backend/utils/misc/postgresql.conf.sample  | 2 +-
 src/test/regress/expected/collate.icu.utf8.out | 4 ++--
 src/test/regress/sql/collate.icu.utf8.sql      | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b56f073a91..c4a9dcb9ae 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9840,7 +9840,7 @@ SET XML OPTION { DOCUMENT | CONTENT };
        <para>
         If set to <literal>DISABLED</literal>, does not report validation
         problems at all. Otherwise reports problems at the given message
-        level. The default is <literal>ERROR</literal>.
+        level. The default is <literal>WARNING</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index e7b166461b..bb4a8d84f6 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -96,7 +96,7 @@ char	   *locale_monetary;
 char	   *locale_numeric;
 char	   *locale_time;
 
-int			icu_validation_level = ERROR;
+int			icu_validation_level = WARNING;
 
 /*
  * lc_time localization cache.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 2f42cebaf6..8c843f4ab6 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -4689,7 +4689,7 @@ struct config_enum ConfigureNamesEnum[] =
 		 NULL
 		},
 		&icu_validation_level,
-		ERROR, icu_validation_level_options,
+		WARNING, icu_validation_level_options,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b70c66ca87..87bad8ecbf 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -734,7 +734,7 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
-#icu_validation_level = ERROR		# report ICU locale validation
+#icu_validation_level = WARNING		# report ICU locale validation
 					# errors at the given level
 
 # default configuration for text search
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index e0f11e3cd4..12afc3b65a 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1037,6 +1037,7 @@ $$;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
+SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
@@ -1044,7 +1045,7 @@ CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-SET icu_validation_level = WARNING;
+RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
@@ -1052,7 +1053,6 @@ WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-RESET icu_validation_level;
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
 NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 8d5423bc17..655c965f46 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -376,14 +376,14 @@ $$;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
+SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
-SET icu_validation_level = WARNING;
+RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
-RESET icu_validation_level;
 
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
-- 
2.34.1
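
A minimal sketch of what the default change in the patch above means for a caller, assuming a simplified three-level model. The enum and the function name report_validation_problem are hypothetical stand-ins for illustration; they are not PostgreSQL's elevel constants or its actual validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the GUC values; not PostgreSQL's elevel codes. */
typedef enum
{
	VALIDATION_DISABLED,
	VALIDATION_WARNING,
	VALIDATION_ERROR
} ValidationLevel;

/*
 * Report a locale validation problem at the configured level.
 * Returns true if the calling command may continue, false if it must abort.
 */
static bool
report_validation_problem(ValidationLevel level, const char *msg)
{
	assert(msg != NULL);

	switch (level)
	{
		case VALIDATION_DISABLED:
			return true;		/* report nothing, carry on */
		case VALIDATION_WARNING:
			fprintf(stderr, "WARNING:  %s\n", msg);
			return true;		/* new default: command still succeeds */
		case VALIDATION_ERROR:
			fprintf(stderr, "ERROR:  %s\n", msg);
			return false;		/* old default: command is rejected */
	}
	return false;				/* not reached; keeps compilers quiet */
}
```

Under this model, a CREATE COLLATION with a questionable ICU locale goes through with a warning by default, which is why the regression test in the patch now has to SET icu_validation_level = ERROR before the statements that are expected to fail.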

Attachment: v5-0004-Introduce-collation-provider-none.patch (text/x-patch; charset=UTF-8)

From a13a15988ab2e991e42569b8b1e0cd1d6e940baf Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v5 4/7] Introduce collation provider "none".

Provides locale-unaware semantics that are implemented as fast byte
operations in Postgres, independent of the operating system or any
provider libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "none" for built-in collation "ucs_basic" instead of
libc.
---
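
Reviewer's note, not part of the commit: the sketch below condenses two patterns this patch repeats across collationcmds.c, dbcommands.c and postinit.c. The provider codes 'n', 'c' and 'i' are the ones visible in the pg_dump and psql hunks; the PROVIDER_* macros and both function names are hypothetical stand-ins for COLLPROVIDER_* and the inlined if/else chains, so read it as an illustration rather than the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the character codes match the pg_dump/psql hunks. */
#define PROVIDER_NONE 'n'
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/*
 * Pick the locale string to hand to the version lookup.
 * Mirrors the if/else chain the patch adds at each call site.
 */
static const char *
locale_for_provider(char provider, const char *collate, const char *iculocale)
{
	if (provider == PROVIDER_ICU)
		return iculocale;
	else if (provider == PROVIDER_LIBC)
		return collate;

	/* Only "none" is left; it has no locale and hence no version. */
	assert(provider == PROVIDER_NONE);
	return NULL;
}

/*
 * The invariant asserted in CollationCreate(): each provider accepts
 * exactly its own set of locale fields and nothing else.
 */
static bool
locale_fields_valid(char provider, const char *collate,
					const char *ctype, const char *iculocale)
{
	return (provider == PROVIDER_NONE && !collate && !ctype && !iculocale) ||
		(provider == PROVIDER_LIBC && collate && ctype && !iculocale) ||
		(provider == PROVIDER_ICU && !collate && !ctype && iculocale);
}
```

A shared helper along these lines might shorten the patch noticeably, since the same three-way branch appears at roughly ten call sites; whether that is worth an extra function is a judgment call for the author.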
 doc/src/sgml/charset.sgml              | 87 +++++++++++++++++++++-----
 doc/src/sgml/ref/create_collation.sgml |  2 +-
 doc/src/sgml/ref/create_database.sgml  |  2 +-
 doc/src/sgml/ref/createdb.sgml         |  2 +-
 doc/src/sgml/ref/initdb.sgml           |  2 +-
 src/backend/catalog/pg_collation.c     |  7 ++-
 src/backend/commands/collationcmds.c   | 84 ++++++++++++++++++++-----
 src/backend/commands/dbcommands.c      | 69 +++++++++++++++++---
 src/backend/utils/adt/pg_locale.c      | 27 +++++++-
 src/backend/utils/init/postinit.c      | 10 ++-
 src/bin/initdb/initdb.c                | 33 +++++++++-
 src/bin/initdb/t/001_initdb.pl         | 29 +++++++++
 src/bin/pg_dump/pg_dump.c              |  8 ++-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 18 +++++-
 src/bin/psql/describe.c                |  2 +-
 src/bin/scripts/createdb.c             |  2 +-
 src/bin/scripts/t/020_createdb.pl      | 29 +++++++++
 src/include/catalog/pg_collation.dat   |  3 +-
 src/include/catalog/pg_collation.h     |  3 +
 src/test/regress/expected/collate.out  | 10 ++-
 src/test/regress/sql/collate.sql       |  6 ++
 21 files changed, 372 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 6dd95b8966..de7c65ae35 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
    <title>Locale Providers</title>
 
    <para>
-    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
-    providers</firstterm>.  This specifies which library supplies the locale
-    data.  One standard provider name is <literal>libc</literal>, which uses
-    the locales provided by the operating system C library.  These are the
-    locales used by most tools provided by the operating system.  Another
-    provider is <literal>icu</literal>, which uses the external
-    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can
-    only be used if support for ICU was configured when PostgreSQL was built.
+    A locale provider specifies which library defines the locale behavior for
+    collations and character classifications.
    </para>
 
    <para>
     The commands and tools that select the locale settings, as described
-    above, each have an option to select the locale provider.  The examples
-    shown earlier all use the <literal>libc</literal> provider, which is the
-    default.  Here is an example to initialize a database cluster using the
-    ICU provider:
+    above, each have an option to select the locale provider. Here is an
+    example to initialize a database cluster using the ICU provider:
 <programlisting>
 initdb --locale-provider=icu --icu-locale=en
 </programlisting>
@@ -370,12 +362,73 @@ initdb --locale-provider=icu --icu-locale=en
    </para>
 
    <para>
-    Which locale provider to use depends on individual requirements.  For most
-    basic uses, either provider will give adequate results.  For the libc
-    provider, it depends on what the operating system offers; some operating
-    systems are better than others.  For advanced uses, ICU offers more locale
-    variants and customization options.
+    Regardless of the locale provider, the operating system is still used to
+    provide some locale-aware behavior, such as messages (see <xref
+    linkend="guc-lc-messages"/>).
    </para>
+
+   <para>
+    The available locale providers are listed below.
+   </para>
+
+   <sect3 id="locale-provider-none">
+    <title>None</title>
+    <para>
+     The <literal>none</literal> provider uses simple built-in operations
+     which are not locale-aware.
+    </para>
+    <para>
+     The collation and character classification behavior is equivalent to
+     using the <literal>libc</literal> provider with locale
+     <literal>C</literal>, except that <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently.
+    </para>
+    <note>
+     <para>
+      When using the <literal>none</literal> locale provider, behavior may
+      depend on the database encoding.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-icu">
+    <title>ICU</title>
+    <para>
+     The <literal>icu</literal> provider uses the external
+     ICU<indexterm><primary>ICU</primary></indexterm>
+     library. <productname>PostgreSQL</productname> must have been configured
+     with ICU support.
+    </para>
+    <para>
+     ICU provides collation and character classification behavior that is
+     independent of the operating system and database encoding, which is
+     preferable if you expect to transition to other platforms without any
+     change in results. <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
+    </para>
+    <note>
+     <para>
+      For the ICU provider, results may depend on the version of the ICU
+      library used, as it is updated to reflect changes in natural language
+      over time.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-libc">
+    <title>libc</title>
+    <para>
+     The <literal>libc</literal> provider uses the operating system's C
+     library. The collation and character classification behavior is
+     controlled by the settings <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal>, so they cannot be set independently.
+    </para>
+    <note>
+     <para>
+      The same locale name may have different behavior on different platforms
+      when using the libc provider.
+     </para>
+    </note>
+   </sect3>
+
   </sect2>
 
   <sect2 id="locale-problems">
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..5489ae7413 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -120,7 +120,7 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      <listitem>
       <para>
        Specifies the provider to use for locale services associated with this
-       collation.  Possible values are
+       collation.  Possible values are <literal>none</literal>,
        <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
        (if the server was built with ICU support) or <literal>libc</literal>.
        <literal>libc</literal> is the default.  See <xref
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..60b9da0952 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -212,7 +212,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <listitem>
        <para>
         Specifies the provider to use for the default collation in this
-        database.  Possible values are
+        database.  Possible values are <literal>none</literal>,
         <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
         (if the server was built with ICU support) or <literal>libc</literal>.
         By default, the provider is the same as that of the <xref
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..326a371d34 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -168,7 +168,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         Specifies the locale provider for the database's default collation.
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..e604ab48b7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -323,7 +323,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry id="app-initdb-option-locale-provider">
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         This option sets the locale provider for databases created in the new
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index fd022e6fc2..86b6ba2375 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -68,7 +68,12 @@ CollationCreate(const char *collname, Oid collnamespace,
 	Assert(collname);
 	Assert(collnamespace);
 	Assert(collowner);
-	Assert((collcollate && collctype) || colliculocale);
+	Assert((collprovider == COLLPROVIDER_NONE &&
+			!collcollate && !collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_LIBC &&
+			 collcollate &&  collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_ICU &&
+			!collcollate && !collctype &&  colliculocale));
 
 	/*
 	 * Make sure there is no existing collation of same name & encoding.
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index a53700256b..267a551818 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -215,7 +215,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 
 		if (collproviderstr)
 		{
-			if (pg_strcasecmp(collproviderstr, "icu") == 0)
+			if (pg_strcasecmp(collproviderstr, "none") == 0)
+				collprovider = COLLPROVIDER_NONE;
+			else if (pg_strcasecmp(collproviderstr, "icu") == 0)
 				collprovider = COLLPROVIDER_ICU;
 			else if (pg_strcasecmp(collproviderstr, "libc") == 0)
 				collprovider = COLLPROVIDER_LIBC;
@@ -228,6 +230,13 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collprovider = COLLPROVIDER_LIBC;
 
+		if (collprovider == COLLPROVIDER_NONE
+			&& (localeEl || lccollateEl || lcctypeEl))
+		{
+			ereport(ERROR,
+					(errmsg("collation provider \"none\" does not support LOCALE, LC_COLLATE, or LC_CTYPE")));
+		}
+
 		if (localeEl)
 		{
 			if (collprovider == COLLPROVIDER_LIBC)
@@ -317,7 +326,15 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		 */
 		collencoding = GetDatabaseEncoding();
 
-		if (collprovider == COLLPROVIDER_ICU)
+		if (collprovider == COLLPROVIDER_NONE)
+		{
+			/*
+			 * The "none" provider works with all encodings, so no checking is
+			 * required. NB: the behavior may be different for different
+			 * encodings, though.
+			 */
+		}
+		else if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
 			/*
@@ -343,7 +360,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	}
 
 	if (!collversion)
-		collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate);
+	{
+		char *locale;
+
+		if (collprovider == COLLPROVIDER_ICU)
+			locale = colliculocale;
+		else if (collprovider == COLLPROVIDER_LIBC)
+			locale = collcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		collversion = get_collation_actual_version(collprovider, locale);
+	}
 
 	newoid = CollationCreate(collName,
 							 collNamespace,
@@ -418,6 +446,7 @@ AlterCollation(AlterCollationStmt *stmt)
 	Form_pg_collation collForm;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 	ObjectAddress address;
@@ -442,8 +471,20 @@ AlterCollation(AlterCollationStmt *stmt)
 	datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
-	newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
+	if (collForm->collprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colliculocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (collForm->collprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(collForm->collprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -506,11 +547,18 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
 
-		datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_daticulocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(dbtup);
 	}
@@ -526,11 +574,19 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
 		Assert(provider != COLLPROVIDER_DEFAULT);
-		datum = SysCacheGetAttrNotNull(COLLOID, colltp,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colliculocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(colltp);
 	}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..9e73f54803 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -909,7 +909,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
 
-		if (pg_strcasecmp(locproviderstr, "icu") == 0)
+		if (pg_strcasecmp(locproviderstr, "none") == 0)
+			dblocprovider = COLLPROVIDER_NONE;
+		else if (pg_strcasecmp(locproviderstr, "icu") == 0)
 			dblocprovider = COLLPROVIDER_ICU;
 		else if (pg_strcasecmp(locproviderstr, "libc") == 0)
 			dblocprovider = COLLPROVIDER_LIBC;
@@ -1177,9 +1179,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 */
 	if (src_collversion && !dcollversion)
 	{
-		char	   *actual_versionstr;
+		char	*actual_versionstr;
+		char	*locale;
 
-		actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dblocprovider, locale);
 		if (!actual_versionstr)
 			ereport(ERROR,
 					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
@@ -1207,7 +1217,18 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * collation version, which is normally only the case for template0.
 	 */
 	if (dbcollversion == NULL)
-		dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+	{
+		char *locale;
+
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		dbcollversion = get_collation_actual_version(dblocprovider, locale);
+	}
 
 	/* Resolve default tablespace for new database */
 	if (dtablespacename && dtablespacename->arg)
@@ -2403,6 +2424,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	ObjectAddress address;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 
@@ -2429,10 +2451,24 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
-	if (isnull)
-		elog(ERROR, "unexpected null in pg_database");
-	newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum));
+	if (datForm->datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_daticulocale, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datForm->datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(datForm->datlocprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -2617,6 +2653,7 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 	HeapTuple	tp;
 	char		datlocprovider;
 	Datum		datum;
+	char	   *locale;
 	char	   *version;
 
 	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
@@ -2627,8 +2664,20 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 
 	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;
 
-	datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-	version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum));
+	if (datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_daticulocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	version = get_collation_actual_version(datlocprovider, locale);
 
 	ReleaseSysCache(tp);
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index bb4a8d84f6..5ac5036f05 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1228,7 +1228,12 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			elog(ERROR, "cache lookup failed for collation %u", collation);
 		collform = (Form_pg_collation) GETSTRUCT(tp);
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			cache_entry->collate_is_c = true;
+			cache_entry->ctype_is_c = true;
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 			Datum		datum;
 			const char *collcollate;
@@ -1281,6 +1286,9 @@ lc_collate_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1334,6 +1342,9 @@ lc_ctype_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1487,8 +1498,10 @@ pg_newlocale_from_collation(Oid collid)
 	{
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return &default_locale;
-		else
+		else if (default_locale.provider == COLLPROVIDER_LIBC)
 			return (pg_locale_t) 0;
+		else
+			elog(ERROR, "cannot open collation with provider \"none\"");
 	}
 
 	cache_entry = lookup_collation_cache(collid, false);
@@ -1513,7 +1526,11 @@ pg_newlocale_from_collation(Oid collid)
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			elog(ERROR, "cannot open collation with provider \"none\"");
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 #ifdef HAVE_LOCALE_T
 			const char *collcollate;
@@ -1599,6 +1616,7 @@ pg_newlocale_from_collation(Oid collid)
 
 			collversionstr = TextDatumGetCString(datum);
 
+			Assert(collform->collprovider != COLLPROVIDER_NONE);
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, collform->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
 			actual_versionstr = get_collation_actual_version(collform->collprovider,
@@ -1650,6 +1668,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collprovider == COLLPROVIDER_NONE)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 53420f4974..8053642fd3 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -461,10 +461,18 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	{
 		char	   *actual_versionstr;
 		char	   *collversionstr;
+		char	   *locale;
 
 		collversionstr = TextDatumGetCString(datum);
 
-		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, dbform->datlocprovider == COLLPROVIDER_ICU ? iculocale : collate);
+		if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			locale = iculocale;
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			locale = collate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, locale);
 		if (!actual_versionstr)
 			/* should not happen */
 			elog(WARNING,
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2b5cc30955..4cf6892bee 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2469,6 +2469,22 @@ setlocales(void)
 
 	/* set empty lc_* values to locale config if set */
 
+	if (locale_provider == COLLPROVIDER_NONE)
+	{
+		if (!lc_ctype)
+			lc_ctype = "C";
+		if (!lc_collate)
+			lc_collate = "C";
+		if (!lc_numeric)
+			lc_numeric = "C";
+		if (!lc_time)
+			lc_time = "C";
+		if (!lc_monetary)
+			lc_monetary = "C";
+		if (!lc_messages)
+			lc_messages = "C";
+	}
+
 	if (locale)
 	{
 		if (!lc_ctype)
@@ -2563,7 +2579,7 @@ usage(const char *progname)
 			 "                            set default locale in the respective category for\n"
 			 "                            new databases (default taken from environment)\n"));
 	printf(_("      --no-locale           equivalent to --locale=C\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                            set default locale provider for new databases\n"));
 	printf(_("      --pwfile=FILE         read password for the new superuser from file\n"));
 	printf(_("  -T, --text-search-config=CFG\n"
@@ -2713,7 +2729,15 @@ setup_locale_encoding(void)
 {
 	setlocales();
 
-	if (locale_provider == COLLPROVIDER_LIBC &&
+	if (locale_provider == COLLPROVIDER_NONE &&
+		strcmp(lc_ctype, "C") == 0 &&
+		strcmp(lc_collate, "C") == 0 &&
+		strcmp(lc_time, "C") == 0 &&
+		strcmp(lc_numeric, "C") == 0 &&
+		strcmp(lc_monetary, "C") == 0 &&
+		strcmp(lc_messages, "C") == 0)
+		printf(_("The database cluster will be initialized with no locale.\n"));
+	else if (locale_provider == COLLPROVIDER_LIBC &&
 		strcmp(lc_ctype, lc_collate) == 0 &&
 		strcmp(lc_ctype, lc_time) == 0 &&
 		strcmp(lc_ctype, lc_numeric) == 0 &&
@@ -3387,7 +3411,9 @@ main(int argc, char *argv[])
 										 "-c debug_discard_caches=1");
 				break;
 			case 15:
-				if (strcmp(optarg, "icu") == 0)
+				if (strcmp(optarg, "none") == 0)
+					locale_provider = COLLPROVIDER_NONE;
+				else if (strcmp(optarg, "icu") == 0)
 					locale_provider = COLLPROVIDER_ICU;
 				else if (strcmp(optarg, "libc") == 0)
 					locale_provider = COLLPROVIDER_LIBC;
@@ -3426,6 +3452,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+
 	if (icu_locale && locale_provider != COLLPROVIDER_ICU)
 		pg_fatal("%s cannot be specified unless locale provider \"%s\" is chosen",
 				 "--icu-locale", "icu");
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 17a444d80c..fe6d224e5b 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -154,6 +154,35 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', "$tempdir/data6" ],
+	'locale provider none');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--locale=C',
+	  "$tempdir/data7" ],
+	'locale provider none with --locale');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-collate=C',
+	  "$tempdir/data8" ],
+	'locale provider none with --lc-collate');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-ctype=C',
+	  "$tempdir/data9" ],
+	'locale provider none with --lc-ctype');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-locale=en',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU locale');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-rules=""',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU rules');
+
 command_fails(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
 	'fails for invalid locale provider');
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41a51ec5cd..be6580ab3c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3070,7 +3070,9 @@ dumpDatabase(Archive *fout)
 	}
 
 	appendPQExpBufferStr(creaQry, " LOCALE_PROVIDER = ");
-	if (datlocprovider[0] == 'c')
+	if (datlocprovider[0] == 'n')
+		appendPQExpBufferStr(creaQry, "none");
+	else if (datlocprovider[0] == 'c')
 		appendPQExpBufferStr(creaQry, "libc");
 	else if (datlocprovider[0] == 'i')
 		appendPQExpBufferStr(creaQry, "icu");
@@ -13446,7 +13448,9 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 					  fmtQualifiedDumpable(collinfo));
 
 	appendPQExpBufferStr(q, "provider = ");
-	if (collprovider[0] == 'c')
+	if (collprovider[0] == 'n')
+		appendPQExpBufferStr(q, "none");
+	else if (collprovider[0] == 'c')
 		appendPQExpBufferStr(q, "libc");
 	else if (collprovider[0] == 'i')
 		appendPQExpBufferStr(q, "icu");
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 4a7895a756..6d58f6103e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -114,12 +114,20 @@ my $original_locale = "C";
 my $original_iculocale = "";
 my $provider_field = "'c' AS datlocprovider";
 my $iculocale_field = "NULL AS daticulocale";
-if ($oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes')
+if ($oldnode->pg_version >= 15)
 {
 	$provider_field = "datlocprovider";
 	$iculocale_field = "daticulocale";
-	$original_provider = "i";
-	$original_iculocale = "fr-CA";
+
+	if ($ENV{with_icu} eq 'yes')
+	{
+		$original_provider = "i";
+		$original_iculocale = "fr-CA";
+	}
+	else
+	{
+		$original_provider = "n";
+	}
 }
 
 my @initdb_params = @custom_opts;
@@ -131,6 +139,10 @@ if ($original_provider eq "i")
 	push @initdb_params, ('--locale-provider', 'icu');
 	push @initdb_params, ('--icu-locale', 'fr-CA');
 }
+elsif ($original_provider eq "n")
+{
+	push @initdb_params, ('--locale-provider', 'none');
+}
 
 $node_params{extra} = \@initdb_params;
 $oldnode->init(%node_params);
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 058e41e749..16e726b784 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -932,7 +932,7 @@ listAllDbs(const char *pattern, bool verbose)
 					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  "  CASE d.datlocprovider WHEN 'n' THEN 'none' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..79367d933b 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -299,7 +299,7 @@ help(const char *progname)
 	printf(_("      --lc-ctype=LOCALE        LC_CTYPE setting for the database\n"));
 	printf(_("      --icu-locale=LOCALE      ICU locale setting for the database\n"));
 	printf(_("      --icu-rules=RULES        ICU rules setting for the database\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                               locale provider for the database's default collation\n"));
 	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
 	printf(_("  -S, --strategy=STRATEGY      database creation strategy wal_log or file_copy\n"));
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..5aa658b671 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -83,6 +83,35 @@ else
 		'create database with ICU fails since no ICU support');
 }
 
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', 'testnone1' ],
+	'create database with provider "none"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--locale=C',
+	  'testnone2' ],
+	'create database with provider "none" and locale "C"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-collate=C',
+	  'testnone3' ],
+	'create database with provider "none" and LC_COLLATE=C');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-ctype=C',
+	  'testnone4' ],
+	'create database with provider "none" and LC_CTYPE=C');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-locale=en',
+	  'testnone5' ],
+	'create database with provider "none" and ICU_LOCALE="en"');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-rules=""',
+	  'testnone6' ],
+	'create database with provider "none" and ICU_RULES=""');
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index b6a69d1d42..40d62416ea 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -24,8 +24,7 @@
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point',
-  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
-  collcollate => 'C', collctype => 'C' },
+  collname => 'ucs_basic', collprovider => 'n', collencoding => '6' },
 { oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index bfa3568451..29be3f8d94 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -64,6 +64,7 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_collation_oid_index, 3085, CollationOidIndexId, on
 
 #ifdef EXPOSE_TO_CLIENT_CODE
 
+#define COLLPROVIDER_NONE		'n'
 #define COLLPROVIDER_DEFAULT	'd'
 #define COLLPROVIDER_ICU		'i'
 #define COLLPROVIDER_LIBC		'c'
@@ -73,6 +74,8 @@ collprovider_name(char c)
 {
 	switch (c)
 	{
+		case COLLPROVIDER_NONE:
+			return "none";
 		case COLLPROVIDER_ICU:
 			return "icu";
 		case COLLPROVIDER_LIBC:
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0649564485..b7603c9f6c 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -650,6 +650,13 @@ EXPLAIN (COSTS OFF)
 (3 rows)
 
 -- CREATE/DROP COLLATION
+CREATE COLLATION none ( PROVIDER = none );
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
@@ -754,7 +761,7 @@ DETAIL:  FROM cannot be specified together with any other options.
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 19 other objects
+NOTICE:  drop cascades to 20 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -771,6 +778,7 @@ drop cascades to function dup(anyelement)
 drop cascades to table collate_test20
 drop cascades to table collate_test21
 drop cascades to table collate_test22
+drop cascades to collation "none"
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
 drop cascades to view collate_on_int
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index c3d40fc195..e2dceb8dff 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -244,6 +244,12 @@ EXPLAIN (COSTS OFF)
 
 -- CREATE/DROP COLLATION
 
+CREATE COLLATION none ( PROVIDER = none );
+
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
-- 
2.34.1
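
For readers following the thread, here is a minimal sketch of what "memcmp sorting" means for the "none" provider added in the patch above. It is not the PostgreSQL code path; `bytewise_cmp` is a hypothetical name used only for illustration, and the two strings are the ones from the failing PostGIS test quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): compare two byte strings
 * the way a locale-unaware provider would.  The common prefix is compared
 * with memcmp(); if it is equal, the shorter string sorts first.
 */
static int
bytewise_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		minlen = (alen < blen) ? alen : blen;
	int			result;

	assert(a != NULL && b != NULL);

	result = memcmp(a, b, minlen);
	if (result == 0 && alen != blen)
		result = (alen < blen) ? -1 : 1;

	return result;
}
```

Comparing 'RM(25)/nodes|+|21|1' with 'RM(25)/nodes|-|21|' this way gives a negative result, because the first differing byte is '+' (0x2B) against '-' (0x2D). That agrees with the `collate "C"` result reported upthread.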

Attachment: v5-0005-ICU-for-locale-C-automatically-use-none-provider-.patch (text/x-patch; charset=UTF-8)

From 23e85920dbcfd1d3e71041f92c4adea589acd4f2 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 8 May 2023 13:48:01 -0700
Subject: [PATCH v5 5/7] ICU: for locale "C", automatically use "none" provider
 instead.

Postgres expects locale C to be optimizable to simple locale-unaware
byte operations; while ICU does not recognize the locale "C" at all,
and falls back to the root locale.

If the user specifies locale "C" when creating a new collation or a
new database with the ICU provider, automatically switch it to the
"none" provider.

If provider is libc, behavior is unchanged.
---
 doc/src/sgml/charset.sgml                     |  6 +++
 doc/src/sgml/ref/create_collation.sgml        |  6 +++
 doc/src/sgml/ref/create_database.sgml         |  5 +++
 doc/src/sgml/ref/createdb.sgml                |  5 +++
 doc/src/sgml/ref/initdb.sgml                  |  5 +++
 src/backend/commands/collationcmds.c          | 17 ++++++++
 src/backend/commands/dbcommands.c             | 21 ++++++++++
 src/bin/initdb/initdb.c                       | 10 +++++
 src/bin/initdb/t/001_initdb.pl                | 39 +++++++++++++++++++
 src/bin/scripts/createdb.c                    | 11 ++++++
 src/bin/scripts/t/020_createdb.pl             | 12 ++++++
 .../regress/expected/collate.icu.utf8.out     | 12 ++++--
 src/test/regress/sql/collate.icu.utf8.sql     |  3 ++
 13 files changed, 149 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index de7c65ae35..5c4f713e8b 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -405,6 +405,12 @@ initdb --locale-provider=icu --icu-locale=en
      change in results. <literal>LC_COLLATE</literal> and
      <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
     </para>
+    <para>
+     The ICU provider does not accept the <literal>C</literal>
+     locale. Commands that create collations or databases with the
+     <literal>icu</literal> provider and ICU locale <literal>C</literal> use
+     the provider <literal>none</literal> instead.
+    </para>
     <note>
      <para>
       For the ICU provider, results may depend on the version of the ICU
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 5489ae7413..1ac41831d8 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -126,6 +126,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        <literal>libc</literal> is the default.  See <xref
        linkend="locale-providers"/> for details.
       </para>
+      <para>
+       If the provider is <literal>icu</literal> and the locale is
+       <literal>C</literal> or <literal>POSIX</literal>, the provider is
+       automatically set to <literal>none</literal>, as the ICU provider
+       doesn't support an ICU locale of <literal>C</literal>.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 60b9da0952..c730d02e15 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -190,6 +190,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
        <para>
         Specifies the ICU locale ID if the ICU locale provider is used.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 326a371d34..7c573e848a 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -154,6 +154,11 @@ PostgreSQL documentation
         Specifies the ICU locale ID to be used in this database, if the
         ICU locale provider is selected.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index e604ab48b7..76993acdfe 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -250,6 +250,11 @@ PostgreSQL documentation
         Specifies the ICU locale when the ICU provider is used. Locale support
         is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
        <para>
         If this option is not specified, the locale is inherited from the
         environment in which <command>initdb</command> runs. The environment's
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 267a551818..ed64e17504 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -254,6 +254,23 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (lcctypeEl)
 			collctype = defGetString(lcctypeEl);
 
+		/*
+		 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+		 * optimizable to byte operations (memcmp(), pg_ascii_tolower(),
+		 * etc.); transform into the "none" provider. Don't transform during
+		 * binary upgrade.
+		 */
+		if (!IsBinaryUpgrade && collprovider == COLLPROVIDER_ICU &&
+			colliculocale && (pg_strcasecmp(colliculocale, "C") == 0 ||
+							  pg_strcasecmp(colliculocale, "POSIX") == 0))
+		{
+			ereport(NOTICE,
+					(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+							colliculocale)));
+			colliculocale = NULL;
+			collprovider = COLLPROVIDER_NONE;
+		}
+
 		if (collprovider == COLLPROVIDER_LIBC)
 		{
 			if (!collcollate)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 9e73f54803..6dc737aebb 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1043,6 +1043,27 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
 
+	/*
+	 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+	 * optimizable to byte operations (memcmp(), pg_ascii_tolower(), etc.);
+	 * transform into the "none" provider.
+	 *
+	 * Don't transform during binary upgrade or when both the provider and ICU
+	 * locale are unchanged from the template.
+	 */
+	if (!IsBinaryUpgrade && dblocprovider == COLLPROVIDER_ICU &&
+		(src_locprovider != COLLPROVIDER_ICU ||
+		 strcmp(dbiculocale, src_iculocale) != 0) &&
+		dbiculocale && (pg_strcasecmp(dbiculocale, "C") == 0 ||
+						pg_strcasecmp(dbiculocale, "POSIX") == 0))
+	{
+		ereport(NOTICE,
+				(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+						dbiculocale)));
+		dbiculocale = NULL;
+		dblocprovider = COLLPROVIDER_NONE;
+	}
+
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
 		if (!(is_encoding_supported_by_icu(encoding)))
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4cf6892bee..ea26bf8361 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2501,6 +2501,16 @@ setlocales(void)
 			lc_messages = locale;
 	}
 
+	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = COLLPROVIDER_NONE;
+	}
+
 	/*
 	 * canonicalize locale names, and obtain any missing values from our
 	 * current environment
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index fe6d224e5b..ea92b08511 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -111,6 +111,45 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'option --icu-locale');
 
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			"$tempdir/data4a"
+		],
+		'option --icu-locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--locale=C',
+			"$tempdir/data4b"
+		],
+		'option --icu-locale=C --locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-collate=C',
+			"$tempdir/data4c"
+		],
+		'option --icu-locale=C --lc-collate=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-ctype=C',
+			"$tempdir/data4d"
+		],
+		'option --icu-locale=C --lc-ctype=C');
+
 	command_fails_like(
 		[
 			'initdb',                '--no-sync',
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 79367d933b..9caf9190cf 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -172,6 +172,17 @@ main(int argc, char *argv[])
 			lc_collate = locale;
 	}
 
+	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
+		icu_locale &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = "none";
+	}
+
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 5aa658b671..eb3682f0fd 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -75,6 +75,18 @@ if ($ENV{with_icu} eq 'yes')
 	$node2->command_ok(
 		[ 'createdb', '-T', 'template0', '--icu-locale', 'en-US', 'foobar56' ],
 		'create database with icu locale from template database with icu provider');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  'test_none_icu1' ],
+		'create database with provider "icu" and ICU_LOCALE="C"');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  '--lc-ctype=C', 'test_none_icu_2' ],
+		'create database with provider "icu" and ICU_LOCALE="C" and LC_CTYPE=C');
 }
 else
 {
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 12afc3b65a..c0437231ad 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1035,6 +1035,9 @@ BEGIN
 END
 $$;
 RESET client_min_messages;
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+NOTICE:  using locale provider "none" for ICU locale "C"
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
@@ -1069,7 +1072,8 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
  test0
  test1
  test5
-(3 rows)
+ testc
+(4 rows)
 
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
@@ -1090,7 +1094,8 @@ SELECT collname, nspname, obj_description(pg_collation.oid, 'pg_collation')
  test0    | collate_tests | US English
  test11   | test_schema   | 
  test5    | collate_tests | 
-(3 rows)
+ testc    | collate_tests | 
+(4 rows)
 
 DROP COLLATION test0, test_schema.test11, test5;
 DROP COLLATION test0; -- fail
@@ -1100,7 +1105,8 @@ NOTICE:  collation "test0" does not exist, skipping
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%';
  collname 
 ----------
-(0 rows)
+ testc
+(1 row)
 
 DROP SCHEMA test_schema;
 DROP ROLE regress_test_role;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 655c965f46..63c29dfe2a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -375,6 +375,9 @@ $$;
 
 RESET client_min_messages;
 
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-- 
2.34.1
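
The same check appears four times in the v5-0005 patch above (collationcmds.c, dbcommands.c, initdb.c, createdb.c). A rough standalone sketch of the shared rule follows. `normalize_c_locale` and `ascii_casecmp` are hypothetical names, the latter standing in for `pg_strcasecmp()`; the binary-upgrade and template-database exceptions from the backend versions are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Provider codes, as defined in pg_collation.h by this patch series. */
#define COLLPROVIDER_NONE	'n'
#define COLLPROVIDER_ICU	'i'
#define COLLPROVIDER_LIBC	'c'

/* Hypothetical stand-in for pg_strcasecmp(): ASCII-only, case-insensitive. */
static int
ascii_casecmp(const char *a, const char *b)
{
	for (;; a++, b++)
	{
		unsigned char ca = (unsigned char) *a;
		unsigned char cb = (unsigned char) *b;

		if (ca >= 'A' && ca <= 'Z')
			ca += 'a' - 'A';
		if (cb >= 'A' && cb <= 'Z')
			cb += 'a' - 'A';
		if (ca != cb)
			return (int) ca - (int) cb;
		if (ca == '\0')
			return 0;
	}
}

/*
 * Hypothetical helper: if the provider is ICU and the ICU locale is "C" or
 * "POSIX" (in any case), clear the ICU locale and switch to the "none"
 * provider.  Returns true when the switch happened, which is the point
 * where the patch emits its NOTICE / pg_log_info message.
 */
static bool
normalize_c_locale(char *provider, const char **icu_locale)
{
	assert(provider != NULL && icu_locale != NULL);

	if (*provider == COLLPROVIDER_ICU && *icu_locale != NULL &&
		(ascii_casecmp(*icu_locale, "C") == 0 ||
		 ascii_casecmp(*icu_locale, "POSIX") == 0))
	{
		*icu_locale = NULL;
		*provider = COLLPROVIDER_NONE;
		return true;
	}

	return false;
}
```

With the libc provider the helper leaves its arguments untouched, matching the commit message's "If provider is libc, behavior is unchanged."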

Attachment: v5-0006-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 79732b2f94d5097b5ceebd2a22fdbb692c780156 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v5 6/7] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 ++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++--
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 +++++++---
 src/bin/initdb/initdb.c                       | 11 ++++++--
 src/bin/scripts/createdb.c                    | 13 ++++-----
 src/bin/scripts/t/020_createdb.pl             |  4 +--
 src/test/icu/t/010_database.pl                | 23 +++++++++------
 .../regress/expected/collate.icu.utf8.out     | 28 +++++++++----------
 10 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index c730d02e15..dc57ba0c8b 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 7c573e848a..7991153ecc 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 76993acdfe..d9ef21c422 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index ed64e17504..9a83f9f303 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -302,7 +302,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				if (langtag && strcmp(colliculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, colliculocale)));
 
 					colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6dc737aebb..154f20573c 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1019,7 +1019,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1033,12 +1038,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1094,7 +1101,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			if (langtag && strcmp(dbiculocale, langtag) != 0)
 			{
 				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
+						(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 								langtag, dbiculocale)));
 
 				dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ea26bf8361..ccb2414fed 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2157,7 +2157,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2467,7 +2471,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale_provider == COLLPROVIDER_NONE)
 	{
@@ -2499,6 +2503,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
@@ -3392,7 +3398,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 9caf9190cf..51c4bb3592 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
 		icu_locale &&
 		(pg_strcasecmp(icu_locale, "C") == 0 ||
@@ -230,6 +222,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index eb3682f0fd..81a9931c09 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -167,7 +167,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -175,7 +175,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c0437231ad..39f61ca281 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1058,11 +1058,11 @@ CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
 CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
 CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
@@ -1211,9 +1211,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1221,7 +1221,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1238,12 +1238,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1252,7 +1252,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1280,13 +1280,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1336,9 +1336,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

v5-0007-Add-default_collation_provider-GUC.patch (text/x-patch; charset=UTF-8)

From 3ca8e0a84f6593ffff9a409bd31dc1c9ed253d3a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 11 May 2023 12:54:31 -0700
Subject: [PATCH v5 7/7] Add default_collation_provider GUC.

Controls default collation provider for CREATE COLLATION. Does not
affect CREATE DATABASE, which gets its default from the template
database.
---
 doc/src/sgml/config.sgml                       | 17 +++++++++++++++++
 src/backend/commands/collationcmds.c           |  3 ++-
 src/backend/utils/misc/guc_tables.c            | 18 ++++++++++++++++++
 src/backend/utils/misc/postgresql.conf.sample  |  4 ++++
 src/include/commands/collationcmds.h           |  2 ++
 src/test/regress/expected/collate.icu.utf8.out | 17 +++++++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql      | 10 ++++++++++
 7 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c4a9dcb9ae..038ecf9811 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9819,6 +9819,23 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-collation-provider" xreflabel="default_collation_provider">
+      <term><varname>default_collation_provider</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>default_collation_provider</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Default collation provider for <command>CREATE
+        COLLATION</command>. Does not affect <command>CREATE
+        DATABASE</command>, which gets the default collation provider from the
+        template database. Valid values are <literal>icu</literal> and
+        <literal>libc</literal>. The default is <literal>libc</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-icu-validation-level" xreflabel="icu_validation_level">
       <term><varname>icu_validation_level</varname> (<type>enum</type>)
       <indexterm>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9a83f9f303..b42a660386 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -47,6 +47,7 @@ typedef struct
 	int			enc;			/* encoding */
 } CollAliasData;
 
+int		default_collation_provider = (int) COLLPROVIDER_LIBC;
 
 /*
  * CREATE COLLATION
@@ -228,7 +229,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+			collprovider = (char) default_collation_provider;
 
 		if (collprovider == COLLPROVIDER_NONE
 			&& (localeEl || lccollateEl || lcctypeEl))
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8c843f4ab6..d64b3a9a6f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -35,8 +35,10 @@
 #include "access/xlogrecovery.h"
 #include "archive/archive_module.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_collation.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/collationcmds.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/user.h"
@@ -166,6 +168,12 @@ static const struct config_enum_entry intervalstyle_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry collation_provider_options[] = {
+	{"icu", (int) 'i', false},
+	{"libc", (int) 'c', false},
+	{NULL, 0, false}
+};
+
 static const struct config_enum_entry icu_validation_level_options[] = {
 	{"disabled", -1, false},
 	{"debug5", DEBUG5, false},
@@ -4683,6 +4691,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"default_collation_provider", PGC_USERSET, CLIENT_CONN_LOCALE,
+		 gettext_noop("Default collation provider for CREATE COLLATION."),
+		 NULL
+		},
+		&default_collation_provider,
+		(int) COLLPROVIDER_LIBC, collation_provider_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"icu_validation_level", PGC_USERSET, CLIENT_CONN_LOCALE,
 		 gettext_noop("Log level for reporting invalid ICU locale strings."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 87bad8ecbf..b2b015b31f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -734,6 +734,10 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
+#default_collation_provider = 'libc'	# default collation provider
+					# for CREATE COLLATION
+					# (none, icu, libc)
+
 #icu_validation_level = WARNING		# report ICU locale validation
 					# errors at the given level
 
diff --git a/src/include/commands/collationcmds.h b/src/include/commands/collationcmds.h
index b76c7b3dc3..f54389525d 100644
--- a/src/include/commands/collationcmds.h
+++ b/src/include/commands/collationcmds.h
@@ -18,6 +18,8 @@
 #include "catalog/objectaddress.h"
 #include "parser/parse_node.h"
 
+extern PGDLLIMPORT int default_collation_provider;
+
 extern ObjectAddress DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_exists);
 extern void IsThereCollationInNamespace(const char *collname, Oid nspOid);
 extern ObjectAddress AlterCollation(AlterCollationStmt *stmt);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 39f61ca281..d9da8d1310 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1038,6 +1038,23 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 NOTICE:  using locale provider "none" for ICU locale "C"
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+ collname | collprovider 
+----------+--------------
+ def_libc | c
+(1 row)
+
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+ collname | collprovider 
+----------+--------------
+ def_icu  | i
+(1 row)
+
+RESET default_collation_provider;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 63c29dfe2a..13089c7f8e 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -378,6 +378,16 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+
+RESET default_collation_provider;
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-- 
2.34.1

#50

Alexander Lakhin

exclusion@gmail.com

over 2 years ago

In reply to: Jeff Davis (#44)

Re: Order changes in PG16 since ICU introduction

Hello Jeff,

09.05.2023 00:59, Jeff Davis wrote:

The easiest thing to do is revert it for now, and after we sort out the
memcmp() path for the ICU provider, then I can commit it again (after
that point it would just be code cleanup and should have no functional
impact).

On the current master (after 455f948b0, and before f7faa9976, of course)
I get an ASAN-detected failure with the following query:
CREATE COLLATION col (provider = icu, locale = '123456789012');

==2929883==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc491be09c at pc 0x556e8571a260 bp 0x7ffc491be020 sp 0x7ffc491bd7c8
READ of size 15 at 0x7ffc491be09c thread T0
    #0 0x556e8571a25f in __interceptor_strcmp.part.0 (.../usr/local/pgsql/bin/postgres+0x2aa025f)
    #1 0x556e86d77ee6 in icu_language_tag .../src/backend/utils/adt/pg_locale.c:2802
...
Address 0x7ffc491be09c is located in stack of thread T0 at offset 76 in frame
    #0 0x556e86d77cfe in icu_language_tag .../src/backend/utils/adt/pg_locale.c:2782

This frame has 2 object(s):
[48, 52) 'status' (line 2784)
[64, 76) 'lang' (line 2785) <== Memory access at offset 76 overflows this variable
...

Here, uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status) returns
status = -124, i.e.,
U_STRING_NOT_TERMINATED_WARNING = -124,/**< An output string could not be NUL-terminated because output length==destCapacity. */
(ULOC_LANG_CAPACITY = 12)
This value is not caught by U_FAILURE(status), so the strcmp() that follows
reads past the end of the lang buffer.
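
To make the failure mode concrete, here is a minimal sketch that does not link against ICU. Everything prefixed with mock_ or MOCK_ is a hypothetical stand-in written for illustration; only the numeric value -124 and the "positive codes are failures" rule are taken from the report above, and the overflow code 15 is assumed to match ICU's U_BUFFER_OVERFLOW_ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for ICU status codes (assumption: same numeric values). */
enum
{
	MOCK_ZERO_ERROR = 0,
	MOCK_STRING_NOT_TERMINATED_WARNING = -124,
	MOCK_BUFFER_OVERFLOW_ERROR = 15
};

#define MOCK_LANG_CAPACITY 12

/* Mimics U_FAILURE(): only positive codes are failures, so warnings pass. */
static bool
mock_failure(int status)
{
	return status > 0;
}

/*
 * Hypothetical copy routine with the contract described in the report:
 * if the result exactly fills the buffer, the bytes are copied, the NUL is
 * skipped, and only a warning is reported.
 */
static int
mock_get_language(const char *src, char *dst, int capacity, int *status)
{
	int			len = (int) strlen(src);

	assert(capacity > 0);
	if (len < capacity)
	{
		memcpy(dst, src, (size_t) len + 1);
		*status = MOCK_ZERO_ERROR;
	}
	else if (len == capacity)
	{
		memcpy(dst, src, (size_t) len);	/* no room left for the NUL */
		*status = MOCK_STRING_NOT_TERMINATED_WARNING;
	}
	else
	{
		memcpy(dst, src, (size_t) capacity);
		*status = MOCK_BUFFER_OVERFLOW_ERROR;
	}
	return len;
}

/* Returns true only if dst is NUL-terminated and safe to hand to strcmp(). */
static bool
get_language_checked(const char *src, char *dst, int capacity)
{
	int			status = MOCK_ZERO_ERROR;

	mock_get_language(src, dst, capacity, &status);
	return !(mock_failure(status) ||
			 status == MOCK_STRING_NOT_TERMINATED_WARNING);
}
```

With a 12-byte buffer, the input '123456789012' takes the middle branch: mock_failure() reports no failure, and only the extra comparison against the warning code keeps the caller from reading an unterminated buffer.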

Best regards,
Alexander

#51

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#49)

Re: Order changes in PG16 since ICU introduction

On 11.05.23 23:29, Jeff Davis wrote:

New patch series attached.

=== 0001: fix bug that allows creating hidden collations

Bug:
/messages/by-id/051c9395cf880307865ee8b17acdbf7f838c1e39.camel@j-davis.com

This is still being debated in the other thread. Not really related to
this thread, so I suggest dropping it from this patch series.

=== 0002: handle some kinds of libc-style locale strings

ICU used to handle libc locale strings like 'fr_FR@euro', but doesn't
in later versions. Handle them in postgres for consistency.

I tend to agree with ICU that these variants are obsolete, and we don't
need to support them anymore. If this were a tiny patch, then maybe ok,
but the way it's presented here the whole code is duplicated between
pg_locale.c and initdb.c, which is not great.

=== 0003: reduce icu_validation_level to WARNING

Given that we've seen some inconsistency in which locale names are
accepted in different ICU versions, it seems best not to be too strict.
Peter Eisentraut suggested that it be set to ERROR originally, but a
WARNING should be sufficient to see problems without introducing risks
migrating to version 16.

I'm not sure why this is the conclusion. Presumably, the detection
capabilities of ICU improve over time, so we want to take advantage of
that? What are some example scenarios where this change would help?

=== 0004-0006:

To solve the issues that have come up in this thread, we need CREATE
DATABASE (and createdb and initdb) to use LOCALE to mean the collation
locale regardless of which provider is in use (which is what 0006
does).

0006 depends on ICU handling libc locale names. It already does a good
job for most libc locale names (though patch 0002 fixes a few cases
where it doesn't). There may be more cases, but for the most part libc
names are interpreted in a reasonable way. But one important case is
missing: ICU does not handle the "C" locale as we expect (that is,
using memcmp()).

We've already allowed users to create ICU collations with the C locale
in the past, which uses the root collation (not memcmp()), and we need
to keep supporting that for upgraded clusters.

I'm not sure I agree that we need to keep supporting that. The only way
you could get that in past releases is if you specify explicitly, "give
me provider ICU and locale C", and then it wouldn't actually even work
correctly. So nobody should be using that in practice, and nobody
should have stumbled into that combination of settings by accident.

3. Introduce collation provider "none", which is always memcmp-based
(patch 0004). It's equivalent to the libc locale=C, but it allows
specifying the LC_COLLATE and LC_CTYPE independently. A command like
CREATE DATABASE ... LOCALE_PROVIDER='icu' ICU_LOCALE='C'
LC_COLLATE='en_US' would get changed (with a NOTICE) to provider "none"
(patch 0005), so you'd have datlocprovider=none, datcollate=en_US. For
the database default collation, that would always use memcmp(), but the
server environment LC_COLLATE would be set to en_US as the user
specified.

This seems most attractive, but I think it's quite invasive at this
point, especially given the dubious premise (see above).

=== 0007: Add a GUC to control the default collation provider

Having a GUC would make it easier to migrate to ICU without surprises.
This only affects the default for CREATE COLLATION, not CREATE DATABASE
(and obviously not initdb).

It's not clear to me why we would want that. Also not clear why it
should only affect CREATE COLLATION.
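
For readers following along, the behaviour 0007 describes can be sketched roughly as below. This is an illustration only, not the patch's code: the sketch_ names are hypothetical, plain strcmp() is used for simplicity, and the only details taken from the patch are the provider codes 'i' and 'c' and the fact that the GUC value is consulted only when no provider is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PROVIDER_ICU  'i'
#define SKETCH_PROVIDER_LIBC 'c'

/* Stand-in for the GUC variable; the patch defaults it to libc. */
static int	sketch_default_collation_provider = SKETCH_PROVIDER_LIBC;

/*
 * Pick the provider for a CREATE COLLATION-like command.
 * explicit_provider is NULL when the command has no "provider = ..." option.
 * Returns '\0' for an unrecognized provider name.
 */
static char
sketch_choose_provider(const char *explicit_provider)
{
	assert(sketch_default_collation_provider == SKETCH_PROVIDER_ICU ||
		   sketch_default_collation_provider == SKETCH_PROVIDER_LIBC);

	if (explicit_provider == NULL)
		return (char) sketch_default_collation_provider;	/* GUC decides */
	if (strcmp(explicit_provider, "icu") == 0)
		return SKETCH_PROVIDER_ICU;
	if (strcmp(explicit_provider, "libc") == 0)
		return SKETCH_PROVIDER_LIBC;
	return '\0';
}
```

An explicit provider always wins; the setting matters only for commands that omit it, which is the case the regression test in 0007 exercises with def_libc and def_icu.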

#52

pgsql@j-davis.com

over 2 years ago

In reply to: Alexander Lakhin (#50)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Sat, 2023-05-13 at 13:00 +0300, Alexander Lakhin wrote:

On the current master (after 455f948b0, and before f7faa9976, of
course)
I get an ASAN-detected failure with the following query:
CREATE COLLATION col (provider = icu, locale = '123456789012');

Thank you for the report!

ICU source specifically says:

/**
 * Useful constant for the maximum size of the language part of a locale ID.
 * (including the terminating NULL).
 * @stable ICU 2.0
 */
#define ULOC_LANG_CAPACITY 12

So the fact that it is returning success without nul-terminating the
result is an ICU bug in my opinion, and I reported it here:

https://unicode-org.atlassian.net/browse/ICU-22394

Unfortunately that means we need to be a bit more paranoid and always
check for that warning, even if we have a preallocated buffer of the
"correct" size. It also means that both U_STRING_NOT_TERMINATED_WARNING
and U_BUFFER_OVERFLOW_ERROR will be user-facing errors (potentially
scary), unless we check for those errors each time and report specific
errors for them.

Another approach is to always check the length and loop using dynamic
allocation so that we never overflow (and forget about any static
buffers). That seems like overkill given that the problem case is
obviously invalid input; I think as long as we catch it and throw an
ERROR it's fine. But I can do this if others think it's worthwhile.

Patch attached. It just checks for the U_STRING_NOT_TERMINATED_WARNING
in a few places and turns it into an ERROR. It also cleans up the loop
for uloc_toLanguageTag() to check for U_STRING_NOT_TERMINATED_WARNING
rather than (U_SUCCESS(status) && len >= buflen).
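
The retry loop this describes can be sketched without ICU as follows. The mock_ function and MOCK_ constants are hypothetical stand-ins, and plain malloc()/realloc() replace palloc() and the MaxAllocSize cap; the point is only the loop condition, which treats "did not fit" and "fit but not NUL-terminated" the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for ICU status codes (assumption: same numeric values). */
enum
{
	MOCK_ZERO_ERROR = 0,
	MOCK_STRING_NOT_TERMINATED_WARNING = -124,
	MOCK_BUFFER_OVERFLOW_ERROR = 15
};

/* Hypothetical converter: copies src, reporting status as described above. */
static int
mock_to_language_tag(const char *src, char *dst, int capacity, int *status)
{
	int			len = (int) strlen(src);

	if (len < capacity)
	{
		memcpy(dst, src, (size_t) len + 1);
		*status = MOCK_ZERO_ERROR;
	}
	else if (len == capacity)
	{
		memcpy(dst, src, (size_t) len);	/* exact fit, no NUL written */
		*status = MOCK_STRING_NOT_TERMINATED_WARNING;
	}
	else
		*status = MOCK_BUFFER_OVERFLOW_ERROR;
	return len;
}

/*
 * Returns a malloc'd, NUL-terminated result, or NULL if allocation fails.
 * The caller frees the result.
 */
static char *
language_tag_retry(const char *src, size_t initial_buflen)
{
	size_t		buflen = initial_buflen;
	char	   *buf;

	assert(buflen > 0);
	buf = malloc(buflen);
	while (buf != NULL)
	{
		int			status = MOCK_ZERO_ERROR;
		char	   *bigger;

		mock_to_language_tag(src, buf, (int) buflen, &status);

		/* done unless the buffer was too small or left unterminated */
		if (status != MOCK_BUFFER_OVERFLOW_ERROR &&
			status != MOCK_STRING_NOT_TERMINATED_WARNING)
			return buf;

		buflen *= 2;
		bigger = realloc(buf, buflen);
		if (bigger == NULL)
			free(buf);
		buf = bigger;
	}
	return NULL;
}
```

Checking the status code alone means the length returned by the call no longer has to be compared against buflen, which is why the patch drops the len variable.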

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

0001-ICU-check-for-U_STRING_NOT_TERMINATED_WARNING.patch (text/x-patch; charset=UTF-8)

From 9c8e9272ca807c9f75a7b32fa3190700cc600260 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 15 May 2023 13:35:07 -0700
Subject: [PATCH] ICU: check for U_STRING_NOT_TERMINATED_WARNING.

In some cases, ICU can fail to NUL-terminate a result string even if
using an appropriately-sized static buffer. The caller must either
check for len >= buflen or U_STRING_NOT_TERMINATED_WARNING.

The specific problem is related to uloc_getLanguage(), but add the
check in other cases as well.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/2098874d-c111-41e4-9063-30bcf135226b@gmail.com
---
 src/backend/utils/adt/pg_locale.c | 29 +++++++++++------------------
 src/bin/initdb/initdb.c           | 15 ++++-----------
 2 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f0b6567da1..1cf93b2d20 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2468,7 +2468,7 @@ pg_ucol_open(const char *loc_str)
 
 		status = U_ZERO_ERROR;
 		uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-		if (U_FAILURE(status))
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 		{
 			ereport(ERROR,
 					(errmsg("could not get language from locale \"%s\": %s",
@@ -2504,7 +2504,7 @@ pg_ucol_open(const char *loc_str)
 		 * Pretend the error came from ucol_open(), for consistent error
 		 * message across ICU versions.
 		 */
-		if (U_FAILURE(status))
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 		{
 			ucol_close(collator);
 			ereport(ERROR,
@@ -2639,7 +2639,8 @@ icu_from_uchar(char **result, const UChar *buff_uchar, int32_t len_uchar)
 	status = U_ZERO_ERROR;
 	len_result = ucnv_fromUChars(icu_converter, *result, len_result + 1,
 								 buff_uchar, len_uchar, &status);
-	if (U_FAILURE(status))
+	if (U_FAILURE(status) ||
+		status == U_STRING_NOT_TERMINATED_WARNING)
 		ereport(ERROR,
 				(errmsg("%s failed: %s", "ucnv_fromUChars",
 						u_errorName(status))));
@@ -2681,7 +2682,7 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	icu_locale_id = palloc(len + 1);
 	*status = U_ZERO_ERROR;
 	len = uloc_canonicalize(loc, icu_locale_id, len + 1, status);
-	if (U_FAILURE(*status))
+	if (U_FAILURE(*status) || *status == U_STRING_NOT_TERMINATED_WARNING)
 		return;
 
 	lower_str = asc_tolower(icu_locale_id, strlen(icu_locale_id));
@@ -2765,7 +2766,6 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 
 	pfree(lower_str);
 }
-
 #endif
 
 /*
@@ -2789,7 +2789,7 @@ icu_language_tag(const char *loc_str, int elevel)
 
 	status = U_ZERO_ERROR;
 	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
+	if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 	{
 		if (elevel > 0)
 			ereport(elevel,
@@ -2811,19 +2811,12 @@ icu_language_tag(const char *loc_str, int elevel)
 	langtag = palloc(buflen);
 	while (true)
 	{
-		int32_t		len;
-
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
 
-		/*
-		 * If the result fits in the buffer exactly (len == buflen),
-		 * uloc_toLanguageTag() will return success without nul-terminating
-		 * the result. Check for either U_BUFFER_OVERFLOW_ERROR or len >=
-		 * buflen and try again.
-		 */
+		/* try again if the buffer is not large enough */
 		if ((status == U_BUFFER_OVERFLOW_ERROR ||
-			 (U_SUCCESS(status) && len >= buflen)) &&
+			 status == U_STRING_NOT_TERMINATED_WARNING) &&
 			buflen < MaxAllocSize)
 		{
 			buflen = Min(buflen * 2, MaxAllocSize);
@@ -2878,7 +2871,7 @@ icu_validate_locale(const char *loc_str)
 	/* validate that we can extract the language */
 	status = U_ZERO_ERROR;
 	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
+	if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 	{
 		ereport(elevel,
 				(errmsg("could not get language from ICU locale \"%s\": %s",
@@ -2901,7 +2894,7 @@ icu_validate_locale(const char *loc_str)
 
 		status = U_ZERO_ERROR;
 		uloc_getLanguage(otherloc, otherlang, ULOC_LANG_CAPACITY, &status);
-		if (U_FAILURE(status))
+		if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 			continue;
 
 		if (strcmp(lang, otherlang) == 0)
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index e03d498b1e..30b576932f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2252,7 +2252,7 @@ icu_language_tag(const char *loc_str)
 
 	status = U_ZERO_ERROR;
 	uloc_getLanguage(loc_str, lang, ULOC_LANG_CAPACITY, &status);
-	if (U_FAILURE(status))
+	if (U_FAILURE(status) || status == U_STRING_NOT_TERMINATED_WARNING)
 	{
 		pg_fatal("could not get language from locale \"%s\": %s",
 				 loc_str, u_errorName(status));
@@ -2272,19 +2272,12 @@ icu_language_tag(const char *loc_str)
 	langtag = pg_malloc(buflen);
 	while (true)
 	{
-		int32_t		len;
-
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
 
-		/*
-		 * If the result fits in the buffer exactly (len == buflen),
-		 * uloc_toLanguageTag() will return success without nul-terminating
-		 * the result. Check for either U_BUFFER_OVERFLOW_ERROR or len >=
-		 * buflen and try again.
-		 */
+		/* try again if the buffer is not large enough */
 		if (status == U_BUFFER_OVERFLOW_ERROR ||
-			(U_SUCCESS(status) && len >= buflen))
+			status == U_STRING_NOT_TERMINATED_WARNING)
 		{
 			buflen = buflen * 2;
 			langtag = pg_realloc(langtag, buflen);
-- 
2.34.1

#53

pgsql@j-davis.com

over 2 years ago

In reply to: Jeff Davis (#44)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-05-08 at 14:59 -0700, Jeff Davis wrote:

The easiest thing to do is revert it for now, and after we sort out
the
memcmp() path for the ICU provider, then I can commit it again (after
that point it would just be code cleanup and should have no
functional
impact).

The conversion won't be entirely dead code even after we handle the "C"
locale with memcmp(): for a locale like "C.UTF-8", it will still be
passed to the collation provider (same as with libc), and in that case,
we should still convert that to a language tag consistently across ICU
versions.

For it to be entirely dead code, we would need to convert any locale
with language "C" (e.g. "C.UTF-8") to use the memcmp() path. I'm fine
with that, but that's not what the libc provider does today, and
perhaps we should be consistent between the two. If we do leave the
code in place, we can document that specific "en-US-u-va-posix" locale
so that it's not too surprising for users.
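
As a reminder of what "the memcmp() path" means in practice, here is a small sketch of a byte-wise comparison. It is not PostgreSQL's code; c_locale_compare is a hypothetical name, and the tie-break on length is the conventional choice rather than something taken from this thread.

```c
#include <assert.h>
#include <string.h>

/*
 * Byte-wise comparison in the spirit of the "C" locale: memcmp() over the
 * common prefix, and on a tie the shorter string sorts first.
 * Returns -1, 0, or 1.
 */
static int
c_locale_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		minlen = alen < blen ? alen : blen;
	int			cmp;

	assert(a != NULL && b != NULL);
	cmp = memcmp(a, b, minlen);
	if (cmp != 0)
		return cmp < 0 ? -1 : 1;
	if (alen == blen)
		return 0;
	return alen < blen ? -1 : 1;
}
```

Applied to the two strings from the start of the thread, 'RM(25)/nodes|+|21|1' sorts before 'RM(25)/nodes|-|21|' because '+' (0x2B) is a smaller byte than '-' (0x2D), which matches the collate "C" result quoted there.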

Regards,
Jeff Davis

#54

Alexander Lakhin

exclusion@gmail.com

over 2 years ago

In reply to: Jeff Davis (#52)

Re: Order changes in PG16 since ICU introduction

Hi Jeff,

16.05.2023 00:03, Jeff Davis wrote:

On Sat, 2023-05-13 at 13:00 +0300, Alexander Lakhin wrote:

On the current master (after 455f948b0, and before f7faa9976, of
course)
I get an ASAN-detected failure with the following query:
CREATE COLLATION col (provider = icu, locale = '123456789012');

Thank you for the report!

ICU source specifically says:

/**
 * Useful constant for the maximum size of the language part of a locale ID.
 * (including the terminating NULL).
 * @stable ICU 2.0
 */
#define ULOC_LANG_CAPACITY 12

So the fact that it is returning success without nul-terminating the
result is an ICU bug in my opinion, and I reported it here:

https://unicode-org.atlassian.net/browse/ICU-22394

Unfortunately that means we need to be a bit more paranoid and always
check for that warning, even if we have a preallocated buffer of the
"correct" size. It also means that both U_STRING_NOT_TERMINATED_WARNING
and U_BUFFER_OVERFLOW_ERROR will be user-facing errors (potentially
scary), unless we check for those errors each time and report specific
errors for them.

Another approach is to always check the length and loop using dynamic
allocation so that we never overflow (and forget about any static
buffers). That seems like overkill given that the problem case is
obviously invalid input; I think as long as we catch it and throw an
ERROR it's fine. But I can do this if others think it's worthwhile.

Patch attached. It just checks for the U_STRING_NOT_TERMINATED_WARNING
in a few places and turns it into an ERROR. It also cleans up the loop
for uloc_toLanguageTag() to check for U_STRING_NOT_TERMINATED_WARNING
rather than (U_SUCCESS(status) && len >= buflen).

I'm not sure about the proposed change in icu_from_uchar(). It seems that
len_result + 1 bytes should always be enough for the result string terminated
with NUL. If that's not true (we want to protect from some ICU bug here),
then the change should be backpatched?

Best regards,
Alexander

#55

pgsql@j-davis.com

over 2 years ago

In reply to: Alexander Lakhin (#54)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-05-16 at 19:00 +0300, Alexander Lakhin wrote:

I'm not sure about the proposed change in icu_from_uchar(). It seems
that
len_result + 1 bytes should always be enough for the result string
terminated
with NUL. If that's not true (we want to protect from some ICU bug
here),
then the change should be backpatched?

I believe it's enough and I'm not aware of any bug there so no backport
is required.

I added checks in places that were (a) checking for U_FAILURE; and (b)
expecting the result to be NUL-terminated. That's mostly callers of
uloc_getLanguage(), where I was not quite paranoid enough.

There were a couple other places though, and I went ahead and added
checks there out of paranoia, too. One was ucnv_fromUChars(), and the
other was uloc_canonicalize().

Regards,
Jeff Davis

#56

[1]: https://unicode-org.github.io/icu/userguide/collation/concepts.html

jkatz@postgresql.org

over 2 years ago

In reply to: Jeff Davis (#42)

Re: Order changes in PG16 since ICU introduction

On 5/5/23 8:25 PM, Jeff Davis wrote:

On Fri, 2023-04-21 at 20:12 -0400, Robert Haas wrote:

On Fri, Apr 21, 2023 at 5:56 PM Jeff Davis <pgsql@j-davis.com> wrote:

Most of the complaints seem to be complaints about v15 as well, and
while those complaints may be a reason to not make ICU the default,
they are also an argument that we should continue to learn and try
to
fix those issues because they exist in an already-released version.
Leaving it the default for now will help us fix those issues rather
than hide them.

It's still early, so we have plenty of time to revert the initdb
default if we need to.

That's fair enough, but I really think it's important that some
energy
get invested in providing adequate documentation for this stuff. Just
patching the code is not enough.

Attached a significant documentation patch.

I tried to make it comprehensive without trying to be exhaustive, and I
separated the explanation of language tags from what collation settings
you can include in a language tag, so hopefully that's more clear.

I added quite a few examples spread throughout the various sections,
and I preserved the existing examples at the end. I also left all of
the external links at the bottom for those interested enough to go
beyond what's there.

[Personal hat, not RMT]

Thanks -- this is super helpful. A bunch of these examples I had
previously had to figure out by randomly searching blog posts /
trial-and-error, so I think this will help developers get started more
quickly.

Comments (and a lot are just little nits to tighten the language)

Commit message -- typo: "documentaiton"

+     If you see such a message, ensure that the <symbol>PROVIDER</symbol> and
+     <symbol>LOCALE</symbol> are as you expect, and consider specifying
+     directly as the canonical language tag instead of relying on the
+     transformation.
+    </para>

I'd recommend making this more prescriptive:

"If you see this notice, ensure that the <symbol>PROVIDER</symbol> and
<symbol>LOCALE</symbol> are the expected result. For consistent results
when using the ICU provider, specify the canonical <link
linkend="icu-language-tag">language tag</link> instead of relying on the
transformation."

+     If there is some problem interpreting the locale name, or if it represents
+     a language or region that ICU does not recognize, a message will be reported:

This is passive voice, consider:

"If there is a problem interpreting the locale name, or if the locale
name represents a language or region that ICU does not recognize, you'll
see the following error:"

+   <sect3 id="icu-language-tag">
+    <title>Language Tag</title>
+    <para>

Before jumping in, I'd recommend a quick definition of what a language
tag is, e.g.:

"A language tag, defined in BCP 47, is a standardized identifier used to
identify languages in computer systems" or something similar.

(I did find a database that made it simpler to search for these, which
is one issue I've previously had, but I don't think we'd want to link to it)

+     To include this additional collation information in a language tag,
+     append <literal>-u</literal>, followed by one or more

My first question was "what's special about '-u'", so maybe we say:

"To include this additional collation information in a language tag,
append <literal>-u</literal>, which indicates there are additional
collation settings, followed by one or more..."
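
To make the key/value structure concrete, here is a rough sketch in C of how the part after -u could be split into settings. It is not how ICU or PostgreSQL parse language tags; the names TagSetting and parse_u_extension are hypothetical. It assumes that a two-character subtag after -u is a key, that any longer subtags following it form the value, and that a key with no value means true. (In BCP 47, u is the singleton that introduces the Unicode locale extension.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SETTINGS 8
#define MAX_VALUE_LEN 64

typedef struct
{
    char key[3];                /* two-character key, e.g. "ks" */
    char value[MAX_VALUE_LEN];  /* e.g. "level2"; "true" if omitted */
    int  has_value;             /* 1 once an explicit value was seen */
} TagSetting;

/*
 * Hypothetical helper: split the "-u-" extension of a language tag into
 * key/value settings.  Returns the number of settings stored in out.
 */
static int
parse_u_extension(const char *tag, TagSetting *out, int max_out)
{
    const char *p;
    int         n = 0;

    assert(tag != NULL && out != NULL);

    p = strstr(tag, "-u-");
    if (p == NULL)
        return 0;               /* no collation settings present */
    p += 3;                     /* skip past "-u-" */

    while (*p != '\0')
    {
        size_t      len = strcspn(p, "-");  /* length of this subtag */

        if (len == 1)
            break;              /* another singleton: a new extension starts */

        if (len == 2)
        {
            /* a two-character subtag starts a new key */
            if (n == max_out)
                break;
            memcpy(out[n].key, p, 2);
            out[n].key[2] = '\0';
            strcpy(out[n].value, "true");   /* implied value for booleans */
            out[n].has_value = 0;
            n++;
        }
        else if (n > 0 && len > 0)
        {
            TagSetting *s = &out[n - 1];

            if (!s->has_value)
            {
                /* first value subtag replaces the implied "true" */
                snprintf(s->value, sizeof(s->value), "%.*s", (int) len, p);
                s->has_value = 1;
            }
            else
            {
                /* later subtags extend the value, e.g. "digit-currency" */
                size_t      used = strlen(s->value);

                snprintf(s->value + used, sizeof(s->value) - used,
                         "-%.*s", (int) len, p);
            }
        }

        p += len;
        if (*p == '-')
            p++;
    }
    return n;
}
```

With this sketch, "en-US-u-kn-ks-level2" yields two settings, kn = true and ks = level2, while a tag such as "fr-CA" yields none.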

+     ICU locales are specified as a <link linkend="icu-language-tag">Language
+     Tag</link>, but can also accept most libc-style locale names (which will
+     be transformed into language tags if possible).
+    </para>

I'd recommend removing the parentheticals:

ICU locales are specified as a BCP 47 <link
linkend="icu-language-tag">Language
Tag</link>, but can also accept most libc-style locale names. If
possible, libc-style locale names are transformed into language tags.

+ <title>ICU Collation Levels</title>

Nothing to add here other than to say I'm extremely appreciative of this
section. Once upon a time I sunk a lot of time trying to figure out how
all of these levels worked.

+          Sensitivity when determining equality, with
+          <literal>level1</literal> the least sensitive and
+          <literal>identic</literal> the most sensitive. See <xref
+          linkend="icu-collation-levels"/> for details.

This discusses equality sensitivity, but I'm not sure if I understand
that term here. The ICU docs seem to call these "strengths"[1], maybe we
use that term to be consistent with upstream?

+          If set to <literal>upper</literal>, upper case sorts before lower
+          case. If set to <literal>lower</literal>, lower case sorts before
+          upper case. If set to <literal>false</literal>, it depends on the
+          locale.

Suggestion to tighten this up:

"If set to <literal>false</literal>, the sort depends on the rules of
the locale."

+      Defaults may depend on locale. The above table is not meant to be
+      complete. See <xref linkend="icu-external-references"/> for additinal
+      options and details.

Typo: additinal => "additional"

I didn't add additional documentation for ICU rules. There are so many
options for collations that it's hard for me to think of realistic
examples to specify the rules directly, unless someone wants to invent
a new language. Perhaps useful if working with an interesting text file
format with special treatment for delimiters?

I asked the question about rules here:

/messages/by-id/e861ac4fdae9f9f5ce2a938a37bcb5e083f0f489.camel@cybertec.at

and got some limited response about addressing sort complaints. That
sounds reasonable, but a lot of that can also be handled just by
specifying the right collation settings. Someone who understands the
use case better could add some more documentation.

I'm not too sure about this one -- from my experience, users want
predictability in sorts, but there are a variety of ways to get that
experience.

Thanks,

Jonathan

#57

pgsql@j-davis.com

over 2 years ago

In reply to: Jonathan S. Katz (#56)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-05-16 at 15:35 -0400, Jonathan S. Katz wrote:

+          Sensitivity when determining equality, with
+          <literal>level1</literal> the least sensitive and
+          <literal>identic</literal> the most sensitive. See <xref
+          linkend="icu-collation-levels"/> for details.
This discusses equality sensitivity, but I'm not sure if I understand
that term here. The ICU docs seem to call these "strengths"[1], maybe we
use that term to be consistent with upstream?

"Sensitivity" comes from "case sensitivity" which is more clear to me
than "strength". I added the term "strength" to correspond to the
unicode terminology, but I kept sensitivity and I tried to make it
slightly more clear.

Other than that, I took your suggestions almost verbatim. Patch
attached. Thank you!

I also made a few other changes:

* added a paragraph on the transformation of '' or 'root' to the 'und'
language (root collation)
* added a paragraph noting that the "identic" level still performs some
basic normalization
* added an example for when full normalization matters

I should also say that I don't really understand the case when "kc" is
set to true and "ks" is level 2 or higher. If someone has an example of
where that matters, let me know.

Regards,
Jeff Davis

Attachments:

v2-0001-Doc-improvements-for-language-tags-and-custom-ICU.patch (text/x-patch; charset=UTF-8)

From 8633ec205b0b0297910cef8f931092d0c05eb3ce Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 27 Apr 2023 14:43:46 -0700
Subject: [PATCH v2] Doc improvements for language tags and custom ICU
 collations.

Separate the documentation for language tags themselves from the
available collation settings which can be included in a language tag.

Include tables of the available options, more details about the
effects of each option, and additional examples.

Also include an explanation of the "levels" of textual features and
how they relate to collation.

Discussion: https://postgr.es/m/25787ec7-4c04-9a8a-d241-4dc9be0b1ba3@postgresql.org
Reviewed-by: Jonathan S. Katz
---
 doc/src/sgml/charset.sgml | 680 +++++++++++++++++++++++++++++++-------
 1 file changed, 559 insertions(+), 121 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 6dd95b8966..ea43732ec9 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -377,7 +377,134 @@ initdb --locale-provider=icu --icu-locale=en
     variants and customization options.
    </para>
   </sect2>
+  <sect2 id="icu-locales">
+   <title>ICU Locales</title>
+   <sect3 id="icu-locale-names">
+    <title>ICU Locale Names</title>
+    <para>
+     The ICU format for the locale name is a <link
+     linkend="icu-language-tag">Language Tag</link>.
+
+<programlisting>
+CREATE COLLATION mycollation1 (PROVIDER = icu, LOCALE = 'ja-JP');
+CREATE COLLATION mycollation2 (PROVIDER = icu, LOCALE = 'fr');
+</programlisting>
+    </para>
+   </sect3>
+   <sect3 id="icu-canonicalization">
+    <title>Locale Canonicalization and Validation</title>
+    <para>
+     When defining a new ICU collation object or database with ICU as the
+     provider, the given locale name is transformed ("canonicalized") into a
+     language tag if not already in that form. For instance,
+
+<screen>
+CREATE COLLATION mycollation3 (PROVIDER = icu, LOCALE = 'en-US-u-kn-true');
+NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
+CREATE COLLATION mycollation4 (PROVIDER = icu, LOCALE = 'de_DE.utf8');
+NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
+</screen>
+
+     If you see this notice, ensure that the <symbol>PROVIDER</symbol> and
+     <symbol>LOCALE</symbol> are the expected result. For consistent results
+     when using the ICU provider, specify the canonical <link
+     linkend="icu-language-tag">language tag</link> instead of relying on the
+     transformation.
+    </para>
+    <para>
+     A locale with no language name, or the special language name
+     <literal>root</literal>, is transformed to have the language
+     <literal>und</literal> ("undefined").
+    </para>
+    <para>
+     ICU can transform most libc locale names, as well as some other formats,
+     into language tags for easier transition to ICU. If a libc locale name is
+     used in ICU, it may not have precisely the same behavior as in libc.
+    </para>
+    <para>
+     If there is a problem interpreting the locale name, or if the locale name
+     represents a language or region that ICU does not recognize, you will see
+     the following error:
+
+<screen>
+CREATE COLLATION nonsense (PROVIDER = icu, LOCALE = 'nonsense');
+ERROR:  ICU locale "nonsense" has unknown language "nonsense"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+</screen>
+
+     <xref
+     linkend="guc-icu-validation-level"/> controls how the message is
+     reported. If set below <literal>ERROR</literal>, the collation will still
+     be created, but the behavior may not be what the user intended.
+    </para>
+   </sect3>
+   <sect3 id="icu-language-tag">
+    <title>Language Tag</title>
+    <para>
+     A language tag, defined in BCP 47, is a standardized identifier used to
+     identify languages, regions, and other information about a locale.
+    </para>
+    <para>
+     Basic language tags are simply
+     <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>;
+     or even just <replaceable>language</replaceable>. The
+     <replaceable>language</replaceable> is a language code
+     (e.g. <literal>fr</literal> for French), and
+     <replaceable>region</replaceable> is a region code
+     (e.g. <literal>CA</literal> for Canada). Examples:
+     <literal>ja-JP</literal>, <literal>de</literal>, or
+     <literal>fr-CA</literal>.
+    </para>
+    <para>
+     Collation settings may be included in the language tag to customize
+     collation behavior. ICU allows extensive customization, such as
+     sensitivity (or insensitivity) to accents, case, and punctuation;
+     treatment of digits within text; and many other options to satisfy a
+     variety of uses.
+    </para>
+    <para>
+     To include this additional collation information in a language tag,
+     append <literal>-u</literal>, which indicates there are additional
+     collation settings, followed by one or more
+     <literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
+     pairs. The <replaceable>key</replaceable> is the key for a <link
+     linkend="icu-collation-settings">collation setting</link> and
+     <replaceable>value</replaceable> is a valid value for that setting. For
+     boolean settings, the <literal>-</literal><replaceable>key</replaceable>
+     may be specified without a corresponding
+     <literal>-</literal><replaceable>value</replaceable>, which implies a
+     value of <literal>true</literal>.
+    </para>
+    <para>
+     For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
+     means the locale with the English language in the US region, with
+     collation settings <literal>kn</literal> set to <literal>true</literal>
+     and <literal>ks</literal> set to <literal>level2</literal>. Those
+     settings mean the collation will be case-insensitive and treat a sequence
+     of digits as a single number:
 
+<screen>
+CREATE COLLATION mycollation5 (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'en-US-u-kn-ks-level2');
+SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+
+SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+</screen>
+    </para>
+    <para>
+     See <xref linkend="icu-custom-collations"/> for details and additional
+     examples of using language tags with custom collation information for the
+     locale.
+    </para>
+   </sect3>
+  </sect2>
   <sect2 id="locale-problems">
    <title>Problems</title>
 
@@ -658,6 +785,13 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
     code byte values.
    </para>
 
+   <note>
+    <para>
+     The <literal>C</literal> and <literal>POSIX</literal> locales may behave
+     differently depending on the database encoding.
+    </para>
+   </note>
+
    <para>
     Additionally, two SQL standard collation names are available:
 
@@ -869,132 +1003,24 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
    <sect4 id="collation-managing-create-icu">
     <title>ICU Collations</title>
 
-   <para>
-    ICU allows collations to be customized beyond the basic language+country
-    set that is preloaded by <command>initdb</command>.  Users are encouraged
-    to define their own collation objects that make use of these facilities to
-    suit the sorting behavior to their requirements.
-    See <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
-    and <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink> for
-    information on ICU locale naming.  The set of acceptable names and
-    attributes depends on the particular ICU version.
-   </para>
-
-   <para>
-    Here are some examples:
-
-    <variablelist>
-     <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</literal></term>
-      <listitem>
-       <para>German collation with phone book collation type</para>
-       <para>
-        The first example selects the ICU locale using a <quote>language
-        tag</quote> per BCP 47.  The second example uses the traditional
-        ICU-specific locale syntax.  The first style is preferred going
-        forward, and is used internally to store locales.
-       </para>
-       <para>
-        Note that you can name the collation objects in the SQL environment
-        anything you want.  In this example, we follow the naming style that
-        the predefined collations use, which in turn also follow BCP 47, but
-        that is not required for user-defined collations.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</literal></term>
-      <listitem>
-       <para>
-        Root collation with Emoji collation type, per Unicode Technical Standard #51
-       </para>
-       <para>
-        Observe how in the traditional ICU locale naming system, the root
-        locale is selected by an empty string.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
-      <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
-      <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');</literal></term>
-      <listitem>
-       <para>
-        Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</literal></term>
-      <listitem>
-       <para>
-        Sort upper-case letters before lower-case letters.  (The default is
-        lower-case letters first.)
-       </para>
-      </listitem>
-     </varlistentry>
-
-    <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');</literal></term>
-      <listitem>
-       <para>
-        Combines both of the above options.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kn-true">
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</literal></term>
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</literal></term>
-      <listitem>
-       <para>
-        Numeric ordering, sorts sequences of digits by their numeric value,
-        for example: <literal>A-21</literal> &lt; <literal>A-123</literal>
-        (also known as natural sort).
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
-
-    See <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
-    Technical Standard #35</ulink>
-    and <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink> for
-    details.  The list of possible collation types (<literal>co</literal>
-    subtag) can be found in
-    the <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
-    repository</ulink>.
-   </para>
+    <para>
+     ICU collations can be created like:
 
-   <para>
-    Note that while this system allows creating collations that <quote>ignore
-    case</quote> or <quote>ignore accents</quote> or similar (using the
-    <literal>ks</literal> key), in order for such collations to act in a
-    truly case- or accent-insensitive manner, they also need to be declared as not
-    <firstterm>deterministic</firstterm> in <command>CREATE COLLATION</command>;
-    see <xref linkend="collation-nondeterministic"/>.
-    Otherwise, any strings that compare equal according to the collation but
-    are not byte-wise equal will be sorted according to their byte values.
-   </para>
+<programlisting>
+CREATE COLLATION german (provider = icu, locale = 'de-DE');
+</programlisting>
 
-   <note>
+     ICU locales are specified as a BCP 47 <link
+     linkend="icu-language-tag">Language Tag</link>, but can also accept most
+     libc-style locale names. If possible, libc-style locale names are
+     transformed into language tags.
+    </para>
     <para>
-     By design, ICU will accept almost any string as a locale name and match
-     it to the closest locale it can provide, using the fallback procedure
-     described in its documentation.  Thus, there will be no direct feedback
-     if a collation specification is composed using features that the given
-     ICU installation does not actually support.  It is therefore recommended
-     to create application-level test cases to check that the collation
-     definitions satisfy one's requirements.
+     New ICU collations can customize collation behavior extensively by
+     including collation attributes in the language tag. See <xref
+     linkend="icu-custom-collations"/> for details and examples.
     </para>
-   </note>
    </sect4>
-
    <sect4 id="collation-copy">
    <title>Copying Collations</title>
 
@@ -1072,6 +1098,418 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
     </tip>
    </sect3>
   </sect2>
+  <sect2 id="icu-custom-collations">
+   <title>ICU Custom Collations</title>
+
+   <para>
+    ICU allows extensive control over collation behavior by defining new
+    collations with collation settings as a part of the language tag. These
+    settings can modify the collation order to suit a variety of needs. For
+    instance:
+
+<programlisting>
+-- ignore differences in accents and case
+CREATE COLLATION ignore_accent_case (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'und-u-ks-level1');
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
+
+-- upper case letters sort before lower case.
+CREATE COLLATION upper_first (PROVIDER=icu, LOCALE = 'und-u-kf-upper');
+SELECT 'B' &lt; 'b' COLLATE upper_first; -- true
+
+-- treat digits numerically and ignore punctuation
+CREATE COLLATION num_ignore_punct (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'und-u-ka-shifted-kn');
+SELECT 'id-45' &lt; 'id-123' COLLATE num_ignore_punct; -- true
+SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
+</programlisting>
+
+    Many of the available options are described in <xref
+    linkend="icu-collation-settings"/>, or see <xref
+    linkend="icu-external-references"/> for more details.
+   </para>
+   <sect3 id="icu-collation-comparison-levels">
+    <title>ICU Comparison Levels</title>
+    <para>
+     Comparison of two strings (collation) in ICU is determined by a
+     multi-level process, where textual features are grouped into
+     "levels". Treatment of each level is controlled by the <link
+     linkend="icu-collation-settings-table">collation settings</link>. Higher
+     levels correspond to finer textual features.
+    </para>
+    <para>
+     <table id="icu-collation-levels">
+      <title>ICU Collation Levels</title>
+      <tgroup cols="8">
+       <thead>
+        <row>
+         <entry>Level</entry>
+         <entry>Description</entry>
+         <entry><literal>'f' = 'f'</literal></entry>
+         <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
+         <entry><literal>'x-y' = 'x_y'</literal></entry>
+         <entry><literal>'g' = 'G'</literal></entry>
+         <entry><literal>'n' = 'ñ'</literal></entry>
+         <entry><literal>'y' = 'z'</literal></entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry>level1</entry>
+         <entry>Base Character</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level2</entry>
+         <entry>Accents</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level3</entry>
+         <entry>Case/Variants</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level4</entry>
+         <entry>Punctuation</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>identic</entry>
+         <entry>All</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+
+     The above table shows which textual feature differences are
+     considered significant when determining equality at the given level. The
+     Unicode character <literal>U+2063</literal> is an invisible separator,
+     and as seen in the table, is ignored at all levels of comparison less
+     than <literal>identic</literal>.
+    </para>
+    <para>
+     At every level, even with full normalization off, basic normalization is
+     performed. For example, <literal>'á'</literal> may be composed of the
+     code points <literal>U&amp;'\0061\0301'</literal> or the single code
+     point <literal>U&amp;'\00E1'</literal>, and those sequences will be
+     considered equal even at the <literal>identic</literal> level. To treat
+     any difference in code point representation as distinct, use a collation
+     created with <symbol>DETERMINISTIC</symbol> set to
+     <literal>true</literal>.
+    </para>
+    <para>
+     Examples:
+
+<programlisting>
+CREATE COLLATION level3 (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-level3');
+CREATE COLLATION level4 (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-level4');
+CREATE COLLATION identic (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-identic');
+
+-- invisible separator ignored at all levels except identic
+SELECT 'ab' = U&amp;'a\2063b' COLLATE level4; -- true
+SELECT 'ab' = U&amp;'a\2063b' COLLATE identic; -- false
+
+-- punctuation ignored at level 3 but not at level 4
+SELECT 'x-y' = 'x_y' COLLATE level3; -- true
+SELECT 'x-y' = 'x_y' COLLATE level4; -- false
+</programlisting>
+
+    </para>
+    <note>
+     <para>
+      For many collation settings, you must create the collation with
+      <option>DETERMINISTIC</option> set to <literal>false</literal> for the
+      setting to have the desired effect. Additionally, some settings only
+      take effect when the key <literal>ka</literal> is set to
+      <literal>shifted</literal> (see <xref
+      linkend="icu-collation-settings-table"/>).
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="icu-collation-settings">
+    <title>Collation Settings for an ICU Locale</title>
+    <para>
+     <table id="icu-collation-settings-table">
+      <title>ICU Collation Settings</title>
+      <tgroup cols="4">
+       <thead>
+        <row>
+         <entry>Key</entry>
+         <entry>Values</entry>
+         <entry>Default</entry>
+         <entry>Description</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry><literal>co</literal></entry>
+         <entry><literal>emoji</literal>, <literal>phonebk</literal>, <literal>standard</literal>, <replaceable>...</replaceable></entry>
+         <entry><literal>standard</literal></entry>
+         <entry>
+          Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>ks</literal></entry>
+         <entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
+         <entry><literal>level3</literal></entry>
+         <entry>
+          Sensitivity (or "strength") when determining equality, with
+          <literal>level1</literal> the least sensitive to differences and
+          <literal>identic</literal> the most sensitive to differences. See
+          <xref linkend="icu-collation-levels"/> for details.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>ka</literal></entry>
+         <entry><literal>noignore</literal>, <literal>shifted</literal></entry>
+         <entry><literal>noignore</literal></entry>
+         <entry>
+          If set to <literal>shifted</literal>, causes some characters
+          (e.g. punctuation or space) to be ignored in comparison. Key
+          <literal>ks</literal> must be set to <literal>level3</literal> or
+          lower to take effect. Set key <literal>kv</literal> to control which
+          character classes are ignored.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kb</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          Backwards comparison for the level 2 differences. For example,
+          locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+          before <literal>'aé'</literal>.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kk</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Enable full normalization; may affect performance. Basic
+           normalization is performed even when set to
+           <literal>false</literal>. Locales for languages that require full
+           normalization typically enable it by default.
+          </para>
+          <para>
+           Full normalization is important in some cases, such as when
+           multiple accents are applied to a single character. For instance,
+           <literal>'ệ'</literal> can be composed of code points
+           <literal>U&amp;'\0065\0323\0302'</literal> or
+           <literal>U&amp;'\0065\0302\0323'</literal>. With full normalization
+           on, these code point sequences are treated as equal; otherwise they
+           are unequal.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kc</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Separates case into a "level 2.5" that falls between accents and
+           other level 3 features.
+          </para>
+          <para>
+           If set to <literal>true</literal> and <literal>ks</literal> is set
+           to <literal>level1</literal>, will ignore accents but take case
+           into account.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kf</literal></entry>
+         <entry>
+          <literal>upper</literal>, <literal>lower</literal>,
+          <literal>false</literal>
+         </entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>upper</literal>, upper case sorts before lower
+          case. If set to <literal>lower</literal>, lower case sorts before
+          upper case. If set to <literal>false</literal>, the sort depends on
+          the rules of the locale.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kn</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>true</literal>, numbers within a string are
+          treated as a single numeric value rather than a sequence of
+          digits. For example, <literal>'id-45'</literal> sorts before
+          <literal>'id-123'</literal>.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kr</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>,
+          <literal>digit</literal>, <replaceable>script-id</replaceable>
+         </entry>
+         <entry></entry>
+         <entry>
+          <para>
+           Set to one or more of the valid values, or any BCP 47
+           <replaceable>script-id</replaceable>, e.g. <literal>latn</literal>
+           ("Latin") or <literal>grek</literal> ("Greek"). Multiple values are
+           separated by "<literal>-</literal>".
+          </para>
+          <para>
+           Redefines the ordering of classes of characters; those characters
+           belonging to a class earlier in the list sort before characters
+           belonging to a class later in the list. For instance, the value
+           <literal>digit-currency-space</literal> (as part of a language tag
+           like <literal>und-u-kr-digit-currency-space</literal>) sorts
+           punctuation before digits and spaces.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kv</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>
+         </entry>
+         <entry><literal>punct</literal></entry>
+         <entry>
+          Classes of characters ignored during comparison at level 3. Setting
+          to a later value includes earlier values;
+          e.g. <literal>symbol</literal> also includes
+          <literal>punct</literal> and <literal>space</literal> in the
+          characters to be ignored. Key <literal>ka</literal> must be set to
+          <literal>shifted</literal> and key <literal>ks</literal> must be set
+          to <literal>level3</literal> or lower to take effect.
+         </entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+      Defaults may depend on locale. The above table is not meant to be
+      complete. See <xref linkend="icu-external-references"/> for additional
+      options and details.
+    </para>
+   </sect3>
+   <sect3 id="icu-locale-examples">
+    <title>Examples</title>
+    <para>
+     <variablelist>
+      <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
+       <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
+       <listitem>
+        <para>German collation with phone book collation type</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
+       <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
+       <listitem>
+        <para>
+         Root collation with Emoji collation type, per Unicode Technical Standard #51
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
+       <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
+       <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
+       <listitem>
+        <para>
+         Sort upper-case letters before lower-case letters.  (The default is
+         lower-case letters first.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
+       <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Combines both of the above options.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </para>
+   </sect3>
+   <sect3 id="icu-external-references">
+    <title>External References for ICU</title>
+    <para>
+     This section (<xref linkend="icu-custom-collations"/>) is only a brief
+     overview of ICU behavior and language tags. Refer to the following
+     documents for technical details, additional options, and new behavior:
+    </para>
+    <itemizedlist>
+     <listitem>
+      <para>
+       <ulink
+           url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
+       Technical Standard #35</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
+       repository</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink>
+      </para>
+     </listitem>
+    </itemizedlist>
+   </sect3>
+  </sect2>
  </sect1>
 
  <sect1 id="multibyte">
-- 
2.34.1
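
For readers following the kn entry in the settings table above, here is a rough
sketch of what "numeric ordering" means. It is only a model written in plain
Python, not ICU's algorithm (ICU compares collation weights, not tuples), and
the helper name numeric_key is made up for illustration.

```python
import re


def numeric_key(s):
    """Sort key that treats each run of ASCII digits as one number.

    Hypothetical helper: it only imitates the effect of the ICU "kn"
    setting on strings such as 'id-45' and 'id-123'.
    """
    # With a capturing group, re.split alternates text and digit runs:
    # even indexes are text, odd indexes are the digit runs.
    parts = re.split(r"([0-9]+)", s)
    return [(1, int(p)) if i % 2 else (0, p) for i, p in enumerate(parts)]


values = ["id-123", "id-45"]
print(sorted(values))                   # ['id-123', 'id-45']
print(sorted(values, key=numeric_key))  # ['id-45', 'id-123']
```

The plain sort compares '1' with '4' character by character, which is why
'id-123' comes first there; the keyed sort compares 45 with 123.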

#58

pgsql@j-davis.com

over 2 years ago

In reply to: Jeff Davis (#57)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-05-16 at 20:23 -0700, Jeff Davis wrote:

Other than that, I took your suggestions almost verbatim. Patch
attached. Thank you!

Attached new patch with a typo fix and a few other edits. I plan to
commit soon.

Regards,
Jeff Davis

Attachments:

0001-Doc-improvements-for-language-tags-and-custom-ICU-co.patch (text/x-patch; charset=UTF-8)

From d0d2375fa55618b60f361f6bb64b2c49490125b9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 27 Apr 2023 14:43:46 -0700
Subject: [PATCH] Doc improvements for language tags and custom ICU collations.

Separate the documentation for language tags themselves from the
available collation settings which can be included in a language tag.

Include tables of the available options, more details about the
effects of each option, and additional examples.

Also include an explanation of the "levels" of textual features and
how they relate to collation.

Discussion: https://postgr.es/m/25787ec7-4c04-9a8a-d241-4dc9be0b1ba3@postgresql.org
Reviewed-by: Jonathan S. Katz
---
 doc/src/sgml/charset.sgml | 683 +++++++++++++++++++++++++++++++-------
 1 file changed, 562 insertions(+), 121 deletions(-)
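
A small sketch of how the "-u-" part of a language tag breaks down into
key/value pairs, as the new "Language Tag" section in this patch describes. It
is a simplified model in Python, not what ICU or PostgreSQL actually run: the
function name parse_collation_settings is hypothetical, and it assumes keys are
two-character subtags, values are longer subtags, and any other single-letter
subtag (such as "x") ends the extension.

```python
def parse_collation_settings(tag):
    """Return {key: [values]} for the -u- extension of a language tag.

    Simplified, hypothetical model: it does not validate the tag and
    ignores attributes that precede the first key.
    """
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}

    settings = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break                      # next singleton ends the -u- extension
        if len(subtag) == 2:
            key = subtag               # two-character subtag starts a new key
            settings[key] = []
        elif key is not None:
            settings[key].append(subtag)

    # A key given without a value implies the value "true".
    return {k: (v or ["true"]) for k, v in settings.items()}


print(parse_collation_settings("en-US-u-kn-ks-level2"))
# {'kn': ['true'], 'ks': ['level2']}
print(parse_collation_settings("en-u-kf-upper-kr-grek-latn"))
# {'kf': ['upper'], 'kr': ['grek', 'latn']}
```

Keeping the values as a list covers keys such as kr, which take several
values separated by "-".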

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 6dd95b8966..6b9c323edd 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -377,7 +377,134 @@ initdb --locale-provider=icu --icu-locale=en
     variants and customization options.
    </para>
   </sect2>
+  <sect2 id="icu-locales">
+   <title>ICU Locales</title>
+   <sect3 id="icu-locale-names">
+    <title>ICU Locale Names</title>
+    <para>
+     The ICU format for the locale name is a <link
+     linkend="icu-language-tag">Language Tag</link>.
+
+<programlisting>
+CREATE COLLATION mycollation1 (PROVIDER = icu, LOCALE = 'ja-JP');
+CREATE COLLATION mycollation2 (PROVIDER = icu, LOCALE = 'fr');
+</programlisting>
+    </para>
+   </sect3>
+   <sect3 id="icu-canonicalization">
+    <title>Locale Canonicalization and Validation</title>
+    <para>
+     When defining a new ICU collation object or database with ICU as the
+     provider, the given locale name is transformed ("canonicalized") into a
+     language tag if not already in that form. For instance,
+
+<screen>
+CREATE COLLATION mycollation3 (PROVIDER = icu, LOCALE = 'en-US-u-kn-true');
+NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
+CREATE COLLATION mycollation4 (PROVIDER = icu, LOCALE = 'de_DE.utf8');
+NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"
+</screen>
+
+     If you see this notice, ensure that the <symbol>PROVIDER</symbol> and
+     <symbol>LOCALE</symbol> are the expected result. For consistent results
+     when using the ICU provider, specify the canonical <link
+     linkend="icu-language-tag">language tag</link> instead of relying on the
+     transformation.
+    </para>
+    <para>
+     A locale with no language name, or the special language name
+     <literal>root</literal>, is transformed to have the language
+     <literal>und</literal> ("undefined").
+    </para>
+    <para>
+     ICU can transform most libc locale names, as well as some other formats,
+     into language tags for easier transition to ICU. If a libc locale name is
+     used in ICU, it may not have precisely the same behavior as in libc.
+    </para>
+    <para>
+     If there is a problem interpreting the locale name, or if the locale name
+     represents a language or region that ICU does not recognize, you will see
+     the following warning:
+
+<screen>
+CREATE COLLATION nonsense (PROVIDER = icu, LOCALE = 'nonsense');
+WARNING:  ICU locale "nonsense" has unknown language "nonsense"
+HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION
+</screen>
+
+     <xref linkend="guc-icu-validation-level"/> controls how the message is
+     reported. Unless set to <literal>ERROR</literal>, the collation will
+     still be created, but the behavior may not be what the user intended.
+    </para>
+   </sect3>
+   <sect3 id="icu-language-tag">
+    <title>Language Tag</title>
+    <para>
+     A language tag, defined in BCP 47, is a standardized identifier used to
+     identify languages, regions, and other information about a locale.
+    </para>
+    <para>
+     Basic language tags are simply
+     <replaceable>language</replaceable><literal>-</literal><replaceable>region</replaceable>;
+     or even just <replaceable>language</replaceable>. The
+     <replaceable>language</replaceable> is a language code
+     (e.g. <literal>fr</literal> for French), and
+     <replaceable>region</replaceable> is a region code
+     (e.g. <literal>CA</literal> for Canada). Examples:
+     <literal>ja-JP</literal>, <literal>de</literal>, or
+     <literal>fr-CA</literal>.
+    </para>
+    <para>
+     Collation settings may be included in the language tag to customize
+     collation behavior. ICU allows extensive customization, such as
+     sensitivity (or insensitivity) to accents, case, and punctuation;
+     treatment of digits within text; and many other options to satisfy a
+     variety of uses.
+    </para>
+    <para>
+     To include this additional collation information in a language tag,
+     append <literal>-u</literal>, which indicates there are additional
+     collation settings, followed by one or more
+     <literal>-</literal><replaceable>key</replaceable><literal>-</literal><replaceable>value</replaceable>
+     pairs. The <replaceable>key</replaceable> is the key for a <link
+     linkend="icu-collation-settings">collation setting</link> and
+     <replaceable>value</replaceable> is a valid value for that setting. For
+     boolean settings, the <literal>-</literal><replaceable>key</replaceable>
+     may be specified without a corresponding
+     <literal>-</literal><replaceable>value</replaceable>, which implies a
+     value of <literal>true</literal>.
+    </para>
+    <para>
+     For example, the language tag <literal>en-US-u-kn-ks-level2</literal>
+     means the locale with the English language in the US region, with
+     collation settings <literal>kn</literal> set to <literal>true</literal>
+     and <literal>ks</literal> set to <literal>level2</literal>. Those
+     settings mean the collation will be case-insensitive and treat a sequence
+     of digits as a single number:
 
+<screen>
+CREATE COLLATION mycollation5 (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'en-US-u-kn-ks-level2');
+SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+
+SELECT 'N-45' &lt; 'N-123' COLLATE mycollation5 as result;
+ result
+--------
+ t
+(1 row)
+</screen>
+    </para>
+    <para>
+     See <xref linkend="icu-custom-collations"/> for details and additional
+     examples of using language tags with custom collation information for the
+     locale.
+    </para>
+   </sect3>
+  </sect2>
   <sect2 id="locale-problems">
    <title>Problems</title>
 
@@ -658,6 +785,13 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
     code byte values.
    </para>
 
+   <note>
+    <para>
+     The <literal>C</literal> and <literal>POSIX</literal> locales may behave
+     differently depending on the database encoding.
+    </para>
+   </note>
+
    <para>
     Additionally, two SQL standard collation names are available:
 
@@ -869,132 +1003,24 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
    <sect4 id="collation-managing-create-icu">
     <title>ICU Collations</title>
 
-   <para>
-    ICU allows collations to be customized beyond the basic language+country
-    set that is preloaded by <command>initdb</command>.  Users are encouraged
-    to define their own collation objects that make use of these facilities to
-    suit the sorting behavior to their requirements.
-    See <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
-    and <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink> for
-    information on ICU locale naming.  The set of acceptable names and
-    attributes depends on the particular ICU version.
-   </para>
-
-   <para>
-    Here are some examples:
-
-    <variablelist>
-     <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
-      <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</literal></term>
-      <listitem>
-       <para>German collation with phone book collation type</para>
-       <para>
-        The first example selects the ICU locale using a <quote>language
-        tag</quote> per BCP 47.  The second example uses the traditional
-        ICU-specific locale syntax.  The first style is preferred going
-        forward, and is used internally to store locales.
-       </para>
-       <para>
-        Note that you can name the collation objects in the SQL environment
-        anything you want.  In this example, we follow the naming style that
-        the predefined collations use, which in turn also follow BCP 47, but
-        that is not required for user-defined collations.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
-      <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</literal></term>
-      <listitem>
-       <para>
-        Root collation with Emoji collation type, per Unicode Technical Standard #51
-       </para>
-       <para>
-        Observe how in the traditional ICU locale naming system, the root
-        locale is selected by an empty string.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
-      <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
-      <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');</literal></term>
-      <listitem>
-       <para>
-        Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
-      <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</literal></term>
-      <listitem>
-       <para>
-        Sort upper-case letters before lower-case letters.  (The default is
-        lower-case letters first.)
-       </para>
-      </listitem>
-     </varlistentry>
-
-    <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
-      <term><literal>CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');</literal></term>
-      <listitem>
-       <para>
-        Combines both of the above options.
-       </para>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry id="collation-managing-create-icu-en-u-kn-true">
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</literal></term>
-      <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</literal></term>
-      <listitem>
-       <para>
-        Numeric ordering, sorts sequences of digits by their numeric value,
-        for example: <literal>A-21</literal> &lt; <literal>A-123</literal>
-        (also known as natural sort).
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
-
-    See <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
-    Technical Standard #35</ulink>
-    and <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink> for
-    details.  The list of possible collation types (<literal>co</literal>
-    subtag) can be found in
-    the <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
-    repository</ulink>.
-   </para>
+    <para>
+     ICU collations can be created like:
 
-   <para>
-    Note that while this system allows creating collations that <quote>ignore
-    case</quote> or <quote>ignore accents</quote> or similar (using the
-    <literal>ks</literal> key), in order for such collations to act in a
-    truly case- or accent-insensitive manner, they also need to be declared as not
-    <firstterm>deterministic</firstterm> in <command>CREATE COLLATION</command>;
-    see <xref linkend="collation-nondeterministic"/>.
-    Otherwise, any strings that compare equal according to the collation but
-    are not byte-wise equal will be sorted according to their byte values.
-   </para>
+<programlisting>
+CREATE COLLATION german (provider = icu, locale = 'de-DE');
+</programlisting>
 
-   <note>
+     ICU locales are specified as a BCP 47 <link
+     linkend="icu-language-tag">Language Tag</link>, but can also accept most
+     libc-style locale names. If possible, libc-style locale names are
+     transformed into language tags.
+    </para>
     <para>
-     By design, ICU will accept almost any string as a locale name and match
-     it to the closest locale it can provide, using the fallback procedure
-     described in its documentation.  Thus, there will be no direct feedback
-     if a collation specification is composed using features that the given
-     ICU installation does not actually support.  It is therefore recommended
-     to create application-level test cases to check that the collation
-     definitions satisfy one's requirements.
+     New ICU collations can customize collation behavior extensively by
+     including collation attributes in the language tag. See <xref
+     linkend="icu-custom-collations"/> for details and examples.
     </para>
-   </note>
    </sect4>
-
    <sect4 id="collation-copy">
    <title>Copying Collations</title>
 
@@ -1072,6 +1098,421 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
     </tip>
    </sect3>
   </sect2>
+  <sect2 id="icu-custom-collations">
+   <title>ICU Custom Collations</title>
+
+   <para>
+    ICU allows extensive control over collation behavior by defining new
+    collations with collation settings as a part of the language tag. These
+    settings can modify the collation order to suit a variety of needs. For
+    instance:
+
+<programlisting>
+-- ignore differences in accents and case
+CREATE COLLATION ignore_accent_case (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'und-u-ks-level1');
+SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
+SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
+
+-- upper case letters sort before lower case.
+CREATE COLLATION upper_first (PROVIDER=icu, LOCALE = 'und-u-kf-upper');
+SELECT 'B' &lt; 'b' COLLATE upper_first; -- true
+
+-- treat digits numerically and ignore punctuation
+CREATE COLLATION num_ignore_punct (PROVIDER = icu, DETERMINISTIC = false, LOCALE = 'und-u-ka-shifted-kn');
+SELECT 'id-45' &lt; 'id-123' COLLATE num_ignore_punct; -- true
+SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
+</programlisting>
+
+    Many of the available options are described in <xref
+    linkend="icu-collation-settings"/>, or see <xref
+    linkend="icu-external-references"/> for more details.
+   </para>
+   <sect3 id="icu-collation-comparison-levels">
+    <title>ICU Comparison Levels</title>
+    <para>
+     Comparison of two strings (collation) in ICU is determined by a
+     multi-level process, where textual features are grouped into
+     "levels". Treatment of each level is controlled by the <link
+     linkend="icu-collation-settings-table">collation settings</link>. Higher
+     levels correspond to finer textual features.
+    </para>
+    <para>
+     <table id="icu-collation-levels">
+      <title>ICU Collation Levels</title>
+      <tgroup cols="8">
+       <thead>
+        <row>
+         <entry>Level</entry>
+         <entry>Description</entry>
+         <entry><literal>'f' = 'f'</literal></entry>
+         <entry><literal>'ab' = U&amp;'a\2063b'</literal></entry>
+         <entry><literal>'x-y' = 'x_y'</literal></entry>
+         <entry><literal>'g' = 'G'</literal></entry>
+         <entry><literal>'n' = 'ñ'</literal></entry>
+         <entry><literal>'y' = 'z'</literal></entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry>level1</entry>
+         <entry>Base Character</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level2</entry>
+         <entry>Accents</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level3</entry>
+         <entry>Case/Variants</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>level4</entry>
+         <entry>Punctuation</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+        <row>
+         <entry>identic</entry>
+         <entry>All</entry>
+         <entry><literal>true</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+
+     The above table shows which textual feature differences are
+     considered significant when determining equality at the given level. The
+     Unicode character <literal>U+2063</literal> is an invisible separator,
+     and as seen in the table, is ignored at all levels of comparison less
+     than <literal>identic</literal>.
+    </para>
+    <para>
+     At every level, even with full normalization off, basic normalization is
+     performed. For example, <literal>'á'</literal> may be composed of the
+     code points <literal>U&amp;'\0061\0301'</literal> or the single code
+     point <literal>U&amp;'\00E1'</literal>, and those sequences will be
+     considered equal even at the <literal>identic</literal> level. To treat
+     any difference in code point representation as distinct, use a collation
+     created with <symbol>DETERMINISTIC</symbol> set to
+     <literal>true</literal>.
+    </para>
+    <sect4 id="icu-collation-level-examples">
+     <title>Collation Level Examples</title>
+     <para>
+
+<programlisting>
+CREATE COLLATION level3 (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-level3');
+CREATE COLLATION level4 (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-level4');
+CREATE COLLATION identic (PROVIDER=icu, DETERMINISTIC=false, LOCALE='und-u-ka-shifted-ks-identic');
+
+-- invisible separator ignored at all levels except identic
+SELECT 'ab' = U&amp;'a\2063b' COLLATE level4; -- true
+SELECT 'ab' = U&amp;'a\2063b' COLLATE identic; -- false
+
+-- punctuation ignored at level3 but not at level4
+SELECT 'x-y' = 'x_y' COLLATE level3; -- true
+SELECT 'x-y' = 'x_y' COLLATE level4; -- false
+</programlisting>
+
+     </para>
+    </sect4>
+   </sect3>
+   <sect3 id="icu-collation-settings">
+    <title>Collation Settings for an ICU Locale</title>
+    <para>
+     <table id="icu-collation-settings-table">
+      <title>ICU Collation Settings</title>
+      <tgroup cols="4">
+       <thead>
+        <row>
+         <entry>Key</entry>
+         <entry>Values</entry>
+         <entry>Default</entry>
+         <entry>Description</entry>
+        </row>
+       </thead>
+       <tbody>
+        <row>
+         <entry><literal>co</literal></entry>
+         <entry><literal>emoji</literal>, <literal>phonebk</literal>, <literal>standard</literal>, <replaceable>...</replaceable></entry>
+         <entry><literal>standard</literal></entry>
+         <entry>
+          Collation type. See <xref linkend="icu-external-references"/> for additional options and details.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>ks</literal></entry>
+         <entry><literal>level1</literal>, <literal>level2</literal>, <literal>level3</literal>, <literal>level4</literal>, <literal>identic</literal></entry>
+         <entry><literal>level3</literal></entry>
+         <entry>
+          Sensitivity (or "strength") when determining equality, with
+          <literal>level1</literal> the least sensitive to differences and
+          <literal>identic</literal> the most sensitive to differences. See
+          <xref linkend="icu-collation-levels"/> for details.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>ka</literal></entry>
+         <entry><literal>noignore</literal>, <literal>shifted</literal></entry>
+         <entry><literal>noignore</literal></entry>
+         <entry>
+          If set to <literal>shifted</literal>, causes some characters
+          (e.g. punctuation or space) to be ignored in comparison. Key
+          <literal>ks</literal> must be set to <literal>level3</literal> or
+          lower to take effect. Set key <literal>kv</literal> to control which
+          character classes are ignored.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kb</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          Backwards comparison for the level 2 differences. For example,
+          locale <literal>und-u-kb</literal> sorts <literal>'àe'</literal>
+          before <literal>'aé'</literal>.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kk</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Enable full normalization; may affect performance. Basic
+           normalization is performed even when set to
+           <literal>false</literal>. Locales for languages that require full
+           normalization typically enable it by default.
+          </para>
+          <para>
+           Full normalization is important in some cases, such as when
+           multiple accents are applied to a single character. For instance,
+           <literal>'ệ'</literal> can be composed of code points
+           <literal>U&amp;'\0065\0323\0302'</literal> or
+           <literal>U&amp;'\0065\0302\0323'</literal>. With full normalization
+           on, these code point sequences are treated as equal; otherwise they
+           are unequal.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kc</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          <para>
+           Separates case into a "level 2.5" that falls between accents and
+           other level 3 features.
+          </para>
+          <para>
+           If set to <literal>true</literal> and <literal>ks</literal> is set
+           to <literal>level1</literal>, will ignore accents but take case
+           into account.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kf</literal></entry>
+         <entry>
+          <literal>upper</literal>, <literal>lower</literal>,
+          <literal>false</literal>
+         </entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>upper</literal>, upper case sorts before lower
+          case. If set to <literal>lower</literal>, lower case sorts before
+          upper case. If set to <literal>false</literal>, the sort depends on
+          the rules of the locale.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kn</literal></entry>
+         <entry><literal>true</literal>, <literal>false</literal></entry>
+         <entry><literal>false</literal></entry>
+         <entry>
+          If set to <literal>true</literal>, numbers within a string are
+          treated as a single numeric value rather than a sequence of
+          digits. For example, <literal>'id-45'</literal> sorts before
+          <literal>'id-123'</literal>.
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kr</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>,
+          <literal>digit</literal>, <replaceable>script-id</replaceable>
+         </entry>
+         <entry></entry>
+         <entry>
+          <para>
+           Set to one or more of the valid values, or any BCP 47
+           <replaceable>script-id</replaceable>, e.g. <literal>latn</literal>
+           ("Latin") or <literal>grek</literal> ("Greek"). Multiple values are
+           separated by "<literal>-</literal>".
+          </para>
+          <para>
+           Redefines the ordering of classes of characters; those characters
+           belonging to a class earlier in the list sort before characters
+           belonging to a class later in the list. For instance, the value
+           <literal>digit-currency-space</literal> (as part of a language tag
+           like <literal>und-u-kr-digit-currency-space</literal>) sorts
+           punctuation before digits and spaces.
+          </para>
+         </entry>
+        </row>
+        <row>
+         <entry><literal>kv</literal></entry>
+         <entry>
+          <literal>space</literal>, <literal>punct</literal>,
+          <literal>symbol</literal>, <literal>currency</literal>
+         </entry>
+         <entry><literal>punct</literal></entry>
+         <entry>
+          Classes of characters ignored during comparison at level 3. Setting
+          to a later value includes earlier values;
+          e.g. <literal>symbol</literal> also includes
+          <literal>punct</literal> and <literal>space</literal> in the
+          characters to be ignored. Key <literal>ka</literal> must be set to
+          <literal>shifted</literal> and key <literal>ks</literal> must be set
+          to <literal>level3</literal> or lower to take effect.
+         </entry>
+        </row>
+       </tbody>
+      </tgroup>
+     </table>
+      Defaults may depend on locale. The above table is not meant to be
+      complete. See <xref linkend="icu-external-references"/> for additional
+      options and details.
+    </para>
+    <note>
+     <para>
+      For many collation settings, you must create the collation with
+      <option>DETERMINISTIC</option> set to <literal>false</literal> for the
+      setting to have the desired effect (see <xref
+      linkend="collation-nondeterministic"/>). Additionally, some settings
+      only take effect when the key <literal>ka</literal> is set to
+      <literal>shifted</literal> (see <xref
+      linkend="icu-collation-settings-table"/>).
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="icu-locale-examples">
+    <title>Examples</title>
+    <para>
+     <variablelist>
+      <varlistentry id="collation-managing-create-icu-de-u-co-phonebk-x-icu">
+       <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term>
+       <listitem>
+        <para>German collation with phone book collation type</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-und-u-co-emoji-x-icu">
+       <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term>
+       <listitem>
+        <para>
+         Root collation with Emoji collation type, per Unicode Technical Standard #51
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kr-grek-latn">
+       <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Sort Greek letters before Latin ones.  (The default is Latin before Greek.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper">
+       <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term>
+       <listitem>
+        <para>
+         Sort upper-case letters before lower-case letters.  (The default is
+         lower-case letters first.)
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="collation-managing-create-icu-en-u-kf-upper-kr-grek-latn">
+       <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term>
+       <listitem>
+        <para>
+         Combines both of the above options.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </para>
+   </sect3>
+   <sect3 id="icu-external-references">
+    <title>External References for ICU</title>
+    <para>
+     This section (<xref linkend="icu-custom-collations"/>) is only a brief
+     overview of ICU behavior and language tags. Refer to the following
+     documents for technical details, additional options, and new behavior:
+    </para>
+    <itemizedlist>
+     <listitem>
+      <para>
+       <ulink
+           url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode
+       Technical Standard #35</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR
+       repository</ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink>
+      </para>
+     </listitem>
+    </itemizedlist>
+   </sect3>
+  </sect2>
  </sect1>
 
  <sect1 id="multibyte">
-- 
2.34.1

#59

jkatz@postgresql.org

over 2 years ago

In reply to: Jeff Davis (#58)

Re: Order changes in PG16 since ICU introduction

On 5/17/23 6:59 PM, Jeff Davis wrote:

On Tue, 2023-05-16 at 20:23 -0700, Jeff Davis wrote:

Other than that, I took your suggestions almost verbatim. Patch
attached. Thank you!

Attached new patch with a typo fix and a few other edits. I plan to
commit soon.

I did a quicker read through this time. LGTM overall. I like what you
did with the explanations around sensitivity (now it makes sense).

Thanks,

Jonathan

#60

pgsql@j-davis.com

over 2 years ago

In reply to: Jonathan S. Katz (#59)

Re: Order changes in PG16 since ICU introduction

On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote:

I did a quicker read through this time. LGTM overall. I like what you
did with the explanations around sensitivity (now it makes sense).

Committed, thank you.

There are a few things I don't understand that would be good to
document better:

* Rules. I still don't quite understand the use case: are these for
people inventing new languages? What is a plausible use case that isn't
covered by the existing locales and collation settings? Do rules make
sense for a database default collation? Are they for language experts
only or might an ordinary developer benefit from using them?

* The collation types "phonebk", "emoji", etc.: are these variants of
particular locales, or do they make sense in multiple locales? I don't
know where they fit in or how to document them.

* I don't understand what "kc" means if "ks" is not set to "level1".

Regards,
Jeff Davis

#61

[0]: /messages/by-id/CAEze2WiZFQyyb-DcKwayUmE4rY42Bo6kuK9nBjvqRHYxUYJ-DA@mail.gmail.com

jkatz@postgresql.org

over 2 years ago

In reply to: Jeff Davis (#60)

Re: Order changes in PG16 since ICU introduction

On 5/18/23 1:55 PM, Jeff Davis wrote:

On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote:

I did a quicker read through this time. LGTM overall. I like what you
did with the explanations around sensitivity (now it makes sense).

Committed, thank you.

\o/

There are a few things I don't understand that would be good to
document better:

* Rules. I still don't quite understand the use case: are these for
people inventing new languages? What is a plausible use case that isn't
covered by the existing locales and collation settings? Do rules make
sense for a database default collation? Are they for language experts
only or might an ordinary developer benefit from using them?

From my read of them, as an app developer I'd be very unlikely to use
this. Maybe there is something with building out some collation rules
vis-a-vis an extension, but I have trouble imagining the use-case. I may
also not be the target audience for this feature.

* The collation types "phonebk", "emoji", etc.: are these variants of
particular locales, or do they make sense in multiple locales? I don't
know where they fit in or how to document them.

I remember I had an exploratory use case for "phonebk", but I couldn't
get it to work. AIUI from random searching, the idea is that it provides
the "phonebook" rules for ordering "names" in a particular locale.

* I don't understand what "kc" means if "ks" is not set to "level1".

Me neither, but I haven't stared at this as hard as others.

Thanks,

Jonathan

#62

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Jeff Davis (#24)

Re: Order changes in PG16 since ICU introduction

On Fri, 21 Apr 2023 at 22:46, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2023-04-21 at 19:00 +0100, Andrew Gierth wrote:

Also, somewhere along the line someone broke initdb --no-locale, which
should result in C locale being the default everywhere, but when I just
tested it it picked 'en' for an ICU locale, which is not the right
thing.

Fixed, thank you.

As I complain about in [0], since 5cd1a5af --no-locale has been broken
/ behaving outside its description: Instead of being equivalent to
`--locale=C` it now also overrides `--locale-provider=libc`, resulting
in undocumented behaviour.

Kind regards,

Matthias van de Meent
Neon, Inc.

#63

pgsql@j-davis.com

over 2 years ago

In reply to: Jonathan S. Katz (#61)

Re: Order changes in PG16 since ICU introduction

On Thu, 2023-05-18 at 13:58 -0400, Jonathan S. Katz wrote:

From my read of them, as an app developer I'd be very unlikely to use
this. Maybe there is something with building out some collation rules
vis-a-vis an extension, but I have trouble imagining the use-case. I may
also not be the target audience for this feature.

That's a problem for the ICU rules feature. I understand some features
may be for domain experts only, but we at least need to call that out
so that ordinary developers don't get confused. And we should hear from
some of those domain experts that they actually want it and it solves a
real problem.

For the features that can be described with collation
settings/attributes right in the locale name, the use cases are more
plausible and we've supported them since v10, so it's good to document
them as best we can. It's hard to expose only the particular ICU
collation settings we understand best (e.g. the "ks" setting that
allows case insensitive collation), so it's inevitable that there will
be some settings that are more obscure and harder to document.

But in the case of ICU rules, they are newly-supported in 16, so there
should be a clear reason we're adding them. Otherwise we're just
setting up users for confusion or problems, and creating backwards-
compatibility headaches for ourselves (and the last thing we want is to
fret over backwards compatibility for a feature with no users).

Beyond that, there seems to be some danger: if the syntax for rules is
not perfectly compatible between ICU versions, the user might run into
big problems.

Regards,
Jeff Davis

#64

pgsql@j-davis.com

over 2 years ago

In reply to: Matthias van de Meent (#62)

5 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Thu, 2023-05-18 at 20:11 +0200, Matthias van de Meent wrote:

As I complain about in [0], since 5cd1a5af --no-locale has been broken
/ behaving outside its description: Instead of being equivalent to
`--locale=C` it now also overrides `--locale-provider=libc`, resulting
in undocumented behaviour.

I agree that 5cd1a5af is incomplete.

Posting updated patches. Feedback on the approaches below would be
appreciated.

For context, in version 15:

$ initdb -D data --locale-provider=icu --icu-locale=en
=> create database clocale template template0 locale='C';
=> select datname, datlocprovider, daticulocale
   from pg_database where datname='clocale';
 datname | datlocprovider | daticulocale
---------+----------------+--------------
 clocale | i              | en
(1 row)

That behavior is confusing, and when I made ICU the default provider in
v16, the confusion was extended into more cases.

If we leave the CREATE DATABASE (and createdb and initdb) syntax in
place, such that LOCALE (and --locale) do not apply to ICU at all, then
I don't see a path to a good ICU user experience.

Therefore I conclude that we need LOCALE (and --locale) to apply to ICU
somehow. (The LOCALE option already applies to ICU during CREATE
COLLATION, just not CREATE DATABASE or initdb.)

Patch 0003 does this. It's fairly straightforward and I believe we need
this patch.

But to actually fix your complaint we also need --no-locale to be
equivalent to --locale=C and for those options to both use memcmp()
semantics. There are several approaches to accomplish this, and I think
this is the part where I most need some feedback. There are only so
many approaches, and each one has some potential downsides, but I
believe we need to select one:

(1) Give up and leave the existing CREATE DATABASE (and createdb, and
initdb) semantics in place, along with the confusing behavior in v15.

This is a last resort, in my opinion. It gives us no path toward a good
user experience with ICU, and leaves us with all of the problems of the
OS as a collation provider.

(2) Automatically change the provider to libc when locale=C.

Almost works, but it's not clear how we handle the case "provider=icu
lc_collate='fr_FR.utf8' locale=C".

If we change it to "provider=libc lc_collate=C", we've overridden the
specified lc_collate. If we ignore the locale=C, that would be
surprising to users. If we throw an error, that would be a backwards
compatibility issue.

One possible solution would be to change the catalog representation to
allow setting the default collation locale separately from datcollate
even for the libc provider. For instance, rename daticulocale to
datdeflocale, and store the default collation locale there for both
libc and ICU. Then, "provider=icu lc_collate='fr_FR.utf8' locale=C"
could be changed into "provider=libc lc_collate='fr_FR.utf8'
deflocale=C". It may be confusing that datcollate is a different
concept from datdeflocale; but then again they are different concepts
and it's confusing that they are currently combined into one.

(3) Support iculocale=C in the ICU provider using the memcmp() path.

In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
lc_ctype_is_c() would both return true.

There's a potential problem for users who've misused ICU in the past
(15 or earlier) by using provider=icu and iculocale=C. ICU would accept
such a locale name, but not recognize it and fall back to the root
locale, so it never worked as the user intended it. But if we redefine
C to be memcmp(), then such users will have broken indexes if they
upgrade.

We could add a check at pg_upgrade time for iculocale=C in versions 15
and earlier, and cause the check (and therefore the upgrade) to fail.
That may be reasonable considering that it never really worked in the
past, and perhaps very few users actually ever created such a
collation. But if some user runs into that problem, we'd have to resort
to a hack like telling them to "update pg_collation set
colliculocale='und' where colliculocale='C'" and then try the upgrade
again, which is not a great answer (as far as I can tell it would be a
correct answer and should not break their indexes, but it feels pretty
dangerous).

There may be some other resolutions to this problem, such as catalog
hacks that allow for different representations of iculocale=C pre-16
and post-16. That doesn't sound great though, and we'd have to figure
out what to do with pg_dump.

(4) Create a new "none" provider (which has no locale and always memcmp
semantics), and automatically change the provider to "none" if
provider=icu and iculocale=C.

This solves the problem case in #2 and the potential upgrade problem in
#3. It also makes the documentation a bit more natural, in my opinion,
even if we retain the special case for provider=libc collate=C.

#4 is the approach I chose (patches 0001 and 0002), but I'd like to
hear what others think.
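
A minimal sketch of approach #4, for illustration only. It assumes a
simplified three-value provider enum; the names provider_t,
resolve_provider() and none_provider_cmp() are hypothetical and do not
appear in the patches. It mirrors the case-insensitive "C"/"POSIX" check
described above and shows the byte-wise ordering that a "none" provider
would be expected to give.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the catalog's provider codes. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU, PROVIDER_NONE } provider_t;

/* Case-insensitive equality; ASCII-only as long as the process locale is "C". */
static int
ascii_ieq(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/* True for "C" or "POSIX" in any letter case; false for NULL. */
int
locale_is_c(const char *locale)
{
    return locale != NULL &&
        (ascii_ieq(locale, "C") || ascii_ieq(locale, "POSIX"));
}

/* Approach #4: provider=icu with ICU locale C/POSIX becomes provider=none. */
provider_t
resolve_provider(provider_t requested, const char *icu_locale)
{
    if (requested == PROVIDER_ICU && locale_is_c(icu_locale))
        return PROVIDER_NONE;
    return requested;           /* libc and ordinary ICU locales are unchanged */
}

/*
 * Ordering assumed for the "none" provider: memcmp() over the common
 * prefix, with the shorter string sorting first on a tie.
 */
int
none_provider_cmp(const char *a, const char *b)
{
    size_t      la, lb;
    int         r;

    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    r = memcmp(a, b, la < lb ? la : lb);
    if (r != 0)
        return r;
    return (la > lb) - (la < lb);
}
```

With this sketch, resolve_provider(PROVIDER_ICU, "POSIX") yields
PROVIDER_NONE, while PROVIDER_LIBC with locale "C" is left alone. The
comparison function orders 'RM(25)/nodes|+|21|1' before
'RM(25)/nodes|-|21|', because '+' (0x2B) is a smaller byte than '-'
(0x2D).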

For historical reasons, users may assume that LC_COLLATE controls the
default collation order because that's true in libc. And if their
provider is ICU, they may be surprised that it doesn't. I believe we
could extend each of the above approaches to use LC_COLLATE as the
default for ICU_LOCALE if the former is specified and the latter is
not, and that may make things smoother.
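
The fallback suggested in the paragraph above could look roughly like
the following sketch. The helper name default_icu_locale() is
hypothetical, and the sketch assumes the libc locale name can be handed
to ICU unchanged, which may not hold for every libc locale name.

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose the ICU locale for the default collation.
 * An explicit ICU_LOCALE wins; otherwise fall back to LC_COLLATE.
 * Returns NULL when neither was specified, leaving the choice to the
 * caller (e.g. the template database or the environment).
 */
const char *
default_icu_locale(const char *icu_locale, const char *lc_collate)
{
    if (icu_locale != NULL)
        return icu_locale;
    return lc_collate;
}
```

The point of keeping this as a separate step is that it only supplies a
default: a user who specifies both options still gets exactly what was
asked for.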

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v6-0002-ICU-for-locale-C-automatically-use-none-provider-.patch (text/x-patch; charset=UTF-8)

From dc7200153a9ac65c2518b32b789d1a9dc4454850 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 8 May 2023 13:48:01 -0700
Subject: [PATCH v6 2/5] ICU: for locale "C", automatically use "none" provider
 instead.

Postgres expects locale C to be optimizable to simple locale-unaware
byte operations; while ICU does not recognize the locale "C" at all,
and falls back to the root locale.

If the user specifies locale "C" when creating a new collation or a
new database with the ICU provider, automatically switch it to the
"none" provider.

If provider is libc, behavior is unchanged.
---
 doc/src/sgml/charset.sgml                     |  6 +++
 doc/src/sgml/ref/create_collation.sgml        |  6 +++
 doc/src/sgml/ref/create_database.sgml         |  5 +++
 doc/src/sgml/ref/createdb.sgml                |  5 +++
 doc/src/sgml/ref/initdb.sgml                  |  5 +++
 src/backend/commands/collationcmds.c          | 17 ++++++++
 src/backend/commands/dbcommands.c             | 21 ++++++++++
 src/bin/initdb/initdb.c                       | 10 +++++
 src/bin/initdb/t/001_initdb.pl                | 39 +++++++++++++++++++
 src/bin/scripts/createdb.c                    | 11 ++++++
 src/bin/scripts/t/020_createdb.pl             | 12 ++++++
 .../regress/expected/collate.icu.utf8.out     | 14 +++++--
 src/test/regress/sql/collate.icu.utf8.sql     |  6 +++
 13 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 7a791a2b7c..68bad646e9 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -405,6 +405,12 @@ initdb --locale-provider=icu --icu-locale=en
      change in results. <literal>LC_COLLATE</literal> and
      <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
     </para>
+    <para>
+     The ICU provider does not accept the <literal>C</literal>
+     locale. Commands that create collations or databases with the
+     <literal>icu</literal> provider and ICU locale <literal>C</literal> use
+     the provider <literal>none</literal> instead.
+    </para>
     <note>
      <para>
       For the ICU provider, results may depend on the version of the ICU
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 5489ae7413..1ac41831d8 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -126,6 +126,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        <literal>libc</literal> is the default.  See <xref
        linkend="locale-providers"/> for details.
       </para>
+      <para>
+       If the provider is <literal>icu</literal> and the locale is
+       <literal>C</literal> or <literal>POSIX</literal>, the provider is
+       automatically set to <literal>none</literal>, as the ICU provider
+       doesn't support an ICU locale of <literal>C</literal>.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 60b9da0952..c730d02e15 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -190,6 +190,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
        <para>
         Specifies the ICU locale ID if the ICU locale provider is used.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 326a371d34..7c573e848a 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -154,6 +154,11 @@ PostgreSQL documentation
         Specifies the ICU locale ID to be used in this database, if the
         ICU locale provider is selected.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index e604ab48b7..76993acdfe 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -250,6 +250,11 @@ PostgreSQL documentation
         Specifies the ICU locale when the ICU provider is used. Locale support
         is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
        <para>
         If this option is not specified, the locale is inherited from the
         environment in which <command>initdb</command> runs. The environment's
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index aeaf6c419e..8bc6f8347d 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -254,6 +254,23 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (lcctypeEl)
 			collctype = defGetString(lcctypeEl);
 
+		/*
+		 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+		 * optimizable to byte operations (memcmp(), pg_ascii_tolower(),
+		 * etc.); transform into the "none" provider. Don't transform during
+		 * binary upgrade.
+		 */
+		if (!IsBinaryUpgrade && collprovider == COLLPROVIDER_ICU &&
+			colliculocale && (pg_strcasecmp(colliculocale, "C") == 0 ||
+							  pg_strcasecmp(colliculocale, "POSIX") == 0))
+		{
+			ereport(NOTICE,
+					(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+							colliculocale)));
+			colliculocale = NULL;
+			collprovider = COLLPROVIDER_NONE;
+		}
+
 		if (collprovider == COLLPROVIDER_LIBC)
 		{
 			if (!collcollate)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 9e73f54803..6dc737aebb 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1043,6 +1043,27 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
 
+	/*
+	 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+	 * optimizable to byte operations (memcmp(), pg_ascii_tolower(), etc.);
+	 * transform into the "none" provider.
+	 *
+	 * Don't transform during binary upgrade or when both the provider and ICU
+	 * locale are unchanged from the template.
+	 */
+	if (!IsBinaryUpgrade && dblocprovider == COLLPROVIDER_ICU &&
+		(src_locprovider != COLLPROVIDER_ICU ||
+		 strcmp(dbiculocale, src_iculocale) != 0) &&
+		dbiculocale && (pg_strcasecmp(dbiculocale, "C") == 0 ||
+						pg_strcasecmp(dbiculocale, "POSIX") == 0))
+	{
+		ereport(NOTICE,
+				(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+						dbiculocale)));
+		dbiculocale = NULL;
+		dblocprovider = COLLPROVIDER_NONE;
+	}
+
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
 		if (!(is_encoding_supported_by_icu(encoding)))
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4a6cad3cb9..e5ec2a243e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2440,6 +2440,16 @@ setlocales(void)
 			lc_messages = locale;
 	}
 
+	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = COLLPROVIDER_NONE;
+	}
+
 	/*
 	 * canonicalize locale names, and obtain any missing values from our
 	 * current environment
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index fe6d224e5b..ea92b08511 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -111,6 +111,45 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'option --icu-locale');
 
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			"$tempdir/data4a"
+		],
+		'option --icu-locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--locale=C',
+			"$tempdir/data4b"
+		],
+		'option --icu-locale=C --locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-collate=C',
+			"$tempdir/data4c"
+		],
+		'option --icu-locale=C --lc-collate=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-ctype=C',
+			"$tempdir/data4d"
+		],
+		'option --icu-locale=C --lc-ctype=C');
+
 	command_fails_like(
 		[
 			'initdb',                '--no-sync',
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 79367d933b..9caf9190cf 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -172,6 +172,17 @@ main(int argc, char *argv[])
 			lc_collate = locale;
 	}
 
+	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
+		icu_locale &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = "none";
+	}
+
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 5aa658b671..eb3682f0fd 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -75,6 +75,18 @@ if ($ENV{with_icu} eq 'yes')
 	$node2->command_ok(
 		[ 'createdb', '-T', 'template0', '--icu-locale', 'en-US', 'foobar56' ],
 		'create database with icu locale from template database with icu provider');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  'test_none_icu1' ],
+		'create database with provider "icu" and ICU_LOCALE="C"');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  '--lc-ctype=C', 'test_none_icu_2' ],
+		'create database with provider "icu" and ICU_LOCALE="C" and LC_CTYPE=C');
 }
 else
 {
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c658ee1404..7c186e9f69 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1035,6 +1035,9 @@ BEGIN
 END
 $$;
 RESET client_min_messages;
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+NOTICE:  using locale provider "none" for ICU locale "C"
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
@@ -1058,8 +1061,11 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
  test0
  test1
  test5
-(3 rows)
+ testc
+(4 rows)
 
+DROP COLLATION test1;
+CREATE COLLATION test1 (provider = icu, locale = 'und');
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
 ERROR:  collation "test11" already exists in schema "collate_tests"
@@ -1079,7 +1085,8 @@ SELECT collname, nspname, obj_description(pg_collation.oid, 'pg_collation')
  test0    | collate_tests | US English
  test11   | test_schema   | 
  test5    | collate_tests | 
-(3 rows)
+ testc    | collate_tests | 
+(4 rows)
 
 DROP COLLATION test0, test_schema.test11, test5;
 DROP COLLATION test0; -- fail
@@ -1089,7 +1096,8 @@ NOTICE:  collation "test0" does not exist, skipping
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%';
  collname 
 ----------
-(0 rows)
+ testc
+(1 row)
 
 DROP SCHEMA test_schema;
 DROP ROLE regress_test_role;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 7bd0901281..e59200df9a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -375,6 +375,9 @@ $$;
 
 RESET client_min_messages;
 
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
@@ -388,6 +391,9 @@ CREATE COLLATION test5 FROM test0;
 
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
 
+DROP COLLATION test1;
+CREATE COLLATION test1 (provider = icu, locale = 'und');
+
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
 ALTER COLLATION test1 RENAME TO test22; -- fail
-- 
2.34.1

v6-0003-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From c04053021eaa6db480143393a7de83525a8f4f7e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v6 3/5] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 +++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++---
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++++----
 src/bin/initdb/initdb.c                       | 11 ++++++---
 src/bin/scripts/createdb.c                    | 13 ++++-------
 src/bin/scripts/t/020_createdb.pl             |  4 ++--
 src/test/icu/t/010_database.pl                | 23 ++++++++++++-------
 .../regress/expected/collate.icu.utf8.out     | 22 +++++++++---------
 10 files changed, 65 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index c730d02e15..dc57ba0c8b 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 7c573e848a..7991153ecc 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 76993acdfe..d9ef21c422 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8bc6f8347d..21615746f9 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -302,7 +302,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				if (langtag && strcmp(colliculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, colliculocale)));
 
 					colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6dc737aebb..154f20573c 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1019,7 +1019,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1033,12 +1038,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1094,7 +1101,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			if (langtag && strcmp(dbiculocale, langtag) != 0)
 			{
 				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
+						(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 								langtag, dbiculocale)));
 
 				dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index e5ec2a243e..f0827154cd 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2164,7 +2164,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2406,7 +2410,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale_provider == COLLPROVIDER_NONE)
 	{
@@ -2438,6 +2442,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
@@ -3331,7 +3337,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 9caf9190cf..51c4bb3592 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
 		icu_locale &&
 		(pg_strcasecmp(icu_locale, "C") == 0 ||
@@ -230,6 +222,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index eb3682f0fd..81a9931c09 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -167,7 +167,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -175,7 +175,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 7c186e9f69..cf1852c89d 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1202,9 +1202,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1212,7 +1212,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1229,12 +1229,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1243,7 +1243,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1271,13 +1271,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1327,9 +1327,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1
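
The fallback order that patch v6 3/5 adds to createdb() for the ICU locale can be read as a small decision function. The following is only a minimal sketch under stated assumptions: resolve_db_icu_locale() and its parameter names are hypothetical and do not exist in PostgreSQL, NULL stands for "option not given", and the validation and canonicalization that the real code performs afterwards are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Provider code for ICU, matching the 'i' value visible in the patches. */
enum { PROVIDER_ICU = 'i' };

/*
 * Hypothetical helper (not a PostgreSQL function): choose the ICU locale
 * for a new database.
 *
 *   icu_locale_opt       value of ICU_LOCALE, or NULL if not given
 *   locale_opt           value of LOCALE, or NULL if not given
 *   template_icu_locale  ICU locale of the template database
 */
const char *
resolve_db_icu_locale(char provider,
                      const char *icu_locale_opt,
                      const char *locale_opt,
                      const char *template_icu_locale)
{
    /* The provider is assumed to be resolved already (the patch copies it
     * from the template database when it is unset). */
    assert(provider != '\0');

    /* The fallback only applies when the provider is ICU. */
    if (provider != PROVIDER_ICU)
        return icu_locale_opt;

    /* An explicit ICU_LOCALE always wins. */
    if (icu_locale_opt != NULL)
        return icu_locale_opt;

    /* New in this patch: LOCALE also supplies the ICU locale. */
    if (locale_opt != NULL)
        return locale_opt;

    /* Previous behavior: inherit from the template database. */
    return template_icu_locale;
}
```

Under this reading, CREATE DATABASE ... LOCALE_PROVIDER icu LOCALE 'C' takes the third branch, which is why the test in 010_database.pl now expects that statement to succeed instead of failing with "ICU locale must be specified".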

Attachment: v6-0004-Add-default_collation_provider-GUC.patch (text/x-patch; charset=UTF-8)

From 2a857a2cb080dbc015c59b89acbb195ae7991a99 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 11 May 2023 12:54:31 -0700
Subject: [PATCH v6 4/5] Add default_collation_provider GUC.

Controls default collation provider for CREATE COLLATION. Does not
affect CREATE DATABASE, which gets its default from the template
database.
---
 doc/src/sgml/config.sgml                      | 17 +++++++++++++
 doc/src/sgml/ref/create_collation.sgml        | 15 ++++++++---
 src/backend/commands/collationcmds.c          |  8 +++++-
 src/backend/utils/misc/guc_tables.c           | 18 +++++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  4 +++
 src/include/commands/collationcmds.h          |  2 ++
 .../regress/expected/collate.icu.utf8.out     | 25 +++++++++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 13 ++++++++++
 8 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 18ce06729b..58a1046340 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9820,6 +9820,23 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-collation-provider" xreflabel="default_collation_provider">
+      <term><varname>default_collation_provider</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>default_collation_provider</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Default collation provider for <command>CREATE
+        COLLATION</command>. Does not affect <command>CREATE
+        DATABASE</command>, which gets the default collation provider from the
+        template database. Valid values are <literal>icu</literal> and
+        <literal>libc</literal>. The default is <literal>libc</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-icu-validation-level" xreflabel="icu_validation_level">
       <term><varname>icu_validation_level</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 1ac41831d8..c9b3e6e218 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,10 +121,17 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
       <para>
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are <literal>none</literal>,
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
-       linkend="locale-providers"/> for details.
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.  See
+       <xref linkend="locale-providers"/> for details.
+      </para>
+      <para>
+       If <replaceable>provider</replaceable> is not specified, and
+       <replaceable>lc_collate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the
+       <literal>libc</literal> provider is used. Otherwise, the default
+       provider is controlled by <xref
+       linkend="guc-default-collation-provider"/>.
       </para>
       <para>
        If the provider is <literal>icu</literal> and the locale is
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 21615746f9..25e8d32fd9 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -47,6 +47,7 @@ typedef struct
 	int			enc;			/* encoding */
 } CollAliasData;
 
+int		default_collation_provider = (int) COLLPROVIDER_LIBC;
 
 /*
  * CREATE COLLATION
@@ -228,7 +229,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = (char) default_collation_provider;
+		}
 
 		if (collprovider == COLLPROVIDER_NONE
 			&& (localeEl || lccollateEl || lcctypeEl))
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 844781a7f5..901cfda819 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -35,8 +35,10 @@
 #include "access/xlogrecovery.h"
 #include "archive/archive_module.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_collation.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/collationcmds.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/user.h"
@@ -166,6 +168,12 @@ static const struct config_enum_entry intervalstyle_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry collation_provider_options[] = {
+	{"icu", (int) 'i', false},
+	{"libc", (int) 'c', false},
+	{NULL, 0, false}
+};
+
 static const struct config_enum_entry icu_validation_level_options[] = {
 	{"disabled", -1, false},
 	{"debug5", DEBUG5, false},
@@ -4683,6 +4691,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"default_collation_provider", PGC_USERSET, CLIENT_CONN_LOCALE,
+		 gettext_noop("Default collation provider for CREATE COLLATION."),
+		 NULL
+		},
+		&default_collation_provider,
+		(int) COLLPROVIDER_LIBC, collation_provider_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"icu_validation_level", PGC_USERSET, CLIENT_CONN_LOCALE,
 		 gettext_noop("Log level for reporting invalid ICU locale strings."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c8018da04a..c1f247378d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -734,6 +734,10 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
+#default_collation_provider = 'libc'	# default collation provider
+					# for CREATE COLLATION
+					# (icu, libc)
+
 #icu_validation_level = WARNING		# report ICU locale validation
 					# errors at the given level
 
diff --git a/src/include/commands/collationcmds.h b/src/include/commands/collationcmds.h
index b76c7b3dc3..f54389525d 100644
--- a/src/include/commands/collationcmds.h
+++ b/src/include/commands/collationcmds.h
@@ -18,6 +18,8 @@
 #include "catalog/objectaddress.h"
 #include "parser/parse_node.h"
 
+extern PGDLLIMPORT int default_collation_provider;
+
 extern ObjectAddress DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_exists);
 extern void IsThereCollationInNamespace(const char *collname, Oid nspOid);
 extern ObjectAddress AlterCollation(AlterCollationStmt *stmt);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index cf1852c89d..ea96e27f45 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1038,6 +1038,31 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 NOTICE:  using locale provider "none" for ICU locale "C"
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+ collname | collprovider 
+----------+--------------
+ def_libc | c
+(1 row)
+
+DROP COLLATION def_libc;
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+ collname | collprovider 
+----------+--------------
+ def_icu  | i
+(1 row)
+
+CREATE COLLATION def_libc (LC_COLLATE = 'C', LC_CTYPE='C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+ collname | collprovider 
+----------+--------------
+ def_libc | c
+(1 row)
+
+RESET default_collation_provider;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index e59200df9a..ee607ca3a5 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -378,6 +378,19 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+DROP COLLATION def_libc;
+
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+CREATE COLLATION def_libc (LC_COLLATE = 'C', LC_CTYPE='C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+
+RESET default_collation_provider;
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-- 
2.34.1
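
The provider selection that patch v6 4/5 adds to DefineCollation() can be summarized in a few lines. This is a minimal sketch, not the patch itself: resolve_collation_provider() and its parameters are hypothetical names, '\0' stands for "no PROVIDER clause", and the character codes 'i' and 'c' are the ones listed in collation_provider_options above.

```c
#include <assert.h>
#include <stdbool.h>

/* Provider codes as they appear in collation_provider_options. */
enum { PROVIDER_ICU = 'i', PROVIDER_LIBC = 'c' };

/*
 * Hypothetical helper (not a PostgreSQL function): decide which provider
 * CREATE COLLATION uses.
 *
 *   explicit_provider  provider from the PROVIDER clause, or '\0' if absent
 *   has_lc_collate     true if LC_COLLATE was specified
 *   has_lc_ctype       true if LC_CTYPE was specified
 *   guc_default        current value of default_collation_provider
 */
char
resolve_collation_provider(char explicit_provider,
                           bool has_lc_collate,
                           bool has_lc_ctype,
                           char guc_default)
{
    /* The GUC only accepts icu and libc. */
    assert(guc_default == PROVIDER_ICU || guc_default == PROVIDER_LIBC);

    /* An explicit PROVIDER clause is used as given. */
    if (explicit_provider != '\0')
        return explicit_provider;

    /* LC_COLLATE and LC_CTYPE are libc concepts, so they force libc. */
    if (has_lc_collate || has_lc_ctype)
        return PROVIDER_LIBC;

    /* Otherwise the GUC decides. */
    return guc_default;
}
```

The three regression cases above map onto this function directly: LOCALE = 'C' with the GUC set to libc yields 'c', LOCALE = 'und' with the GUC set to icu yields 'i', and LC_COLLATE plus LC_CTYPE yields 'c' even while the GUC is set to icu.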

Attachment: v6-0001-Introduce-collation-provider-none.patch (text/x-patch; charset=UTF-8)

From de37bfb02dcc41c2e932a788ba10a05e5a539870 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v6 1/5] Introduce collation provider "none".

Provides locale-unaware semantics that are implemented as fast byte
operations in Postgres, independent of the operating system or any
provider libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "none" for built-in collation "ucs_basic" instead of
libc.
---
 doc/src/sgml/charset.sgml              | 87 +++++++++++++++++++++-----
 doc/src/sgml/ref/create_collation.sgml |  2 +-
 doc/src/sgml/ref/create_database.sgml  |  2 +-
 doc/src/sgml/ref/createdb.sgml         |  2 +-
 doc/src/sgml/ref/initdb.sgml           |  2 +-
 src/backend/catalog/pg_collation.c     |  7 ++-
 src/backend/commands/collationcmds.c   | 84 +++++++++++++++++++++----
 src/backend/commands/dbcommands.c      | 69 +++++++++++++++++---
 src/backend/utils/adt/pg_locale.c      | 27 +++++++-
 src/backend/utils/init/postinit.c      | 10 ++-
 src/bin/initdb/initdb.c                | 33 +++++++++-
 src/bin/initdb/t/001_initdb.pl         | 29 +++++++++
 src/bin/pg_dump/pg_dump.c              |  8 ++-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 18 +++++-
 src/bin/psql/describe.c                |  2 +-
 src/bin/scripts/createdb.c             |  2 +-
 src/bin/scripts/t/020_createdb.pl      | 29 +++++++++
 src/include/catalog/pg_collation.dat   |  3 +-
 src/include/catalog/pg_collation.h     |  3 +
 src/test/regress/expected/collate.out  | 10 ++-
 src/test/regress/sql/collate.sql       |  6 ++
 21 files changed, 373 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 9db14649aa..7a791a2b7c 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
    <title>Locale Providers</title>
 
    <para>
-    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
-    providers</firstterm>.  This specifies which library supplies the locale
-    data.  One standard provider name is <literal>libc</literal>, which uses
-    the locales provided by the operating system C library.  These are the
-    locales used by most tools provided by the operating system.  Another
-    provider is <literal>icu</literal>, which uses the external
-    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can
-    only be used if support for ICU was configured when PostgreSQL was built.
+    A locale provider specifies which library defines the locale behavior for
+    collations and character classifications.
    </para>
 
    <para>
     The commands and tools that select the locale settings, as described
-    above, each have an option to select the locale provider.  The examples
-    shown earlier all use the <literal>libc</literal> provider, which is the
-    default.  Here is an example to initialize a database cluster using the
-    ICU provider:
+    above, each have an option to select the locale provider. Here is an
+    example to initialize a database cluster using the ICU provider:
 <programlisting>
 initdb --locale-provider=icu --icu-locale=en
 </programlisting>
@@ -370,12 +362,73 @@ initdb --locale-provider=icu --icu-locale=en
    </para>
 
    <para>
-    Which locale provider to use depends on individual requirements.  For most
-    basic uses, either provider will give adequate results.  For the libc
-    provider, it depends on what the operating system offers; some operating
-    systems are better than others.  For advanced uses, ICU offers more locale
-    variants and customization options.
+    Regardless of the locale provider, the operating system is still used to
+    provide some locale-aware behavior, such as messages (see <xref
+    linkend="guc-lc-messages"/>).
    </para>
+
+   <para>
+    The available locale providers are listed below.
+   </para>
+
+   <sect3 id="locale-provider-none">
+    <title>None</title>
+    <para>
+     The <literal>none</literal> provider uses simple built-in operations
+     which are not locale-aware.
+    </para>
+    <para>
+     The collation and character classification behavior is equivalent to
+     using the <literal>libc</literal> provider with locale
+     <literal>C</literal>, except that <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently.
+    </para>
+    <note>
+     <para>
+      When using the <literal>none</literal> locale provider, behavior may
+      depend on the database encoding.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-icu">
+    <title>ICU</title>
+    <para>
+     The <literal>icu</literal> provider uses the external
+     ICU<indexterm><primary>ICU</primary></indexterm>
+     library. <productname>PostgreSQL</productname> must have been configured
+     with ICU support.
+    </para>
+    <para>
+     ICU provides collation and character classification behavior that is
+     independent of the operating system and database encoding, which is
+     preferable if you expect to transition to other platforms without any
+     change in results. <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
+    </para>
+    <note>
+     <para>
+      For the ICU provider, results may depend on the version of the ICU
+      library used, as it is updated to reflect changes in natural language
+      over time.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-libc">
+    <title>libc</title>
+    <para>
+     The <literal>libc</literal> provider uses the operating system's C
+     library. The collation and character classification behavior is
+     controlled by the settings <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal>, so they cannot be set independently.
+    </para>
+    <note>
+     <para>
+      The same locale name may have different behavior on different platforms
+      when using the libc provider.
+     </para>
+    </note>
+   </sect3>
+
   </sect2>
   <sect2 id="icu-locales">
    <title>ICU Locales</title>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..5489ae7413 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -120,7 +120,7 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      <listitem>
       <para>
        Specifies the provider to use for locale services associated with this
-       collation.  Possible values are
+       collation.  Possible values are <literal>none</literal>,
        <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
        (if the server was built with ICU support) or <literal>libc</literal>.
        <literal>libc</literal> is the default.  See <xref
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..60b9da0952 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -212,7 +212,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <listitem>
        <para>
         Specifies the provider to use for the default collation in this
-        database.  Possible values are
+        database.  Possible values are <literal>none</literal>,
         <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
         (if the server was built with ICU support) or <literal>libc</literal>.
         By default, the provider is the same as that of the <xref
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..326a371d34 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -168,7 +168,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         Specifies the locale provider for the database's default collation.
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..e604ab48b7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -323,7 +323,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry id="app-initdb-option-locale-provider">
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         This option sets the locale provider for databases created in the new
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index fd022e6fc2..86b6ba2375 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -68,7 +68,12 @@ CollationCreate(const char *collname, Oid collnamespace,
 	Assert(collname);
 	Assert(collnamespace);
 	Assert(collowner);
-	Assert((collcollate && collctype) || colliculocale);
+	Assert((collprovider == COLLPROVIDER_NONE &&
+			!collcollate && !collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_LIBC &&
+			 collcollate &&  collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_ICU &&
+			!collcollate && !collctype &&  colliculocale));
 
 	/*
 	 * Make sure there is no existing collation of same name & encoding.
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..aeaf6c419e 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -215,7 +215,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 
 		if (collproviderstr)
 		{
-			if (pg_strcasecmp(collproviderstr, "icu") == 0)
+			if (pg_strcasecmp(collproviderstr, "none") == 0)
+				collprovider = COLLPROVIDER_NONE;
+			else if (pg_strcasecmp(collproviderstr, "icu") == 0)
 				collprovider = COLLPROVIDER_ICU;
 			else if (pg_strcasecmp(collproviderstr, "libc") == 0)
 				collprovider = COLLPROVIDER_LIBC;
@@ -228,6 +230,13 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collprovider = COLLPROVIDER_LIBC;
 
+		if (collprovider == COLLPROVIDER_NONE
+			&& (localeEl || lccollateEl || lcctypeEl))
+		{
+			ereport(ERROR,
+					(errmsg("collation provider \"none\" does not support LOCALE, LC_COLLATE, or LC_CTYPE")));
+		}
+
 		if (localeEl)
 		{
 			if (collprovider == COLLPROVIDER_LIBC)
@@ -302,6 +311,16 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
 
+		if (collprovider == COLLPROVIDER_NONE)
+		{
+			/*
+			 * Behavior may be different in different encodings, so set
+			 * collencoding to the current database encoding. No validation is
+			 * required, because the "none" provider is compatible with any
+			 * encoding.
+			 */
+			collencoding = GetDatabaseEncoding();
+		}
 		if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
@@ -331,7 +350,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	}
 
 	if (!collversion)
-		collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate);
+	{
+		char *locale;
+
+		if (collprovider == COLLPROVIDER_ICU)
+			locale = colliculocale;
+		else if (collprovider == COLLPROVIDER_LIBC)
+			locale = collcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		collversion = get_collation_actual_version(collprovider, locale);
+	}
 
 	newoid = CollationCreate(collName,
 							 collNamespace,
@@ -406,6 +436,7 @@ AlterCollation(AlterCollationStmt *stmt)
 	Form_pg_collation collForm;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 	ObjectAddress address;
@@ -430,8 +461,20 @@ AlterCollation(AlterCollationStmt *stmt)
 	datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
-	newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
+	if (collForm->collprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colliculocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (collForm->collprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(collForm->collprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -494,11 +537,18 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
 
-		datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_daticulocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(dbtup);
 	}
@@ -514,11 +564,19 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
 		Assert(provider != COLLPROVIDER_DEFAULT);
-		datum = SysCacheGetAttrNotNull(COLLOID, colltp,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colliculocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(colltp);
 	}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..9e73f54803 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -909,7 +909,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
 
-		if (pg_strcasecmp(locproviderstr, "icu") == 0)
+		if (pg_strcasecmp(locproviderstr, "none") == 0)
+			dblocprovider = COLLPROVIDER_NONE;
+		else if (pg_strcasecmp(locproviderstr, "icu") == 0)
 			dblocprovider = COLLPROVIDER_ICU;
 		else if (pg_strcasecmp(locproviderstr, "libc") == 0)
 			dblocprovider = COLLPROVIDER_LIBC;
@@ -1177,9 +1179,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 */
 	if (src_collversion && !dcollversion)
 	{
-		char	   *actual_versionstr;
+		char	*actual_versionstr;
+		char	*locale;
 
-		actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dblocprovider, locale);
 		if (!actual_versionstr)
 			ereport(ERROR,
 					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
@@ -1207,7 +1217,18 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * collation version, which is normally only the case for template0.
 	 */
 	if (dbcollversion == NULL)
-		dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+	{
+		char *locale;
+
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		dbcollversion = get_collation_actual_version(dblocprovider, locale);
+	}
 
 	/* Resolve default tablespace for new database */
 	if (dtablespacename && dtablespacename->arg)
@@ -2403,6 +2424,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	ObjectAddress address;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 
@@ -2429,10 +2451,24 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
-	if (isnull)
-		elog(ERROR, "unexpected null in pg_database");
-	newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum));
+	if (datForm->datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_daticulocale, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datForm->datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(datForm->datlocprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -2617,6 +2653,7 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 	HeapTuple	tp;
 	char		datlocprovider;
 	Datum		datum;
+	char	   *locale;
 	char	   *version;
 
 	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
@@ -2627,8 +2664,20 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 
 	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;
 
-	datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-	version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum));
+	if (datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_daticulocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	version = get_collation_actual_version(datlocprovider, locale);
 
 	ReleaseSysCache(tp);
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index eea1d1ae0f..95eb5cf464 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1228,7 +1228,12 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			elog(ERROR, "cache lookup failed for collation %u", collation);
 		collform = (Form_pg_collation) GETSTRUCT(tp);
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			cache_entry->collate_is_c = true;
+			cache_entry->ctype_is_c = true;
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 			Datum		datum;
 			const char *collcollate;
@@ -1281,6 +1286,9 @@ lc_collate_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1334,6 +1342,9 @@ lc_ctype_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1487,8 +1498,10 @@ pg_newlocale_from_collation(Oid collid)
 	{
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return &default_locale;
-		else
+		else if (default_locale.provider == COLLPROVIDER_LIBC)
 			return (pg_locale_t) 0;
+		else
+			elog(ERROR, "cannot open collation with provider \"none\"");
 	}
 
 	cache_entry = lookup_collation_cache(collid, false);
@@ -1513,7 +1526,11 @@ pg_newlocale_from_collation(Oid collid)
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			elog(ERROR, "cannot open collation with provider \"none\"");
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 #ifdef HAVE_LOCALE_T
 			const char *collcollate;
@@ -1599,6 +1616,7 @@ pg_newlocale_from_collation(Oid collid)
 
 			collversionstr = TextDatumGetCString(datum);
 
+			Assert(collform->collprovider != COLLPROVIDER_NONE);
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, collform->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
 			actual_versionstr = get_collation_actual_version(collform->collprovider,
@@ -1650,6 +1668,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collprovider == COLLPROVIDER_NONE)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 53420f4974..8053642fd3 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -461,10 +461,18 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	{
 		char	   *actual_versionstr;
 		char	   *collversionstr;
+		char	   *locale;
 
 		collversionstr = TextDatumGetCString(datum);
 
-		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, dbform->datlocprovider == COLLPROVIDER_ICU ? iculocale : collate);
+		if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			locale = iculocale;
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			locale = collate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, locale);
 		if (!actual_versionstr)
 			/* should not happen */
 			elog(WARNING,
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 30b576932f..4a6cad3cb9 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2408,6 +2408,22 @@ setlocales(void)
 
 	/* set empty lc_* values to locale config if set */
 
+	if (locale_provider == COLLPROVIDER_NONE)
+	{
+		if (!lc_ctype)
+			lc_ctype = "C";
+		if (!lc_collate)
+			lc_collate = "C";
+		if (!lc_numeric)
+			lc_numeric = "C";
+		if (!lc_time)
+			lc_time = "C";
+		if (!lc_monetary)
+			lc_monetary = "C";
+		if (!lc_messages)
+			lc_messages = "C";
+	}
+
 	if (locale)
 	{
 		if (!lc_ctype)
@@ -2502,7 +2518,7 @@ usage(const char *progname)
 			 "                            set default locale in the respective category for\n"
 			 "                            new databases (default taken from environment)\n"));
 	printf(_("      --no-locale           equivalent to --locale=C\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                            set default locale provider for new databases\n"));
 	printf(_("      --pwfile=FILE         read password for the new superuser from file\n"));
 	printf(_("  -T, --text-search-config=CFG\n"
@@ -2652,7 +2668,15 @@ setup_locale_encoding(void)
 {
 	setlocales();
 
-	if (locale_provider == COLLPROVIDER_LIBC &&
+	if (locale_provider == COLLPROVIDER_NONE &&
+		strcmp(lc_ctype, "C") == 0 &&
+		strcmp(lc_collate, "C") == 0 &&
+		strcmp(lc_time, "C") == 0 &&
+		strcmp(lc_numeric, "C") == 0 &&
+		strcmp(lc_monetary, "C") == 0 &&
+		strcmp(lc_messages, "C") == 0)
+		printf(_("The database cluster will be initialized with no locale.\n"));
+	else if (locale_provider == COLLPROVIDER_LIBC &&
 		strcmp(lc_ctype, lc_collate) == 0 &&
 		strcmp(lc_ctype, lc_time) == 0 &&
 		strcmp(lc_ctype, lc_numeric) == 0 &&
@@ -3326,7 +3350,9 @@ main(int argc, char *argv[])
 										 "-c debug_discard_caches=1");
 				break;
 			case 15:
-				if (strcmp(optarg, "icu") == 0)
+				if (strcmp(optarg, "none") == 0)
+					locale_provider = COLLPROVIDER_NONE;
+				else if (strcmp(optarg, "icu") == 0)
 					locale_provider = COLLPROVIDER_ICU;
 				else if (strcmp(optarg, "libc") == 0)
 					locale_provider = COLLPROVIDER_LIBC;
@@ -3365,6 +3391,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+
 	if (icu_locale && locale_provider != COLLPROVIDER_ICU)
 		pg_fatal("%s cannot be specified unless locale provider \"%s\" is chosen",
 				 "--icu-locale", "icu");
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 17a444d80c..fe6d224e5b 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -154,6 +154,35 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', "$tempdir/data6" ],
+	'locale provider none');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--locale=C',
+	  "$tempdir/data7" ],
+	'locale provider none with --locale');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-collate=C',
+	  "$tempdir/data8" ],
+	'locale provider none with --lc-collate');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-ctype=C',
+	  "$tempdir/data9" ],
+	'locale provider none with --lc-ctype');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-locale=en',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU locale');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-rules=""',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU rules');
+
 command_fails(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
 	'fails for invalid locale provider');
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f9cbeb65ab..ddc8a5f71f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3070,7 +3070,9 @@ dumpDatabase(Archive *fout)
 	}
 
 	appendPQExpBufferStr(creaQry, " LOCALE_PROVIDER = ");
-	if (datlocprovider[0] == 'c')
+	if (datlocprovider[0] == 'n')
+		appendPQExpBufferStr(creaQry, "none");
+	else if (datlocprovider[0] == 'c')
 		appendPQExpBufferStr(creaQry, "libc");
 	else if (datlocprovider[0] == 'i')
 		appendPQExpBufferStr(creaQry, "icu");
@@ -13429,7 +13431,9 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 					  fmtQualifiedDumpable(collinfo));
 
 	appendPQExpBufferStr(q, "provider = ");
-	if (collprovider[0] == 'c')
+	if (collprovider[0] == 'n')
+		appendPQExpBufferStr(q, "none");
+	else if (collprovider[0] == 'c')
 		appendPQExpBufferStr(q, "libc");
 	else if (collprovider[0] == 'i')
 		appendPQExpBufferStr(q, "icu");
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 4a7895a756..6d58f6103e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -114,12 +114,20 @@ my $original_locale = "C";
 my $original_iculocale = "";
 my $provider_field = "'c' AS datlocprovider";
 my $iculocale_field = "NULL AS daticulocale";
-if ($oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes')
+if ($oldnode->pg_version >= 15)
 {
 	$provider_field = "datlocprovider";
 	$iculocale_field = "daticulocale";
-	$original_provider = "i";
-	$original_iculocale = "fr-CA";
+
+	if ($ENV{with_icu} eq 'yes')
+	{
+		$original_provider = "i";
+		$original_iculocale = "fr-CA";
+	}
+	else
+	{
+		$original_provider = "n";
+	}
 }
 
 my @initdb_params = @custom_opts;
@@ -131,6 +139,10 @@ if ($original_provider eq "i")
 	push @initdb_params, ('--locale-provider', 'icu');
 	push @initdb_params, ('--icu-locale', 'fr-CA');
 }
+elsif ($original_provider eq "n")
+{
+	push @initdb_params, ('--locale-provider', 'none');
+}
 
 $node_params{extra} = \@initdb_params;
 $oldnode->init(%node_params);
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index ab4279ed58..c842a62ae9 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -932,7 +932,7 @@ listAllDbs(const char *pattern, bool verbose)
 					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  "  CASE d.datlocprovider WHEN 'n' THEN 'none' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..79367d933b 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -299,7 +299,7 @@ help(const char *progname)
 	printf(_("      --lc-ctype=LOCALE        LC_CTYPE setting for the database\n"));
 	printf(_("      --icu-locale=LOCALE      ICU locale setting for the database\n"));
 	printf(_("      --icu-rules=RULES        ICU rules setting for the database\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                               locale provider for the database's default collation\n"));
 	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
 	printf(_("  -S, --strategy=STRATEGY      database creation strategy wal_log or file_copy\n"));
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..5aa658b671 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -83,6 +83,35 @@ else
 		'create database with ICU fails since no ICU support');
 }
 
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', 'testnone1' ],
+	'create database with provider "none"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--locale=C',
+	  'testnone2' ],
+	'create database with provider "none" and locale "C"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-collate=C',
+	  'testnone3' ],
+	'create database with provider "none" and LC_COLLATE=C');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-ctype=C',
+	  'testnone4' ],
+	'create database with provider "none" and LC_CTYPE=C');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-locale=en',
+	  'testnone5' ],
+	'create database with provider "none" and ICU_LOCALE="en"');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-rules=""',
+	  'testnone6' ],
+	'create database with provider "none" and ICU_RULES=""');
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index b6a69d1d42..40d62416ea 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -24,8 +24,7 @@
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point',
-  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
-  collcollate => 'C', collctype => 'C' },
+  collname => 'ucs_basic', collprovider => 'n', collencoding => '6' },
 { oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index bfa3568451..29be3f8d94 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -64,6 +64,7 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_collation_oid_index, 3085, CollationOidIndexId, on
 
 #ifdef EXPOSE_TO_CLIENT_CODE
 
+#define COLLPROVIDER_NONE		'n'
 #define COLLPROVIDER_DEFAULT	'd'
 #define COLLPROVIDER_ICU		'i'
 #define COLLPROVIDER_LIBC		'c'
@@ -73,6 +74,8 @@ collprovider_name(char c)
 {
 	switch (c)
 	{
+		case COLLPROVIDER_NONE:
+			return "none";
 		case COLLPROVIDER_ICU:
 			return "icu";
 		case COLLPROVIDER_LIBC:
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0649564485..b7603c9f6c 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -650,6 +650,13 @@ EXPLAIN (COSTS OFF)
 (3 rows)
 
 -- CREATE/DROP COLLATION
+CREATE COLLATION none ( PROVIDER = none );
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
@@ -754,7 +761,7 @@ DETAIL:  FROM cannot be specified together with any other options.
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 19 other objects
+NOTICE:  drop cascades to 20 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -771,6 +778,7 @@ drop cascades to function dup(anyelement)
 drop cascades to table collate_test20
 drop cascades to table collate_test21
 drop cascades to table collate_test22
+drop cascades to collation "none"
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
 drop cascades to view collate_on_int
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index c3d40fc195..e2dceb8dff 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -244,6 +244,12 @@ EXPLAIN (COSTS OFF)
 
 -- CREATE/DROP COLLATION
 
+CREATE COLLATION none ( PROVIDER = none );
+
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
-- 
2.34.1
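
For readers following along, here is a minimal sketch (not taken from the patch) of the byte-wise ordering that the "none" provider is meant to guarantee, given that the patch sets collate_is_c to true for it. The helper name bytewise_cmp is hypothetical, and the real backend code path is more involved; this only illustrates the memcmp semantics mentioned upthread.

```c
#include <string.h>

/*
 * Hypothetical helper, for illustration only: compare two NUL-terminated
 * strings byte by byte, the way "C" collation semantics are described
 * upthread (memcmp and nothing else).  Returns <0, 0, or >0.
 */
static int
bytewise_cmp(const char *a, const char *b)
{
	size_t		len_a = strlen(a);
	size_t		len_b = strlen(b);
	size_t		min_len = (len_a < len_b) ? len_a : len_b;
	int			cmp = memcmp(a, b, min_len);

	if (cmp != 0)
		return cmp;

	/* The common prefix is equal, so the shorter string sorts first. */
	return (len_a > len_b) - (len_a < len_b);
}
```

Under this comparison, 'RM(25)/nodes|+|21|1' sorts before 'RM(25)/nodes|-|21|' because '+' (0x2B) is a smaller byte than '-' (0x2D), which matches the "C" result quoted earlier in the thread.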

Attachment: v6-0005-ICU-fix-up-old-libc-style-locale-strings.patch (text/x-patch; charset=UTF-8)

From 274a887f8970647b2c932ee55c4783095719985d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v6 5/5] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'fr_FR@euro'. Older ICU versions did
this automatically, but ICU version 64 removed that support.

Discussion: https://postgr.es/m/654a49f7ff7461bcf47be4181430678d45f93858.camel%40j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 57 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 61 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 134 insertions(+), 2 deletions(-)
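
Reader's note, not part of the commit: the fix-up described in the commit message can be sketched outside the backend roughly as follows. This is a standalone approximation under stated assumptions: fix_variants_sketch and variant_map are hypothetical names, plain malloc stands in for palloc/pg_malloc, and POSIX strcasecmp stands in for pg_strcasecmp. The mapping table mirrors the three entries in the diff below.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>			/* strcasecmp (POSIX), assumed stand-in for pg_strcasecmp */

/* Old libc-style variants and their keyword=value replacements. */
static const char *variant_map[][2] = {
	{"@EURO", "@currency=EUR"},
	{"@PINYIN", "@collation=pinyin"},
	{"@STROKE", "@collation=stroke"},
};

/*
 * Hypothetical standalone version of the fix-up: returns a malloc'd string
 * with a trailing '@VARIANT' rewritten if it appears in the map, otherwise
 * a malloc'd copy of the input.  Returns NULL on allocation failure; the
 * caller frees the result.
 */
static char *
fix_variants_sketch(const char *loc_str)
{
	const char *old_variant = strrchr(loc_str, '@');
	size_t		len;
	char	   *copy;

	if (old_variant)
	{
		size_t		prefix_len = (size_t) (old_variant - loc_str);
		size_t		nentries = sizeof(variant_map) / sizeof(variant_map[0]);

		for (size_t i = 0; i < nentries; i++)
		{
			if (strcasecmp(old_variant, variant_map[i][0]) == 0)
			{
				const char *repl = variant_map[i][1];
				size_t		repl_len = strlen(repl);
				char	   *result = malloc(prefix_len + repl_len + 1);

				if (!result)
					return NULL;
				memcpy(result, loc_str, prefix_len);
				/* copy the replacement together with its terminating NUL */
				memcpy(result + prefix_len, repl, repl_len + 1);
				return result;
			}
		}
	}

	/* No '@' or no matching variant: hand back an unchanged copy. */
	len = strlen(loc_str) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, loc_str, len);
	return copy;
}
```

With this sketch, "fr_FR@euro" becomes "fr_FR@currency=EUR", while a string with an unmapped variant such as "sr_RS@latin" is returned unchanged.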

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 95eb5cf464..2ee81e9804 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2787,6 +2787,58 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 
 	pfree(lower_str);
 }
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < lengthof(icu_variant_map); i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2803,6 +2855,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2833,7 +2886,7 @@ icu_language_tag(const char *loc_str, int elevel)
 	while (true)
 	{
 		status = U_ZERO_ERROR;
-		uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/* try again if the buffer is not large enough */
 		if ((status == U_BUFFER_OVERFLOW_ERROR ||
@@ -2848,6 +2901,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f0827154cd..1304a235ce 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2240,6 +2240,61 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < lengthof(icu_variant_map); i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2249,6 +2304,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2277,7 +2333,8 @@ icu_language_tag(const char *loc_str)
 	while (true)
 	{
 		status = U_ZERO_ERROR;
-		uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+
+		uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/* try again if the buffer is not large enough */
 		if (status == U_BUFFER_OVERFLOW_ERROR ||
@@ -2291,6 +2348,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index ea96e27f45..692e8cdf18 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1071,12 +1071,23 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index ee607ca3a5..0b90e2a5b9 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -395,9 +395,16 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
-- 
2.34.1
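
For readers who want to try the variant rewriting outside the PostgreSQL tree, here is a minimal standalone sketch of the same idea. It is not the patch's code: the names fix_variant_sketch and ascii_casecmp are hypothetical, plain malloc() stands in for palloc()/pg_malloc(), and a small ASCII-only comparison stands in for pg_strcasecmp(). Only the three map entries are taken from the patch.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Legacy '@VARIANT' suffixes and their ICU keyword forms (entries from the patch). */
static const char *const variant_map[][2] = {
	{ "@EURO",   "@currency=EUR" },
	{ "@PINYIN", "@collation=pinyin" },
	{ "@STROKE", "@collation=stroke" },
};

/* ASCII case-insensitive comparison; hypothetical stand-in for pg_strcasecmp(). */
static int
ascii_casecmp(const char *a, const char *b)
{
	while (*a && *b)
	{
		int		ca = tolower((unsigned char) *a);
		int		cb = tolower((unsigned char) *b);

		if (ca != cb)
			return ca - cb;
		a++;
		b++;
	}
	return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/*
 * Return a malloc'd copy of loc_str in which a trailing '@VARIANT' found in
 * the map is replaced by its '@keyword=value' form. Unknown or missing
 * variants yield an unchanged copy. Returns NULL if allocation fails; the
 * caller frees the result.
 */
char *
fix_variant_sketch(const char *loc_str)
{
	const char *at = strrchr(loc_str, '@');
	size_t		len;
	char	   *copy;

	if (at != NULL)
	{
		size_t		prefix_len = (size_t) (at - loc_str);	/* bytes before '@' */

		for (size_t i = 0; i < sizeof(variant_map) / sizeof(variant_map[0]); i++)
		{
			if (ascii_casecmp(at, variant_map[i][0]) == 0)
			{
				const char *repl = variant_map[i][1];
				size_t		repl_len = strlen(repl);
				char	   *result = malloc(prefix_len + repl_len + 1);

				if (result == NULL)
					return NULL;
				memcpy(result, loc_str, prefix_len);
				memcpy(result + prefix_len, repl, repl_len + 1);	/* copies the NUL too */
				return result;
			}
		}
	}

	/* No '@' or no map entry: hand back an unchanged copy. */
	len = strlen(loc_str) + 1;
	copy = malloc(len);
	if (copy != NULL)
		memcpy(copy, loc_str, len);
	return copy;
}
```

Under these assumptions, fix_variant_sketch("de_DE@EURO") yields "de_DE@currency=EUR", while an unmapped variant such as "@ASDF" is passed through unchanged, so that the later language-tag conversion is what rejects it (as in the regression test output above).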

#65

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#64)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

2) Automatically change the provider to libc when locale=C.

Almost works, but it's not clear how we handle the case "provider=icu
lc_collate='fr_FR.utf8' locale=C".

If we change it to "provider=libc lc_collate=C", we've overridden the
specified lc_collate. If we ignore the locale=C, that would be
surprising to users. If we throw an error, that would be a backwards
compatibility issue.

This thread started with a report illustrating that when users mention
the locale "C", they implicitly mean "C" from the libc provider, as
when libc was the default. The problem is that as soon as ICU is the
default, any reference to a libc collation should mention explicitly
that the provider is libc.

It seems that we're set on the idea of creating an exception for "C"
(and I assume also "POSIX") to avoid too much confusion, and because
"C" is quite special anyway, and has no equivalent in ICU (the switch
in v16 to ICU as the default provider is based on the premise that the
locales with the same name will behave pretty much the same with ICU
as they did with libc, but it's absolutely not the case with "C").

ISTM that if we want to go that route, we need to make the minimum
changes at the user interface level and not any deeper, so that when
(locale="C" OR locale="POSIX") AND the provider has not been specified,
then the commands (initdb and create database) act as if the user had
specified provider=libc.
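
The rule proposed in the preceding paragraph can be written down as a small decision function. This is only a sketch of the proposal as worded there, not code from initdb or CREATE DATABASE: the Provider enum, the function name and the exact, case-sensitive match on "C" and "POSIX" are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical provider enumeration, for illustration only. */
typedef enum
{
	PROVIDER_UNSPECIFIED,
	PROVIDER_LIBC,
	PROVIDER_ICU
} Provider;

/*
 * Decide which provider a command should use.
 *
 * requested:     what the user wrote, or PROVIDER_UNSPECIFIED
 * locale:        the locale name given by the user, or NULL
 * build_default: the default provider (ICU in an ICU-enabled v16 build)
 */
Provider
resolve_provider_sketch(Provider requested, const char *locale,
						Provider build_default)
{
	/* An explicit provider is always honored, even with locale "C". */
	if (requested != PROVIDER_UNSPECIFIED)
		return requested;

	/* No provider given: "C" and "POSIX" imply libc. */
	if (locale != NULL &&
		(strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0))
		return PROVIDER_LIBC;

	return build_default;
}
```

The point of the sketch is that the special case lives entirely at the level of option handling: nothing below it has to know that "C" was treated differently.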

(3) Support iculocale=C in the ICU provider using the memcmp() path.

In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
lc_ctype_is_c() would both return true.

ICU does not provide a locale that behaves like that, and it doesn't
feel right to pretend it does. It feels like attacking the problem
at the wrong level.
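
For context, the "memcmp() path" under discussion amounts to byte-wise comparison with the shorter string sorting first on a tie. The following is a minimal sketch of that ordering, not PostgreSQL's actual comparison code; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two byte strings the way a "C" collation is expected to:
 * memcmp() over the common prefix, then the shorter string first.
 * Returns <0, 0 or >0.
 */
int
c_locale_cmp_sketch(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		n = (alen < blen) ? alen : blen;
	int			r = memcmp(a, b, n);

	if (r != 0)
		return r;

	/* Common prefix is identical: order by length. */
	return (alen > blen) - (alen < blen);
}
```

Applied to the two strings from the start of this thread, 'RM(25)/nodes|+|21|1' and 'RM(25)/nodes|-|21|', the first difference is '+' (0x2B) against '-' (0x2D), so the first string sorts lower. That matches the "C" result reported earlier and differs from the "en-US-x-icu" result, which is why no ICU locale can simply be substituted for it.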

(4) Create a new "none" provider (which has no locale and always memcmp
semantics), and automatically change the provider to "none" if
provider=icu and iculocale=C.

It still uses libc/C for character classification and case changing,
so "no locale" is technically not true. Personally I don't see
the benefit of adding a "none" provider. C is a libc locale
and libc is not disappearing. I also think that when users explicitly
indicate provider=icu, they should get icu.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#66

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Jeff Davis (#60)

2 attachment(s)

Re: Order changes in PG16 since ICU introduction

Jeff Davis <pgsql@j-davis.com> writes:

Committed, thank you.

This commit has given the PDF docs build some indigestion:

Making portrait pages on A4 paper (210mmx297mm)
/home/postgres/bin/fop -fo postgres-A4.fo -pdf postgres-A4.pdf
[WARN] FOUserAgent - Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
[WARN] FOUserAgent - Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
[WARN] FOUserAgent - Hyphenation pattern not found. URI: en.
[WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area in the inline-progression direction by 3531 millipoints. (See position 55117:2388)
[WARN] FOUserAgent - The contents of fo:block line 1 exceed the available area in the inline-progression direction by 1871 millipoints. (See position 55117:12998)
[WARN] FOUserAgent - Glyph "?" (0x323, dotbelowcmb) not available in font "Courier".
[WARN] FOUserAgent - Glyph "?" (0x302, circumflexcmb) not available in font "Courier".
[WARN] FOUserAgent - The contents of fo:block line 12 exceed the available area in the inline-progression direction by 20182 millipoints. (See position 55172:188)
[WARN] FOUserAgent - The contents of fo:block line 10 exceed the available area in the inline-progression direction by 17682 millipoints. (See position 55172:188)
[WARN] FOUserAgent - Glyph "?" (0x142, lslash) not available in font "Times-Roman".
[WARN] PropertyMaker - span="inherit" on fo:block, but no explicit value found on the parent FO.

(The first three warnings and the last one are things we've been living
with, but the ones in between are new.)

The first two "exceed the available area" complaints are in the "ICU
Collation Levels" table. We can silence them by providing some column
width hints to make the "Description" column a tad wider than the rest,
as in the proposed patch attached. The other two, as well as the first
two glyph-not-available complaints, are caused by this bit:

Full normalization is important in some cases, such as when
multiple accents are applied to a single character. For instance,
<literal>'ệ'</literal> can be composed of code points
<literal>U&'\0065\0323\0302'</literal> or
<literal>U&'\0065\0302\0323'</literal>. With full normalization
on, these code point sequences are treated as equal; otherwise they
are unequal.

which renders just abysmally (see attached screen shot), and even in HTML
where it's rendering about as intended, it really is an unintelligible
example. Few people are going to understand that the circumflex and the
dot-below are separately applied accents. How about we drop the explicit
example and write something like

Full normalization allows code point sequences such as
characters with multiple accent marks applied in different
orders to be seen as equal.

(The last missing-glyph complaint is evidently from the release notes;
I'll bug Bruce about that separately.)
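
To make the "different orders" point concrete, here is a small sketch of the canonical reordering step that normalization applies to combining marks, using the two sequences from the documentation excerpt quoted above. It is an illustration only: it handles reordering and nothing else (no decomposition or composition), the function names are hypothetical, and the combining-class table covers just the two marks involved (U+0323 has class 220 and U+0302 has class 230 in the Unicode Character Database).

```c
#include <stddef.h>
#include <stdint.h>

/* Canonical combining class for the code points used here; 0 otherwise. */
static int
ccc_sketch(uint32_t cp)
{
	switch (cp)
	{
		case 0x0323:
			return 220;			/* COMBINING DOT BELOW */
		case 0x0302:
			return 230;			/* COMBINING CIRCUMFLEX ACCENT */
		default:
			return 0;			/* treated as a starter */
	}
}

/*
 * Reorder combining marks in place so that, within each run of marks,
 * combining classes are non-decreasing. This is a stable insertion sort
 * that never moves a mark across a starter (class 0).
 */
void
canonical_reorder_sketch(uint32_t *s, size_t n)
{
	for (size_t i = 1; i < n; i++)
	{
		uint32_t	cp = s[i];
		int			cls = ccc_sketch(cp);
		size_t		j = i;

		if (cls == 0)
			continue;			/* starters stay where they are */

		while (j > 0 && ccc_sketch(s[j - 1]) > cls)
		{
			s[j] = s[j - 1];
			j--;
		}
		s[j] = cp;
	}
}
```

After this step, both U+0065 U+0323 U+0302 and U+0065 U+0302 U+0323 become U+0065 U+0323 U+0302, which is why a collation with full normalization can treat them as equal.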

regards, tom lane

Attachments:

add-column-width-hints.patch (text/x-diff; charset=us-ascii)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 9db14649aa..96a23bf530 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1140,6 +1140,14 @@ SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
      <table id="icu-collation-levels">
       <title>ICU Collation Levels</title>
       <tgroup cols="8">
+       <colspec colname="col1" colwidth="1*"/>
+       <colspec colname="col2" colwidth="1.25*"/>
+       <colspec colname="col3" colwidth="1*"/>
+       <colspec colname="col4" colwidth="1*"/>
+       <colspec colname="col5" colwidth="1*"/>
+       <colspec colname="col6" colwidth="1*"/>
+       <colspec colname="col7" colwidth="1*"/>
+       <colspec colname="col8" colwidth="1*"/>
        <thead>
         <row>
          <entry>Level</entry>

ICU-documentation-excerpt.png (image/png)

�PNG


IHDR��"3-�@iCCPICC ProfileH��WXS��[��@h�K	�	"5��Z�ATB �A��.*�v��
]Q�bA�����bAEYv�M
���|o�o����3�9sfn�NpD�<T�|a�8.$�>6%�Nz
@�{�����11������u�H�+R�������p@b ��p�!>^�� Jy�)�")�h�a�/��,9���9�+�I�cA���
�#�@���E�,������'�F��7?�t�m��b�>#����ifkr8Y�X>YQ
��8���t����'�a�J�84N:g�������X�>aFT4���d���lIh��5��`��:���	���`a^T�����!�;�*(d'@��B~AP��f�xR����)f1�Y�X�W���$7�����g+�1����d�)[	�� V��� 7>\a3�8�5d#��I���8�/	��cE���8�}Y~��|���v��/�N��k�rd���`��Bf���`l��\x�� ���g|ab�B���0 N>���b��?/D��A�ZP��'�
)��3E�1	�8��NX�<|�,�@k�r�������{��A�34"Y�#��xP���
���z���_�Y��d�z�d#r���A8����(���$�2�x�������*������aB&B�H�<���,�A�@b(1�h����7���:��sh��	O]���k�n�����OQF�n���E���������@u�����w�~����Y�"niV�?i�m?�����DF��d���#U�T��U���1?�X3������?����`��%�;��c'�s�Q����	���I���z,�]C��d��B�?�
��4�N�N�N_�}����w4`MM���L�E���B��H����������&V��@t:�s����epp��w.��}��?���a�O�2gs%�"9�K/��P�O�>0�����o��@�	 L��g�}.S�0��r����&��{�~������.�k��==����gAH�!��	b��#��E��$IA��,D�H��<�Y��G� 5�>�0r9�t!��H/���b�
���V�(��2�p4�f���bt>�]�V����$z��v�/�`��f�9`��Ec�X&&�faeXV��a�p��`�X�'�4��;��'�\|2>_���w�
x~����T�!���E`��S��
�v�!�i�,���D�5�>�)��t�b�b=�����8@"��I�$R4�C*$����v�ZH�I=�J�J&J�J�J�JB��
�]J��.+=U�LV'[����dyy)y��|��C�L��XS|(	��\�ZJ�4�.��������r��@y��Z���g�(T�T�Sa���HT���P9�rK�
�J���SS���%��)�}�U���*[��:[�R�A���K5���Sm�Z�Z����j}�du+u�:G}�z��a��4�����5vi��x�I�����i����yJ�
���X4.mm�4�G��e�����*�������������=U�R��v��c�����Y��_���']#]�._w�n��e��z#����zez�z��>�����s���7��3�
�b
�l48m�7Bk������G�6D
���n5�0026
1�3:e�g�c�o�c����q�	���D`�����9]����������������-��������J�����S������[��-L,"-fX�Z��$[2,�-�X�[����J�Z`�h��Z��m]l]k}��j�g3�����-��a�k����j�f�mWiw��w��o��I�9R8�z�
�C�C��G���F���,F��Z>�}�7'7�<�mNwFk�]2�y�kg;g�s��U�K��l�&�W���|���7�hn�n�Z���{�����{=,<�=�<n0�1���������G=?z�{z������;�{���1�c�c��y�c�������K�M�����g�����{�o�������i��a�f�p

x��b�d��C�;�4����6�
�
�q�r"��<����e����<�f��������a!�h�D#�"WF����F5F�hv���{1�1�c��ccb+c������O���+�]B@���;�6����$����������+����;s���AJS*)5)u{����q��������]o=~��s&�M86Qm"g��tBzr���/�hN5g ��Q���eq�p_��y�x�|�
��L�����|�Vf�f�eWd�	X���W9�9�r��F����K���W�O�?,��
�&O�:�Kd/*uO���zr�8\��)_�T��;$6�_$�|�*�>LI�r`��T���iv�M{Z\��t|:wz��sg<����e2+cV�l���g��	��s.en���K�JV����<�y���9���Km�j�������;�,Z��[��|�SyE���������_�d.�\��t�2�2�������\���x����+V�W��z�z��s���P�H�t��X���b��u_�g��VPY_eX��������7�m2�T���f���[B�4T[UWl%n-��d[������l7�^�������q;�j<jjv�ZZ��Jj{w����'pOS�C��z����`�d��}������z�q������C�Ce
H�������������a�[���q<������c�����|���e���D�����Z'��95������������	>s����r����s^��g�o��~�������n��t�l��q�������1]�/�]>y%�������E]���x�����7y7������v���w��%�-��~������?l���v�>� �A����wq�x\��K��'�'OM��<s~v�7����q�{^�^|�+�S����6/���WG����W�W�����������������}~_�A���������?=�<������_����;�?8(��9�_V43��;��@��3�8��OV��U����3���P��c����
�n��/���@�O��������\)-Dx��5#?��"?s���-������Md|�8���eXIfMM*>F(�iN����x����ASCIIScreenshot]~Z	pHYs%%IR$��iTXtXML:com.adobe.xmp<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 6.0.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:exif="http://ns.adobe.com/exif/1.0/">
         <exif:PixelYDimension>494</exif:PixelYDimension>
         <exif:PixelXDimension>470</exif:PixelXDimension>
         <exif:UserComment>Screenshot</exif:UserComment>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
ax�iDOT�(��o>� �@IDATx���TE��{u
k�������YsF��0`Q	�U��(�D�����"�y�OE1fX0�b����������o=�xf���3���;����;���������S��]��B���%�����s���w��sN���`H�����?����7�I�b�&�V� V��
K�����,�3`���I���~	&��0`-,#�a�j���0`���I��X��r��o`�X��`(,��2���X(V�)�
K�����,���J���~
&��0`-,#�a�j���0`���I��X��r��o`�X��`(,��2���X(V�)�
K�����,���J���~
&��0`-,#�a�j���0`���I��X��r��o`�X��`(,��2���X(V�)�
K�����,���J���~
&��0`-,#�a�j���0`���I��X��r��o`�X��`(,��2���X(V�)�
K�����,���J ���7��s�=�d��N;��w��`����=�����p��?�����(�����|P�������*��2����g���{�9w�}����[�Y��u6K�JT�k�Y�ld32	�������^zi�$H?~�_z������������������O�����{�������f�M�����L��U�V������7�p��}��5�}��I�k.=_��Xc
���k���%�l����"�~���p2�,W������@.`
�O���#�I�'�|�?���2�
�|���J�Y�f������9s����_H��z��7��s�=�A�r����*L0`��Ak�&����+��|���[7���:*g�_z�%��7�|��x�G4��_�.��%���sg�6$������kn�Xj�~�m�����%�k�5������t�n���!Y�Y�~���?��O��,�����?�����O?u���?4m�Eu{���^��k����[oub
v�����S~�
7tb
v��M���?�f����r��{��G���m���3�}������_v����[���m����2�,X��=��S��t�<�������h�n�Eq?����E����K7}�t���k��6�,�!���g��u��Q:�+?��������6"1�Gy����&��^{m����^��#�6�x��oR��<��S�������o����m�l�ls��{��7�����b�-�e�A������ZK�(�������VZ)��j�m��j��y.�HF&��k�A,�4�u�]������w���v�j�2�j�7�x��r�)���2�v��X�U@����������0`��������K2��������g�}�I'��WYe/����5��/��=�P��Kxy��|����o�'�p�_~����+������)S����3gzd@{H[a��P�q���A��+����K��v�m�����(O��*/)�-��<y�����[����3&�s���4���eC���S^|�v���8���E]������L����1�ai�����k��C���I#/NZ������Z�#�<�7Sp����kIc
���v�i�a]�c�X!���>+i���m��
Xqy����I�v|��'^<g5m�W�Z�3f(�D�Vg��_���;���G���/���x��~���~����o����:uj�]�t��{�f}p��E�����|��}���g����V�t���
����k�����l`��[o����j+��0�b�-"^_|���408AAi���2���_�+�N20��A4�(�X1�GXw`��e�]�O�<Y�9�6j���l5_�V��5_�
X�O�]S`eRG��3t�P�0i`s�����Nb�V���Q���4<������I������%�h��}�]<�^3���	$�TM���[B�������@c���rq����������X�w�C8�m<N���������:��)S�h�0�	�2���Ch��><�z��7j��@��0����k�e\5��\�K�n�i�+��f'L��������~�K.���cB�&]�a�d
V�7�d�x���|���\�i�N%���/�����&��W_]��M�I�L�X�����{�U>�����)%���c��
�e�y��k=���)�~��������+������ab�2`��Q,2g���g5T���@���t��4�4�*�H:�t���o��G�����U-�1�:d��QX����N����/���6lX��b���-:��#frQ��b����O�����#^�^w��I�!��U�`L���~*��k\|����.��92N����T.`�/��6m�h�U
`=��c���&���Bu`�q)��5V���f�l:��C�Yx�0`������X�u����XqPar��f��X[n�e��������Q6�+�I�6�l��Dc�X���Q�������B�V�`�@>��`�s���;�X��i��T���Y���&`��#F��i���t�M���v<��������� �\�*�Vj]f
�����Xk|��v/���L����Gl�D9�����^�&P����/��2�;N�0!b&�u�Y'J�u�i~�E�`���e���t88q$cp�!��l��������^�r� '^�Z-^���6�`%��,�<wq����_�U�
���%�;x�N�4I��W�;���&`�g�Y/��*M���6�FM�^�z��3�d�IZp�����^01��q�e����Z�r9J�<��mk��4Z-�V���}E��X�|�I]d�9�!����ipj�4(#�+�F�7)&L��8�e,�u����-'�4��h
yw�m7��"�.Dm��*)Z4�?�����r���C#��rh�����Z�������.�S�����J��ES��A�#���-�z�q�y9PA���������( �~�\��C(yq��j����8_x��dJ}���������<���4�t�Mmp����8��G��}�%	m��`�@�`��3/��P���]�d�Z�#�<���������|'/+�������(��Fl��)i~�r����!��ZR���qZ�h�N�E���W�G��*���i��}�N�	���G����K�F��Zi��b^%����*�n�k@�T_J`��>Y;L���k�%Z��Xks\���k�"�u(�:�Ft���B��"��8���RK�����M~��9�E����h�Y�:��kz#�l����b�ZU�S�%`�Zj��&?����{e�Z���@J����]6`m��j��k-�����0`-�Dk��km�k��2`-ZdV�%`�Z����.�6Bh�X���G��Tj	��Z������6���^�V9��IDw�}�99��h�����s����H<w�	'�d�n�J5&������������kjQ�v�|�*�;9KV�U�o� ����:�4���w'�7��tz\d)��dY�X��W��6`�����w>`E��H1Ni�*U��D�Q�w�wtLP�F]�������/.[�%^��j���,�|��jL��a�J����L,��X��.S�k]
$G��t��8�^b��x��$���$D��2e�~7��q(�XK-���g�Z��Zt���ZrO?��[{����y�xS��{�uz�m��F)EB�E�s]Hx:'����{�=�x�N�{:	-��5}�t5uJ��(��Y�TkxZ�n����������5�LA�:��sO��ZkE������-��I��(Ox���mS���<��k��]y��n���ZFB�i����#q`��ssj�I5}����&���>��������VZI�Ix;��C��g��Q�F9	m��kR�\cB;
�5*�E��L#?	
���6m��9s�8��H��(���uS�NUyl����h9q�x��������T
�V�HTx;�':��R��������0����<�H��q��iSn�S&G?q���y�����5�6�I���?��c/NL����\�8�!Vk��]���3�������_I���X���~�_u�U�m={���M������;j�h�����X�)����H����5����=qN)/��(pz��m����k���|�s�=��A�i\Y�����z	��<	�������L����?�|�{�������ZG�g�	�H����y���g�1��m����_�~�~�����h%�/���I��q�7����#��o~/����x��2��t^���l����&�������U4	
t}�y�E���$�����7��|����Es��TL����K�T���z�yxK�R��o_M;��3����h4�&���'�|��w��i�
F�)����GM��<�HMM���	�����r�N���z���tq���#�@8S^�hM�)C���.�y�����k�h��h����(iL��'�w��L#?�3��@k����5 :i��Yg��E#��g|H�����I�' =�����-�hZK�1`m����
X�c����\�*=���u���6�F(��^L��M�i4B�\��4l�����t@#�����i�[�n�zhH��e��n���(
^��5���o��f��hAq���S��/���d+��
6,�5����?#��s��Q���c�|0�f�c���Q�o�Q��[`1)G��.���|i���Yh<���u�]������j���ib:����s��c-�t�M��	�O���Q��nX[J��U�ku�W�Z�X�u��cV���_���
ZJ.`��GM��gb&����%�m��������4L�q�����_z��j�����Y�:�i����hT����;2�P����p#M�_}����%�\���h��|�A�w�����h^x��(-�"��`�OSO.����xZ�!G�^x��Q5�0�un���Q���+����r��?���*�X+a*�
��?F���b�3n�*��i��N���C��I���
�������c���G�
���S'}F}P�����uE��h��]F��5���4���|�Y��r+r]t�E�+}f�xa�d2`�������V�X�hK�VL������3!��"����9���3��6���9R�q���C9D�/�8.���o�x�z�Z�\C�V��U$~�Rc-4��������:�!d��o6�y��^����������E^��V������+�'x��q��3&�������V�\�>iM��{L�����={v��*�����[����L�v�g��w��c~&m�v����i|�����S�/n
����l!J�����]h<��_�����L�X�����	�7L��@��0
�����fia>`
&�I�&i[��Y3�[+�����a�'�@��1�~�@l�`�
&Z�e����%
k��1h4��R6��'���:��PT����
^��a�hy�����W�^�5g��`����W+��G&4i����X^}���w��'�0O�r�S�?�"{Lx���l^i�3���RD�K �{I�/�@{�����=]��zO^��Yg�=��-	�J �J��o�k��Q��0���������V'<�Q-&���_^?lUA�^}����bN\g�u���}����8�H����.^N�hqK,����j�J����H+&/�B,�XY�d���6�n���O?)�����/�F�.�:� �3�P<o���� ���y�M7){�gk	��C0�"��+[F(�^�#Fx^P��K��WbKI���$�����z�y��B�����_���_�`�^��R�l�z��W������>������gx	�Ys��S��
X�%���k'/���s��#f<w�9���8N��=
I&����
2(aa'�g�hxN�����V'u�Zdk���$�I=�W0g��DN@�N"�.��2N@Q�����#�[��$�s1��,��������/M�E�u�z*��gNS�����K�&���������t�/XKWk�9��U4��34&�<0`�#{I��5E}_T+���N�pD�!���I��0`-�tk��k��e�zRm���R�9�e-�m���N�:t����P�&	�
�b0`�	�.%`��(��zPm�*NEN�s3I�a���������nL���k#�V�EX�p�su���5W,�$Pn	��[������6����0`m��AH����]4`-�k��k-�����0`-��k��km�c�{a��d�:��kr	�h�Z!��ZE�C�%`�Zn	����&�����"�`��Tr��{����F�����nj�/M��kS%X�
X�c��������������~���i��9	��$|]Q�+)s-���r5`m����k}�s�^�Q�����6�l��V:^K})z cXc���D	�&�����~��z�-
P�+����4e�
X�"��)k�Z?c�������1O����{z|�&�l���'�|�$�����n��vr�+#uH�M'a��Fm�Qb$]��rs��q����c�q���y��sQ�GF���C�>}�{�����pn�����7OKI:M���O�?��M#�������O.�`��;��i�:��v�>I7�d���\�f���$�\��� 	��m��I7�{��6�B}�Pt�{a}VB��6m�Dup�\�}�YG���?��=���NB��o+)2Q��1`m!�W[���id����Z�E?���=��3��_��G7�`/��4��H��������5�������[#v������K�7��*a�2����^��w�����@�����]jQ�<�}���=��4��[n��[o=-!`�;�0�*Z�����>]��p����7�T^����!0�����Q���K.��7�{����@�1"�,�����'�+4c�/a�<���b�"�ZcU$�7qW���
�#/.�X��?r0`���B��J&��Z��S9m�@��3-��\��i�h.����������3����a�$(6�i�����h-T�����x�~��
�$��� *�>�s��lZ���`��Q�Fi~�����g��u�]7*E�q�
����5�\���.�����^��=�)e�%�0v�\sM����*��W_i���i�r�]wi�s^� �L�w^��-��{�1��:O>�d/Z�����4�`��L��<:��6���mI�����^��~��Q������_|�EM{��g��[o��N�����a����:IK�nj2�3��<y���4i�N�C��&L��D��4
`+�C���1������]u�UQI1��8��M/�i��h�v��s!�V�PO�B2�\����-�W�[�E��qM�}�6v�X�<���}s��OG�yqA�Tv�>����A.�+�X+ud*�]��5-��\��9r�%���2��	�����J5�(fF&S�N=������^�z�3�b����!C��o��V����)��)��"k�^��dI�i���g�d�Y��?~<9�u>��KN����[h���@X�����C�5��E�U�&��q�>�����]v��?�#=h�+����B���a�Z��\���o���Nt����-#G���Z�Rs"��k����`zP�.NK^���q��5�����I�-;�9��w��Q�4��V.i�(�8�E!�R$__r��$����i���k��/0���O�h�k������4}y���G�n�B����9�b�5$���t%������w������=04q�D�P�@q`E�O_}4e�ub�`M;���<��l���<
��.�t��+�(`��(�������n�"i�5�\��QR;��r���������j�+���?��s|���u
{�UV��-���`�jy��O����q`�/���w���7`������XkmD��\�
+���!��1Y�\�:%����k��f
�,������sL�qs��_��*�/����E�1������������y�2�
<��3�3����Z��g-�v�-H�����@J#<���m
f��t�*�
��r�����t��h^& @6�e�H��K�4Hy�?x�c����U��h�f
R��j��k��\��������v��7���n��T�>�)�52�����>�EK9�����_5C9;7j5`����?����+���j�lq�=����<�3�s��(�b�����C�d���	|s�`��G��e�'*x_v�e���F.lo�L���B����Gy$$5�.$S�[�/
��f���m@�pJ��h����/x��7>{���g�\���X�de��&�#g�
����.G(jZ%��5�J��k�k��I��(	Xi�w�������x��>�(j'&���������/�IQ�D2i�9��#��;u���h`�d��L���t��� �Z���>x�=}C|F�)��s�1��]v�E�2�0�,
���EoW��G���`�|r������@M���!m�u�I�B2}���S�%���k4�'�x"#��M��;<(�/��
����O~��B��������n���!��A�m��o�J��n���$��F�w��������!������5�#?s'����gb���<��������drr��1������6�pC'���QZ���~F�������'/�S�����o"��������v����a>�d0-�&�La��/I�"+N���x!g�a�Y1�b�pb�w�_��z{1E+:���T��S1�3`���h��:��e[��v9����N4���,�I �XS
������ t��
Y��8q�r���f�&��I���d��iF�5=��;W������$�8	�6Nn�V����F<���	��d�@L�1a�e�XES_X�k������k��Vo�X�m��k�� K6	�$`��]&J��5Q4��������z�8	�6Nn�V����F<���	��d�@L�1a�e�XES_X�k������k��Vo�X�m������i?r��{��8b��u�]	���d�#��+ITw�	'��(����z��B��*�Q�Y��$�W�y�c�d"32	h\�K/��E$����?,�I=��P#B4�����P���D�9�����`��\#�@1�����V���C��w�3z�����e��5kVN`���s�=7�V��~��z�}���7|��U���K�)�W��X�z�Sw�L�Au����4#�y������?m��h4�t�����K'�����^t���q����u�]���*��C�U)�"m��"V��
X�t����XEC��I@��>}�F}��j�85r
k��[��t	I�Qm�!��6�/����;���d���SO=��=�I(:=X����N���/:	����!2��{���8�^#��4�N��X����s�i�%��k��M(������O?�$����rNB����#���C�g{��g������9s�h���w�qG'!���I�r��:H��	m�~������5�|rE��/��`�n�Wt�/�_����H�����e�i	�2Pyq�8� ���-�y*�g����XYd5�JH9���o���W���Kj�8
i�M7��n>|x$Sbr�O6Zc%i�v�4����p��������=k��-��P���{%�*m n�����b}
�Guq���m
�k����9��q�WD�r]�s�=�>4H��������MZ���gyA�x����i����L���u�a��{�yy1��G����0`@�.��m~�\y(a�����1�4i��r�-�W�~�k����u������'����XY�$�u.�C�JO/16u�f��$������y�����`�k�^>`%�h��/����|����Ub���'j|��mOc�AH�'�X�Y��x*{������jZ��_~�E�K|��������8hv�a
P����#�;2�<>��#������_T��
��-�=�X$��O>���'�����4�<�r����Q�Fy���w�q&����0`�Uv�,�d����\��d���H���v�-���=��(ML��/�hO�K�hgqz����|V��������F��z4d��x����\��������	&D@����f_HPu��s�(y���Z���|�I���#Nh���w�&���I[xim���4��|�����M:c`���\��h�z��q�Z���k��Rv��5��j=[>`����n\a'�(��w��|���o�V
`���c������k=q`e+
�!��y���kM��(Xe]U�`fE�����"Y���P�����+�^��6�Di����K�"���
2��������]v�����A��-W���^�ye}���wFEupc�Z�\�.��@����X`e2g��t`el�Xb	���)X_x��K�n�B�����v�m��������a��)����+��i8�y�<� ���"��(4X�+�K~�5�T>����wRuu�n�Z��^t�
X�Ym�d`�q)N�h�_}���Z[�l`e�`�zm6�y��jf�NGK�b�-��P6��s�9��O�>����M;M9eJ�`j�&x��k�\_y��'2c�<���v�m�U���k�}Q7`-J\���X`�C0*��b%N4�����l�`�d;���T�8��S������E6�� hix��	G)4�\�G0�w�a��1k���
��=�h�R�E��>�(U��L�i ����e�N"���!mC0'�ss�d}���j�3�8#TW���u�H%�Tb��LI��	=a�k��N9��t1	b+ �����p�m���s
���zkO�����&�NCQ�>Y3 ���T
k�8
k��>��S�����za\
@�P����^��]�7����6���e�o���L���%�uU��p���E��=���I�
N~Z|����1c�_�<?���_{�����h���������;�mL���������k�����$�"�}��G=����1��/���������u�"�U�Ys�o���������/���?���u���V�Z���?����^o�|� ,�m+�L�^��u�]�����@��k���A��x���qM������#=�K���7+�J�_}��m[m���vp'u�����_���0����;�8�����#�/j8����G����r��q��e�M��d�jh�����]{e�lx��n������FX���J�J?x1�/��e�]�#��0`��B	��K2�9����R��S�D�s8N�#���R�~����x�j�����)I��D[�)9bIEr��f�d�����JN�������iT��;�B����#�[3��o�[�d���xn��A���������-W9 B�, ��R�8��(Sv�R�<�.�Xs���R�
��hX9���$P	��C���������Q=�v`�l`9B�L9��Q2�B&�B0`-$!{�X�w��f`��`w��:qp�����5M=|���$PJ	��R������v����U3���u�8��j��4�6	4U�M�`}�7`��q.��j����&�I���D��q6�5>�i�g��VR���%`�Z�������eU�9
Xkzx�s%��k�Y�lXk|��v��5��,_=K����G?}�
X����s����Z�J$�	��������^����$��{��8���u�]��$������(�$���\H�'���Nu�������
$`�Z�TM4`��A��&�;�J������M�:��o��M�6-	 ���}W9�?J/�b���N�EV>��~�1bD�,,K�����J�7`���*w3�X�����g
g���s�����$�x��Ab�:	[��������`/�;9x?o>{�|0`m>YWsM��<z%l��s����������bvw�}�;��C���/�<��Dr;����2e�~'f��*�fw�Vf�Z�CW��V��H�T���?�D/!�2 !����sx����}E��j���3�q�/��;��$�YF�\7�{�������	%�[z����<g�R��9	�$~�#2���sk��V�����5W�!�P�����[�_�����\m��4jL�n��{w�^zi�+~����u��Ae8j�(�)Rh�!�
p4�-!�T�
��|r]�\e,�W	��*��#�4��,O�K�x�2�7�(�>��k/��f���o��t�A~�E���z����X��1%V(�Z�����<����v[/@�1@��J>9(�A]��{��G��4Hc�������_���f�5JOu����K���^/�aQ������+�X�,��E������#�h:��KP��o���^���k��QX#^��{��������y�����|�M�F��eK\��N:I�W����f�q_h32�M�,k�h�AL�<&�z�L�1c������O�;�������'������{o/?
�����D�5`�&�/����	X<x��
U�l.�D���F��3gz��|���5M������Z����l`�Q������_Gy�
�K�Tm3v�uWm���~�����9�6c�
�~�)�Di��j��Zp��}Qh���}n	�����fJ��5Su{��b��[o��oY�
�"A���^#YI�m	d�%Xv�>i�$M:t��0a�~�V�����>��}q�1����;G�c���rb���=�XM��(
�L���~���kz�|���}���/k�~��WA��8]w�u�+-��<*����hg��&�5!���cRYK�U������%`��,��z��A���K86��0��!������&��$�������~M���2�S������s��
��;�8M�l�>}�hz0m�����&������_�;l��x�����wM0`@Fz�M�X_x�}��`�%�������3��=Z�1	�|�X��=����~i$`��FJu�'��9��j�JM��q���5�@I�*�St��?~����5����N�\u�l=Q^q�JL�P���������k�~a�
h�q
�:p��xr��l`�0��w�y'#�l�g��1_'Q�qL*g��J���WY�U�X�eSWO��U5�I�I9PZ`
��7�X#�>}zv����	���ii���g��m����_�f�K�i�u�Ygiv�}��>�8�����x��!I�Yg%'�$J3�Ie-�W	��*�J��k�l��I���	:iO�<Y��������SP����� 
��-������2�1�;��C���GZ\��hlf����6_Z`M�/�}i/�[����j��|�'0��|��(����i��i��q�~�����3���cF��)��b��,	�f	�^o��5�=�2s�
7�6����D��e����%�n���?z����V^ye��gz<Ze/�����E��"��`����54����l�S��*�����'�t��:����w0������i��������0���_o��4m�u��b�u�N�:y^4.��2/�V�2�,�^���!^$h��x6�4��]��J����L,��X��.S��oR@�B����;��^��rH���o#/������d�z �Vy��u@<b���G��b�-�k�x��2����{�
�	p�����g�M��4M>��������{9��H��t9�!����_l�a{
�^��3��	'���]vYO����Dr �:�Qo`�����y������E�y������8����kaY���%�����w�����/���������"&�����l�����L� T���_~�DCs���Y�����r�-��F�����q�Y���2
��I�/d%{j����7z�T�C�j�d����I���[=u�H9i�����Z�c'/�����o���eUs�����8`���f������J���~
i$`��FJu����(1D�g"'���QJ�/��=0	,�����H��5��� O5+&��n�I#�����~���8F9���C��p�����I��XI��#V���	X���r�-#'�N����X_M�/�����H��5�p�Q$�H�}QM�Z�#e�oI	������n����������5�5"��2w��������2R�����kKJ�z�6`���*kK
X�*^c^#0`���,s7X�,�jao�Z-#e�lI	������n������)�*���s�=����>��r����C�I���p�����J^G�������`'g�:9�������v�P����S��d��vrr4c���������bg�Z1C��
i
�r���������6���#�����M�:��o��M�6��u���p����1��>�����v�P�f8N���Ow���9�������K@'g��{�I`w'���^v�U?�����Lz�;�aCs����-�)����Y�� {��E��/�h`m��*,gG�	���P��{��cs��G�����O>�`���|c��W��TP�,�M
FK6%)�MK�)^wR��x�R^u�Q
���;w�\KP�n�TV�,���K�8��@��0��������0P���3���M�>�IP��&��W\1c�����I�U'������I�5�_k����-s������NB�5�k��g�V�D�	|BF������L������v����i���&���{�==������S���Ss&���v��I���M��+�������s{���[z��3� ���M�nw�e�\ 	/�k��a-R��iz�OR[��8w�G��vJ�;��uk�zCyx�E�I�\��w��:q�����t��{w'��X�������8GN��	��-��M4"��%`x��,��������>��|����9#s3��)��]�U����j�@����w�y4�#J��-��R�u�'	k�e=�@����#�hW/��"/!�4���!����h�sqF�D"��C�����_��E�G|U�(�y��*�����v���+���']x���3������`�
��i�"Ac8p����M|������Fl	BNl�A�i��������y�qb)Kpw	��|��JL�����o��f��I�O�[o�5��E��g�]�v��p��C~����C������%6��H��|m�)���X{��������g�}�I'���dJ�_|�c�_b�%���H������
���mk�%\��\���^$��Q�t������(���x���g��*�G�~���i��<y�'���{��i�4f�M�bM����?q���\�*��
j`���B��$b��(K��=���Ub��'�xBy���y���Tb��=+�����k�h�~��6�4�&g����W^�C����O?��}��#�i�h��������_H�k��B��Jq����N;M?����@}Xg���cs�)�D\	��\$���;w�g�y���O<��9R;a���-pa��B��*
X�p����$`���t�����g�������&��T�`�!������7�x�����;J������92Js����93J��bv�b6�&L�>A;F�L��w�������bn������1AXy)�h�pu���@c�����_d1l���M��6��zk/��z����^��Q�4m
��V�u�'n$���R����k���>i��n������@�]�t����e,���kW���I�x����������F�r�b�Z�c��-KVY�TbB�5H/kf
�&�8����:9���,|0�
���)��Y�|�.��l`�uU���-1����0�����i>
�7?>���Z'��&�a�i�%�\e9���4��;����h��G��\���mkV^R���K���([6C�����Y�-�����e/>+/eK.���I��3�a�Gf��!��=��J#�J��l�ke�K��*	Xi&C����^\��y�X��Yd�Et��?����I=\1��V&k���B�\�o������[s�K��'Y�!���[��N��f�V�Z���uYL�?�����5X�����{QZ��E�
\����xU���:�x�y��w28����cE`�

#67

/messages/by-id/874jp9f5jo.fsf@news-spur.riddles.org.uk

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#65)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-05-19 at 21:13 +0200, Daniel Verite wrote:

ISTM that if we want to go that route, we need to make the minimum
changes at the user interface level and not any deeper, so that when
(locale="C" OR locale="POSIX") AND the provider has not been specified,
then the command (initdb and create database) act as if the user had
specified provider=libc.

If we special case locale=C, but do nothing for locale=fr_FR, then I'm
not sure we've solved the problem. Andrew Gierth raised the issue here,
which he called "maximally confusing":

/messages/by-id/874jp9f5jo.fsf@news-spur.riddles.org.uk

That's why I feel that we need to make locale apply to whatever the
provider is, not just when it happens to be C.

(3) Support iculocale=C in the ICU provider using the memcmp() path.

In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
lc_ctype_is_c() would both return true.

ICU does not provide a locale that behaves like that, and it doesn't
feel right to pretend it does. It feels like attacking the problem
at the wrong level.

I agree that #3 feels slightly wrong, but I think it's still a viable
option until we have consensus on something better.
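
Option (3) can be sketched in a few lines of standalone C. This is only an illustration of "memcmp() semantics" under the assumption that "C" and "POSIX" are the two special-cased names; the function names is_c_or_posix() and bytewise_compare() are hypothetical and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for the two locale names the thread proposes to special-case. */
bool
is_c_or_posix(const char *locale)
{
    assert(locale != NULL);
    return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}

/*
 * Byte-wise comparison: memcmp() over the common prefix, then the shorter
 * string sorts first. Returns a negative value, zero, or a positive value.
 * No provider (libc or ICU) is consulted.
 */
int
bytewise_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = alen < blen ? alen : blen;
    int         cmp = memcmp(a, b, minlen);

    if (cmp != 0)
        return cmp;
    return (alen > blen) - (alen < blen);
}
```

With the two strings from the start of this thread, 'RM(25)/nodes|+|21|1' compares lower than 'RM(25)/nodes|-|21|' because the byte for '+' (0x2B) is smaller than the byte for '-' (0x2D), which is the ordering the PostGIS tests expected.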

(4) Create a new "none" provider (which has no locale and always memcmp
semantics), and automatically change the provider to "none" if
provider=icu and iculocale=C.

It still uses libc/C for character classification and case changing,
so "no locale" is technically not true.

The provider affects callers that have a pg_locale_t, such as the SQL-
callable lower() function. For those callers, the "none" provider uses
pg_ascii_tolower(), etc., not libc. That's why I called it "none" --
it's using simple internal postgres implementations instead of a
provider.
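
To make "simple internal implementations" concrete, here is a rough standalone sketch of ASCII-only lower-casing in the spirit of pg_ascii_tolower(). The names ascii_tolower_sketch() and ascii_lower_inplace() are made up for this illustration, and the real postgres code may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only A-Z; every other byte, including bytes >= 0x80, is unchanged. */
unsigned char
ascii_tolower_sketch(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/* Lower-case a NUL-terminated string in place, one byte at a time. */
void
ascii_lower_inplace(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) ascii_tolower_sketch((unsigned char) *s);
}
```

Because no locale is consulted, a UTF-8 string such as "ÉCOLE" keeps its two-byte "É" untouched and only the ASCII letters are folded. That is the sense in which no provider is involved.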

For callers that don't have a pg_locale_t, they may call libc functions
directly and rely on the server environment. But in those cases,
there's no way to set a provider at all, it's just relying on the
server environment. There aren't many of these cases, and hopefully we
can eliminate the reliance on the server environment over time.

If I'm missing something, let me know what cases you have in mind.

Regards,
Jeff Davis

#68

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#60)

Re: Order changes in PG16 since ICU introduction

On 18.05.23 19:55, Jeff Davis wrote:

On Wed, 2023-05-17 at 19:59 -0400, Jonathan S. Katz wrote:

I did a quicker read through this time. LGTM overall. I like what you
did with the explanations around sensitivity (now it makes sense).

Committed, thank you.

There are a few things I don't understand that would be good to
document better:

* Rules. I still don't quite understand the use case: are these for
people inventing new languages? What is a plausible use case that isn't
covered by the existing locales and collation settings? Do rules make
sense for a database default collation? Are they for language experts
only or might an ordinary developer benefit from using them?

The rules are for setting whatever sort order you like. Maybe you want
to sort + before - or whatever. It's like, if you don't like it, build
your own.

* The collation types "phonebk", "emoji", etc.: are these variants of
particular locales, or do they make sense in multiple locales? I don't
know where they fit in or how to document them.

The k* settings are parametric settings, in that they transform the sort
key in some algorithmic way. The co settings are just everything else.
They are not parametric, they are just some other sort order that
someone spelled out explicitly.

* I don't understand what "kc" means if "ks" is not set to "level1".

There is an example here:
https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings#colcaselevel

#69

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#58)

Re: Order changes in PG16 since ICU introduction

On 18.05.23 00:59, Jeff Davis wrote:

On Tue, 2023-05-16 at 20:23 -0700, Jeff Davis wrote:

Other than that, and I took your suggestions almost verbatim. Patch
attached. Thank you!

Attached new patch with a typo fix and a few other edits. I plan to
commit soon.

Some small follow-up on this patch:

Please put blank lines between

</sect3>
<sect3 ...>

etc., matching existing style.

We usually don't capitalize the collation parameters like

CREATE COLLATION mycollation1 (PROVIDER = icu, LOCALE = 'ja-JP');

elsewhere in the documentation.

Table 24.2. ICU Collation Settings should probably be sorted by key, or
at least by something.

All tables should be referenced in the text, like "Table x.y shows this and
that." (Note that a table could float to a different page in some
output formats, so just putting it into a section without some
introductory text isn't sound.)

Table 24.1. ICU Collation Levels shows punctuation as level 4, which is
only true in shifted mode, which isn't the default. The whole business
of treating variable collation elements is getting a bit lost in this
description. The kv option is described as "Classes of characters
ignored during comparison at level 3.", which is effectively true but
not the whole picture.

#70

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#48)

Re: Order changes in PG16 since ICU introduction

On Thu, 2023-05-11 at 13:09 +0200, Peter Eisentraut wrote:

There is also the deterministic flag and the icurules setting.
Depending on what level of detail you imagine the user needs, you really
do need to look at the whole picture, not some subset of it.

(Nit: all database default collations are deterministic.)

I agree, but I think there should be a way to see the whole picture in
one command. If nothing else, for repro cases sent to the list, it
would be nice to have a single line like:

SELECT show_default_collation_whole_picture();

Right now it involves some back and forth, like checking
datlocprovider, then looking in the right fields and ignoring the wrong
ones.
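
The "back and forth" can be sketched as a small standalone C function. The struct below is a hypothetical mirror of the locale-related pg_database columns as I understand them in v16 (datlocprovider, datcollate, datctype, daticulocale, daticurules), and describe_default_collation() is an invented name, not an existing or proposed function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of the locale-related pg_database columns. */
typedef struct DbLocaleInfo
{
    char        datlocprovider; /* assumed codes: 'c' = libc, 'i' = icu */
    const char *datcollate;     /* server environment LC_COLLATE */
    const char *datctype;       /* server environment LC_CTYPE */
    const char *daticulocale;   /* NULL unless the provider is icu */
    const char *daticurules;    /* NULL if no rules were given */
} DbLocaleInfo;

/*
 * Write a one-line summary into buf, choosing the fields that matter for
 * the provider in use. Returns what snprintf() returns.
 */
int
describe_default_collation(const DbLocaleInfo *db, char *buf, size_t buflen)
{
    assert(db != NULL && buf != NULL);

    if (db->datlocprovider == 'i')
        return snprintf(buf, buflen,
                        "provider=icu locale=%s rules=%s "
                        "(server LC_COLLATE=%s LC_CTYPE=%s)",
                        db->daticulocale ? db->daticulocale : "(null)",
                        db->daticurules ? db->daticurules : "(none)",
                        db->datcollate, db->datctype);

    return snprintf(buf, buflen,
                    "provider=libc LC_COLLATE=%s LC_CTYPE=%s",
                    db->datcollate, db->datctype);
}
```

The point of the sketch is only that the provider decides which fields are authoritative: for ICU the libc settings are still worth printing because they describe the server environment, but they do not drive sorting.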

Regards,
Jeff Davis

#71

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#47)

Re: Order changes in PG16 since ICU introduction

On Thu, 2023-05-11 at 13:07 +0200, Peter Eisentraut wrote:

Here is my proposed patch for this.

The commit message makes it sound like lc_collate/ctype are completely
obsolete, and I don't think that's quite right: they still represent
the server environment, which does still matter in some cases.

I'd just say that they are too confusing (likely to be misused), and
becoming obsolete (or less relevant), or something along those lines.

Otherwise, this is fine with me. I didn't do a detailed review because
it's just mechanical.

Regards,
Jeff Davis

#72

[1]: /messages/by-id/36a6e89689716c2ca1fae8adc8e84601a041121c.camel@j-davis.com

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#68)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-05-22 at 14:27 +0200, Peter Eisentraut wrote:

The rules are for setting whatever sort order you like. Maybe you want
to sort + before - or whatever. It's like, if you don't like it, build
your own.

A build-your-own feature is fine, but it's not completely zero cost.

There is some risk that rules specified for ICU version X fail to load for
ICU version Y. If that happens to your database default collation, you
are in big trouble. The risk of failing to load a language tag in a
later version, especially one returned by uloc_toLanguageTag() in
strict mode, is much lower. We can reduce the risk by allowing rules
only for CREATE COLLATION (not CREATE DATABASE), and see what users do
with it first, and consider adding it to CREATE DATABASE later.

We can also try to explain in the docs that it's a build-it-yourself
kind of feature (use it if you see a purpose, otherwise ignore it),
though I'm not sure quite how we should word it.

And I'm skeptical that we don't have a single plausible end-to-end user
story. I just can't think of any reason someone would need something
like this, given how flexible the collation settings in the language
tags are. The best case I can think of is if someone is trying to make
an ICU collation that matches some non-ICU collation in another system,
which sounds hard; but perhaps it's reasonable to do in cases where it
just needs to work well-enough in some limited case.

Also, do we have an answer as to why specifying the rules as '' is not
the same as not specifying any rules [1]?

The co settings are just everything else.
They are not parametric, they are just some other sort order that
someone spelled out explicitly.

This sounds like another case where we can't really tell the user why
they would want to use a specific "co" setting; they should only use it
if they already know they want it. Is there some way we can word that
in the documentation so that people don't misuse them?

For instance, one of them is called "emoji". I'm sure a lot of
applications use emoji (or at least might encounter them), should they
always use co-emoji, or would some people who are using emoji not want
it? Can it be combined with "ks" or other "k*" settings?

What I'm trying to avoid is users seeing something in the documentation
and using it without it really being a good fit for their problem. Then
they see something unexpected, and need to rebuild all of their indexes
or something.

* I don't understand what "kc" means if "ks" is not set to "level1".

There is an example here:
https://peter.eisentraut.org/blog/2023/05/16/overview-of-icu-collation-settings#colcaselevel

Interesting, thank you.

Regards,
Jeff Davis

#73

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#67)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

If we special case locale=C, but do nothing for locale=fr_FR, then I'm
not sure we've solved the problem. Andrew Gierth raised the issue here,
which he called "maximally confusing":

/messages/by-id/874jp9f5jo.fsf@news-spur.riddles.org.uk

That's why I feel that we need to make locale apply to whatever the
provider is, not just when it happens to be C.

While I agree that the LOCALE option in CREATE DATABASE is
counter-intuitive, I find it questionable that blending ICU
and libc locales into it helps that much with the user experience.

Trying the latest v6-* patches applied on top of 722541ead1
(before the pgindent run), here are a few examples when I
don't think it goes well.

The OS is Ubuntu 22.04 (glibc 2.35, ICU 70.1)

initdb:

Using default ICU locale "fr".
Using language tag "fr" for ICU locale "fr".
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: fr
LC_COLLATE: fr_FR.UTF-8
LC_CTYPE: fr_FR.UTF-8
LC_MESSAGES: fr_FR.UTF-8
LC_MONETARY: fr_FR.UTF-8
LC_NUMERIC: fr_FR.UTF-8
LC_TIME: fr_FR.UTF-8
The default database encoding has accordingly been set to "UTF8".

postgres=# create database test1 locale='fr_FR.UTF-8';
NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of the
template database (fr)
HINT: Use the same ICU locale as in the template database, or use template0
as template.

That looks like a fairly generic case that doesn't work seamlessly.

postgres=# create database test2 locale='C.UTF-8' template='template0';
NOTICE: using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
CREATE DATABASE

en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
this interpretation is arguably not what a user would expect.

I would expect the ICU warning or error (icu_validation_level) to kick
in instead of that transliteration.

$ grep french /etc/locale.alias
french fr_FR.ISO-8859-1

postgres=# create database test3 locale='french' template='template0'
encoding='LATIN1';
WARNING: ICU locale "french" has unknown language "french"
HINT: To disable ICU locale validation, set parameter icu_validation_level
to DISABLED.
CREATE DATABASE

In practice we're probably getting the "und" ICU locale whereas "fr" would
be appropriate.

I assume that we would find more cases like that if testing on many
operating systems.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#74

/messages/by-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f@manitou-mail.org

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#73)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:

While I agree that the LOCALE option in CREATE DATABASE is
counter-intuitive,

I think it's more than that. As Andrew Gierth pointed out:

$ initdb --locale=fr_FR
...
ICU locale: en-US
...

That is more than just counter-intuitive. I don't think we can ship 16 that
way.

I find it questionable that blending ICU
and libc locales into it helps that much with the user experience.

Thank you for going through some examples here. I agree that it's not
perfect, but we need some path to a reasonable ICU user experience, and
I think we'll have to accept some rough edges to avoid the worst cases,
like above.

initdb:

Using default ICU locale "fr".
Using language tag "fr" for ICU locale "fr".

...

#1

postgres=# create database test1 locale='fr_FR.UTF-8';
NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of

I don't see a problem here. If you specify LOCALE to CREATE DATABASE,
you should either be using "TEMPLATE template0", or you should be
expecting an error if the LOCALE doesn't match exactly.

What would you like to see happen here?

#2

postgres=# create database test2 locale='C.UTF-8' template='template0';
NOTICE: using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
CREATE DATABASE

en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
this interpretation is arguably not what a user would expect.

As you pointed out, this is not settled in libc either:

/messages/by-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f@manitou-mail.org

We really can't expect a particular order for a particular locale name,
unless we handle it specially like "C" or "POSIX". If we pass it to the
provider, we have to trust the provider to match our conceptual
expectations for that locale (and ideally version it properly).

I would expect the ICU warning or error (icu_validation_level) to kick
in instead of that transliteration.

Earlier versions of ICU (<= 63) do this transformation automatically,
and I don't see a reason to throw an error if ICU considers it valid.
The language tag en-US-u-va-posix will be stored in the catalog, and
that will be considered valid in later versions of ICU.

Later versions of ICU (>= 64) consider locales with a language name of
"C" to be obsolete and no longer valid. I added code to do the
transformation without error in these later versions, but I think we
have agreement to remove it.

If a user specifies the locale as "C.UTF-8", we can either pass it to
ICU and see whether that version accepts it or not (and if not, throw a
warning/error); or if we decide that "C.UTF-8" really means "C", we can
handle it in the memcmp() path like C and never send it to ICU.
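
The two choices for "C.UTF-8" can be written down as a small decision function. This is only a sketch of the alternatives described above, not what any patch in this series does; choose_locale_path() and the c_dot_means_c flag are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum LocalePath
{
    LOCALE_PATH_MEMCMP,         /* handled internally, never sent to ICU */
    LOCALE_PATH_PROVIDER        /* passed to ICU, which may accept or reject it */
} LocalePath;

/*
 * Decide where a user-supplied locale name goes. "C" and "POSIX" always
 * take the memcmp() path. Names starting with "C." (such as "C.UTF-8")
 * take it only if the caller has decided that they really mean "C".
 */
LocalePath
choose_locale_path(const char *locale, bool c_dot_means_c)
{
    assert(locale != NULL);

    if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
        return LOCALE_PATH_MEMCMP;

    if (c_dot_means_c && strncmp(locale, "C.", 2) == 0)
        return LOCALE_PATH_MEMCMP;

    return LOCALE_PATH_PROVIDER;
}
```

Under the first policy (c_dot_means_c = false) the outcome for "C.UTF-8" depends on the ICU version and on icu_validation_level; under the second it never reaches ICU at all.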

#3

$ grep french /etc/locale.alias
french fr_FR.ISO-8859-1

postgres=# create database test3 locale='french' template='template0'
encoding='LATIN1';
WARNING: ICU locale "french" has unknown language "french"
HINT: To disable ICU locale validation, set parameter icu_validation_level
to DISABLED.
CREATE DATABASE

In practice we're probably getting the "und" ICU locale whereas "fr" would
be appropriate.

This is a good point and illustrates that ICU is not a drop-in
replacement for libc in all cases.

I don't see a solution here that doesn't involve some rough edges,
though. "Locale" is a generic term, and if we continue to insist that
it really means a libc locale, then ICU will never be on an equal
footing with libc, let alone the preferred provider.

Regards,
Jeff Davis

#75

mail@joeconway.com

over 2 years ago

In reply to: Jeff Davis (#74)

Re: Order changes in PG16 since ICU introduction

On 5/24/23 11:39, Jeff Davis wrote:

On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:

In practice we're probably getting the "und" ICU locale whereas "fr"
would be appropriate.

This is a good point and illustrates that ICU is not a drop-in
replacement for libc in all cases.

I don't see a solution here that doesn't involve some rough edges,
though. "Locale" is a generic term, and if we continue to insist that
it really means a libc locale, then ICU will never be on an equal
footing with libc, let alone the preferred provider.

Huge +1

IMHO the experience should be unified to the degree possible.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#76

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#51)

4 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-05-15 at 12:06 +0200, Peter Eisentraut wrote:

=== 0002: handle some kinds of libc-style locale strings

ICU used to handle libc locale strings like 'fr_FR@euro', but doesn't
in later versions. Handle them in postgres for consistency.

I tend to agree with ICU that these variants are obsolete, and we don't
need to support them anymore. If this were a tiny patch, then maybe ok,
but the way it's presented here the whole code is duplicated between
pg_locale.c and initdb.c, which is not great.

I dropped this patch from the series.

=== 0003: reduce icu_validation_level to WARNING

Given that we've seen some inconsistency in which locale names are
accepted in different ICU versions, it seems best not to be too strict.
Peter Eisentraut suggested that it be set to ERROR originally, but a
WARNING should be sufficient to see problems without introducing risks
migrating to version 16.

I'm not sure why this is the conclusion. Presumably, the detection
capabilities of ICU improve over time, so we want to take advantage of
that? What are some example scenarios where this change would help?

First of all, I missed this message earlier and I apologize for
proceeding with a commit that contradicted you -- that was not
intentional. The change is small and we can go back if needed.

To restate my reasoning: if we error by default, then changes in ICU
versions can result in errors, which seems too strong to me. I was
hoping to escalate the default for this setting to be "error" down the
road, but it feels like a risk to do so immediately.

Another thing to consider is that initdb also does validation, and
that's not affected by this GUC. Right now, initdb errors if validation
fails.

We've already allowed users to create ICU collations with the C locale
in the past, which uses the root collation (not memcmp()), and we need
to keep supporting that for upgraded clusters.

I'm not sure I agree that we need to keep supporting that. The only way
you could get that in past releases is if you specify explicitly, "give
me provider ICU and locale C", and then it wouldn't actually even work
correctly. So nobody should be using that in practice, and nobody
should have stumbled into that combination of settings by accident.

OK, then I'm inclined toward the approach to treat iculocale=C with the
memcmp() semantics. Patch added.

I also added a patch with a pg_upgrade check for previous versions with
iculocale=C, to make sure we don't corrupt indexes in case some user
did make that mistake.

3. Introduce collation provider "none", which is always

This seems most attractive, but I think it's quite invasive at this
point, especially given the dubious premise (see above).

I removed this from the current patch series, and perhaps we should
reconsider it in v17.

=== 0007: Add a GUC to control the default collation provider

Having a GUC would make it easier to migrate to ICU without surprises.
This only affects the default for CREATE COLLATION, not CREATE DATABASE
(and obviously not initdb).

It's not clear to me why we would want that. Also not clear why it
should only affect CREATE COLLATION.

Right now the default for CREATE COLLATION is always libc. For CREATE
DATABASE, it defaults to the template.

I included a patch with a different approach that uses the database
default collation's provider as the default for CREATE COLLATION,
unless LC_COLLATE or LC_CTYPE is specified.
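
As a rough standalone sketch of that rule: the provider letters below are assumed to match the catalog codes, the fallback to libc when LC_COLLATE or LC_CTYPE is given is my reading of the description above rather than something verified against the patch, and default_collation_provider() is an invented name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed provider codes: 'c' = libc, 'i' = icu. */
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/*
 * Pick the provider for CREATE COLLATION when the user did not name one.
 * LC_COLLATE and LC_CTYPE are libc concepts, so specifying either one is
 * taken to imply libc; otherwise follow the database default collation.
 */
char
default_collation_provider(char db_default_provider,
                           bool lc_collate_given, bool lc_ctype_given)
{
    assert(db_default_provider == PROVIDER_LIBC ||
           db_default_provider == PROVIDER_ICU);

    if (lc_collate_given || lc_ctype_given)
        return PROVIDER_LIBC;
    return db_default_provider;
}
```

The effect is that a plain CREATE COLLATION ... (LOCALE = ...) in an ICU database would create an ICU collation, while existing scripts that spell out LC_COLLATE or LC_CTYPE keep getting libc.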

Regards,
Jeff Davis

Attachments:

v6-0001-ICU-support-locale-C-with-the-same-behavior-as-li.patch (text/x-patch; charset=UTF-8)

From b6cbf986e1f0b32009ed060ad52a145a01e999d0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v6 1/4] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv.fsf@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 ++++++----
 src/backend/utils/adt/pg_locale.c             | 83 ++++++++++++++-----
 .../regress/expected/collate.icu.utf8.out     |  6 ++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 +
 5 files changed, 127 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..dd6cd2682f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (strcmp(colliculocale, "C") == 0 ||
+				strcmp(colliculocale, "POSIX") == 0)
 			{
-				char	   *langtag = icu_language_tag(colliculocale,
-													   icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char	   *langtag = icu_language_tag(colliculocale,
+														   icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 99d4080ea9..bfce8dc348 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (strcmp(dbiculocale, "C") == 0 ||
+			strcmp(dbiculocale, "POSIX") == 0)
 		{
-			char	   *langtag = icu_language_tag(dbiculocale,
-												   icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char	   *langtag = icu_language_tag(dbiculocale,
+													   icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16ae0..986dcbd2a7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,13 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (strcmp(iculocstr, "C") == 0 || strcmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1685,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (strcmp("C", collcollate) == 0 || strcmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1706,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2493,13 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (strcmp(loc_str, "C") == 0 || strcmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c658ee1404..bfc28ecfcf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1043,12 +1043,18 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'C', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'C', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'POSIX'); DROP COLLATION testx;
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 7bd0901281..572dc5a50a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,9 +379,13 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'C', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'C', rules = '&V << w <<< W'); -- fails
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'POSIX'); DROP COLLATION testx;
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
-- 
2.34.1
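
The regression output in the patch above shows that an ICU collation with locale "C" or "POSIX" is accepted on its own but rejected when combined with deterministic = false or with rules. A minimal standalone sketch of that validation follows. It is not PostgreSQL code: the function name check_icu_c_locale_options and its parameters are hypothetical, and the order of the two checks is an assumption. Only the error texts are taken from the expected output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: return the error text expected for an ICU collation
 * definition, or NULL when the combination is accepted.
 *
 * locale:        requested ICU locale string (must not be NULL)
 * deterministic: value of the deterministic option
 * rules:         value of the rules option, or NULL when not given
 */
const char *
check_icu_c_locale_options(const char *locale, bool deterministic,
                           const char *rules)
{
    /* "C" and "POSIX" are special names that never reach the provider. */
    bool is_c_locale = strcmp(locale, "C") == 0 ||
                       strcmp(locale, "POSIX") == 0;

    if (!is_c_locale)
        return NULL;            /* ordinary ICU locale: no extra limits here */

    if (!deterministic)
        return "nondeterministic collations not supported for C or POSIX locale";

    if (rules != NULL)
        return "RULES not supported for C or POSIX locale";

    return NULL;
}
```

Since these locales designate memcmp-style comparison, tailoring options that only the ICU collator could honor have nothing to act on, which is presumably why they are refused rather than silently ignored.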

v6-0002-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 05654d18cbc4156fcd0c6a0ec1267c60a36294db Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v6 2/4] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 +++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++---
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++++----
 src/bin/initdb/initdb.c                       | 11 ++++++---
 src/bin/scripts/createdb.c                    |  5 ++++
 src/bin/scripts/t/020_createdb.pl             |  4 ++--
 src/test/icu/t/010_database.pl                | 23 ++++++++++++-------
 .../regress/expected/collate.icu.utf8.out     | 22 +++++++++---------
 10 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index dd6cd2682f..c165922121 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index bfce8dc348..a478a2287f 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 31156e863b..fc27804413 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2163,7 +2163,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2405,7 +2409,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2421,6 +2425,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = pg_strdup(locale);
 	}
 
 	/*
@@ -3296,7 +3302,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..58e98ebb9c 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -219,6 +219,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index d0830a4a1d..c3cd674440 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -137,7 +137,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -145,7 +145,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index d3901f5d3f..ea2be008af 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,17 +51,24 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
 );
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index bfc28ecfcf..fdf07db36e 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1200,9 +1200,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1210,7 +1210,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1227,12 +1227,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1241,7 +1241,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1269,13 +1269,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1325,9 +1325,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1
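
The fallback order that this patch introduces in createdb() can be looked at in isolation. The sketch below is a standalone illustration under stated assumptions, not the backend code: resolve_icu_locale, PROVIDER_LIBC and PROVIDER_ICU are hypothetical names, and plain strings stand in for the DefElem options and template-database fields that the real function works with.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the COLLPROVIDER_* constants. */
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/*
 * Illustrative only: choose the ICU locale for a new database.
 *
 * provider:        locale provider of the new database
 * icu_locale:      explicit ICU_LOCALE option, or NULL
 * locale:          explicit LOCALE option, or NULL
 * template_locale: ICU locale of the template database, or NULL
 */
const char *
resolve_icu_locale(char provider, const char *icu_locale,
                   const char *locale, const char *template_locale)
{
    if (icu_locale == NULL && provider == PROVIDER_ICU)
    {
        if (locale != NULL)
            icu_locale = locale;            /* LOCALE now also sets it */
        else
            icu_locale = template_locale;   /* previous behavior */
    }

    return icu_locale;
}
```

An explicit ICU_LOCALE still wins, so locale names that only ICU understands (language tags, collation attributes) can be given separately from LC_COLLATE and LC_CTYPE.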

v6-0003-pg_upgrade-check-for-ICU-locale-C-in-versions-15-.patch (text/x-patch; charset=UTF-8)

From 80f04077dc9224ddad479d373d82c2b1f5a1fc41 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 12:15:33 -0700
Subject: [PATCH v6 3/4] pg_upgrade: check for ICU locale C in versions 15 and
 earlier.

ICU collations with locale "C" (and equivalently "POSIX") were not
supported before version 16, but could still be created. Users with
such collations would not get expected behavior.

Reject upgrading clusters with such collations, so that we can support
"C" and "POSIX" locales for the ICU provider without risk of
corrupting indexes during pg_upgrade.
---
 src/bin/pg_upgrade/check.c | 97 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..17c75cf186 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -26,6 +26,7 @@ static void check_for_tables_with_oids(ClusterInfo *cluster);
 static void check_for_composite_data_type_usage(ClusterInfo *cluster);
 static void check_for_reg_data_type_usage(ClusterInfo *cluster);
 static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
+static void check_icu_c_before_16(ClusterInfo *cluster);
 static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
@@ -164,6 +165,9 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
+		check_icu_c_before_16(&old_cluster);
+
 	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
 		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
 		check_for_jsonb_9_4_usage(&old_cluster);
@@ -1233,6 +1237,99 @@ check_for_aclitem_data_type_usage(ClusterInfo *cluster)
 		check_ok();
 }
 
+/*
+ * check_icu_c_before_16
+ *
+ *  Version 16 adds support for the ICU C locale, but it was possible to
+ *  (incorrectly) create it in prior versions. Check for this invalid ICU
+ *  locale name in version 15 and earlier.
+ */
+static void
+check_icu_c_before_16(ClusterInfo *cluster)
+{
+	PGresult   *dat_res;
+	PGconn	   *template1_conn = connectToServer(cluster, "template1");
+	int			dat_ntups;
+	int			i_datname;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for ICU collations with locale \"C\"");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "icu_c_before_16.txt");
+
+	/* check pg_database */
+	dat_res = executeQueryOrDie(template1_conn,
+								"SELECT datname "
+								"FROM pg_catalog.pg_database "
+								"WHERE datlocprovider='i' "
+								"AND daticulocale IN ('C','POSIX')");
+
+	i_datname = PQfnumber(dat_res, "datname");
+
+	dat_ntups = PQntuples(dat_res);
+
+	for (int rowno = 0; rowno < dat_ntups; rowno++)
+	{
+		if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+			pg_fatal("could not open file \"%s\": %s",
+					 output_path, strerror(errno));
+		fprintf(script, "default collation for database %s\n",
+				PQgetvalue(dat_res, rowno, i_datname));
+	}
+
+	PQclear(dat_res);
+	PQfinish(template1_conn);
+
+	/* check pg_collation in each database */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *db_conn = connectToServer(cluster, active_db->db_name);
+		PGresult   *coll_res;
+		int			coll_ntups;
+		int			i_collid;
+		int			i_collname;
+
+		coll_res = executeQueryOrDie(db_conn,
+									 "SELECT oid as collid, collname "
+									 "FROM pg_catalog.pg_collation "
+									 "WHERE collprovider='i' "
+									 "AND colliculocale IN ('C','POSIX')");
+
+		i_collid = PQfnumber(coll_res, "collid");
+		i_collname = PQfnumber(coll_res, "collname");
+
+		coll_ntups = PQntuples(coll_res);
+
+		for (int rowno = 0; rowno < coll_ntups; rowno++)
+		{
+			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+				pg_fatal("could not open file \"%s\": %s",
+						 output_path, strerror(errno));
+			fprintf(script, "database %s collation %s (oid=%s)\n",
+					active_db->db_name,
+					PQgetvalue(coll_res, rowno, i_collname),
+					PQgetvalue(coll_res, rowno, i_collid));
+		}
+
+		PQclear(coll_res);
+		PQfinish(db_conn);
+	}
+
+	if (script)
+	{
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains ICU collations with the locale \"C\" or \"POSIX\",\n"
+				 "which are not supported until version 16. Earlier versions using ICU \n"
+				 "collations with the \"C\" or \"POSIX\" locales cannot be upgraded.");
+	}
+
+	check_ok();
+}
+
 /*
  * check_for_jsonb_9_4_usage()
  *
-- 
2.34.1
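
The condition that the two catalog queries in this patch test for can be written as a single predicate. This is a hedged sketch, not pg_upgrade code: icu_locale_blocks_upgrade is a hypothetical name, and it takes a plain major version number (15, 16) where the patch compares GET_MAJOR_VERSION() against 1500.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: would a database or collation row from the old
 * cluster be reported by the check?
 *
 * old_major_version: major version of the old cluster, e.g. 15
 * provider:          datlocprovider / collprovider ('i' means ICU)
 * icu_locale:        daticulocale / colliculocale, or NULL
 */
bool
icu_locale_blocks_upgrade(int old_major_version, char provider,
                          const char *icu_locale)
{
    if (old_major_version >= 16)
        return false;           /* "C"/"POSIX" are valid for ICU from 16 on */

    if (provider != 'i' || icu_locale == NULL)
        return false;

    /* Exact, case-sensitive match, like IN ('C','POSIX') in the queries. */
    return strcmp(icu_locale, "C") == 0 ||
           strcmp(icu_locale, "POSIX") == 0;
}
```

A single matching row is enough for the check to stop the upgrade, since the meaning of such a collation differs between the old and new versions and indexes built on it could not be trusted.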

v6-0004-Use-database-default-collation-s-provider-as-defa.patch (text/x-patch; charset=UTF-8)

From 184644da55144df9295d81ac3b4cd061473d98ce Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 09:53:02 -0700
Subject: [PATCH v6 4/4] Use database default collation's provider as default
 for CREATE COLLATION.

---
 doc/src/sgml/ref/create_collation.sgml           |  9 ++++++---
 src/backend/commands/collationcmds.c             |  7 ++++++-
 src/test/regress/expected/collate.linux.utf8.out | 10 +++++-----
 src/test/regress/sql/collate.linux.utf8.sql      | 10 +++++-----
 4 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..a6927a7d1d 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,9 +121,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
       <para>
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.  If
+       <replaceable>lc_collate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the default is
+       <literal>libc</literal>; otherwise, the default is the same as the
+       database default collation's provider.  See <xref
        linkend="locale-providers"/> for details.
       </para>
      </listitem>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c165922121..8fc0ff1903 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -226,7 +226,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = default_locale.provider;
+		}
 
 		if (localeEl)
 		{
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 6d34667ceb..6b0cc95ae8 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1026,7 +1026,7 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
@@ -1034,7 +1034,7 @@ CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 ERROR:  collation "test0" already exists
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
 NOTICE:  collation "test0" already exists, skipping
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 NOTICE:  collation "test0" for encoding "UTF8" already exists, skipping
 do $$
 BEGIN
@@ -1046,7 +1046,7 @@ END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
 ERROR:  parameter "lc_ctype" must be specified
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 ERROR:  could not create locale "nonsense": No such file or directory
 DETAIL:  The operating system could not find any locale data for the locale name "nonsense".
 CREATE COLLATION test4 FROM nonsense;
@@ -1166,8 +1166,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 
 -- nondeterministic collations
 -- (not supported with libc provider)
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 ERROR:  nondeterministic collations not supported with this provider
 -- cleanup
 SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.linux.utf8.sql b/src/test/regress/sql/collate.linux.utf8.sql
index 2b787507c5..cc25f95ac3 100644
--- a/src/test/regress/sql/collate.linux.utf8.sql
+++ b/src/test/regress/sql/collate.linux.utf8.sql
@@ -358,13 +358,13 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
@@ -374,7 +374,7 @@ BEGIN
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
@@ -455,8 +455,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 -- nondeterministic collations
 -- (not supported with libc provider)
 
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 
 
 -- cleanup
-- 
2.34.1
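
The rule for the default provider that this patch adds to DefineCollation() is small enough to restate as a standalone function. The sketch below is illustrative and assumes hypothetical names (default_collation_provider, PROVIDER_LIBC, PROVIDER_ICU); boolean flags stand in for the parsed option elements of the real code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the COLLPROVIDER_* constants. */
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/*
 * Illustrative only: which provider does CREATE COLLATION end up with?
 *
 * has_provider:      was "provider = ..." given explicitly?
 * provider:          the explicit provider (ignored if !has_provider)
 * has_lc_collate:    was lc_collate given?
 * has_lc_ctype:      was lc_ctype given?
 * database_provider: provider of the database default collation
 */
char
default_collation_provider(bool has_provider, char provider,
                           bool has_lc_collate, bool has_lc_ctype,
                           char database_provider)
{
    if (has_provider)
        return provider;            /* explicit choice always wins */

    if (has_lc_collate || has_lc_ctype)
        return PROVIDER_LIBC;       /* these options only exist for libc */

    return database_provider;       /* otherwise follow the database */
}
```

This is also why the Linux regression tests in the patch gain an explicit provider = libc: in a database whose default provider is ICU, a bare locale = 'en_US.utf8' would otherwise be handed to ICU.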

#77

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#69)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-05-22 at 14:34 +0200, Peter Eisentraut wrote:

Please put blank lines between

</sect3>
<sect3 ...>

etc., matching existing style.

We usually don't capitalize the collation parameters like

CREATE COLLATION mycollation1 (PROVIDER = icu, LOCALE = 'ja-JP');

elsewhere in the documentation.

Table 24.2. ICU Collation Settings should probably be sorted by key,
or at least by something.

All tables should be referenced in the text, like "Table x.y shows this
and that." (Note that a table could float to a different page in some
output formats, so just putting it into a section without some
introductory text isn't sound.)

Thank you, done.

Table 24.1. ICU Collation Levels shows punctuation as level 4, which is
only true in shifted mode, which isn't the default. The whole business
of treating variable collation elements is getting a bit lost in this
description. The kv option is described as "Classes of characters
ignored during comparison at level 3.", which is effectively true but
not the whole picture.

I organized the documentation around practical examples and available
options, and less around the conceptual model. I think that's a good
start, but you're right that it over-simplifies in a few areas.

Discussing the model would work better along with an explanation of ICU
rules, where you can make better use of those concepts. I feel like
there are some interesting things that can be done with rules, but I
haven't had a chance to really dig in yet.

Regards,
Jeff Davis

#78

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#74)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

#1

postgres=# create database test1 locale='fr_FR.UTF-8';
NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of

I don't see a problem here. If you specify LOCALE to CREATE DATABASE,
you should either be using "TEMPLATE template0", or you should be
expecting an error if the LOCALE doesn't match exactly.

What would you like to see happen here?

What's odd is that initdb starting in an fr_FR.UTF-8 environment
found that "fr" was the default ICU locale to use, whereas
"create database" reports that "fr" and "fr_FR.UTF-8" refer to
incompatible locales.

To me initdb is wrong when coming up with the less precise "fr"
instead of "fr-FR".

I suggest the attached patch to call uloc_getDefault() instead of the
current code that somehow leaves out the country/region component.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Attachments:

initdb-uloc_getdefault.diff (text/x-patch)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 31156e863b..09a5c98cc0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2354,42 +2354,13 @@ icu_validate_locale(const char *loc_str)
 }
 
 /*
- * Determine default ICU locale by opening the default collator and reading
- * its locale.
- *
- * NB: The default collator (opened using NULL) is different from the collator
- * for the root locale (opened with "", "und", or "root"). The former depends
- * on the environment (useful at initdb time) and the latter does not.
+ * Determine the default ICU locale
  */
 static char *
 default_icu_locale(void)
 {
 #ifdef USE_ICU
-	UCollator  *collator;
-	UErrorCode	status;
-	const char *valid_locale;
-	char	   *default_locale;
-
-	status = U_ZERO_ERROR;
-	collator = ucol_open(NULL, &status);
-	if (U_FAILURE(status))
-		pg_fatal("could not open collator for default locale: %s",
-				 u_errorName(status));
-
-	status = U_ZERO_ERROR;
-	valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE,
-										&status);
-	if (U_FAILURE(status))
-	{
-		ucol_close(collator);
-		pg_fatal("could not determine default ICU locale");
-	}
-
-	default_locale = pg_strdup(valid_locale);
-
-	ucol_close(collator);
-
-	return default_locale;
+	return pg_strdup(uloc_getDefault());
 #else
 	pg_fatal("ICU is not supported in this build");
 #endif

#79

[1]: https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a4efa16db7351e62293f8ef0c37aac8d2

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#78)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-05-26 at 18:24 +0200, Daniel Verite wrote:

To me initdb is wrong when coming up with the less precise "fr"
instead of "fr-FR".

I suggest the attached patch to call uloc_getDefault() instead of the
current code that somehow leaves out the country/region component.

Thank you. I experimented with several ICU versions and different
environmental settings, and it does seem better at preserving the name
that comes from the environment.

There is a warning in the docs: "Do not use unless you know what you
are doing."[1] I don't see a reason for the warning or any major risk
from us using it. Perhaps it's because the result is affected by either
the environment or the last uloc_setDefault() call. We don't use
uloc_setDefault(), and we only call uloc_getDefault() once, so I don't
see a risk here.

The fix seems simple enough. Committed.

Regards,
Jeff Davis

#80

pgsql@j-davis.com

over 2 years ago

In reply to: Jeff Davis (#76)

4 attachment(s)

Re: Order changes in PG16 since ICU introduction

New patch series attached. I plan to commit 0001 and 0002 soon, unless
there are objections.

0001 causes the "C" and "POSIX" locales to be treated with
memcmp/pg_ascii semantics in ICU, just like in libc. We also considered
a new "none" provider, but it's more invasive, and we can always
reconsider that in the v17 cycle.
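
As a rough illustration of what "memcmp semantics" means here, the sketch
below compares two strings byte by byte. The function names are made up for
this example and are not PostgreSQL functions; the exact, case-sensitive
match on "C" and "POSIX" follows the strcmp() tests in 0001.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True only for the two special names, matched exactly (case-sensitive). */
bool
locale_is_c(const char *locale)
{
    return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}

/*
 * Byte-wise comparison in the spirit of the "C" locale: memcmp() over the
 * common prefix, with the shorter string sorting first on a tie.
 * Returns a value <0, 0 or >0. Illustrative name only.
 */
int
c_locale_compare(const char *a, const char *b)
{
    size_t      alen = strlen(a);
    size_t      blen = strlen(b);
    int         cmp = memcmp(a, b, alen < blen ? alen : blen);

    if (cmp != 0)
        return cmp;
    return (alen > blen) - (alen < blen);
}
```

For the strings from the original report, 'RM(25)/nodes|+|21|1' sorts before
'RM(25)/nodes|-|21|' under this comparison because '+' (0x2B) is a smaller
byte than '-' (0x2D), which is the ordering the PostGIS tests expect.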

0002 introduces an upgrade check for users who have explicitly
requested provider=icu and iculocale=C on older versions, and rejects
upgrading from v15 in that case to avoid index corruption. Having such
a collation is almost certainly a mistake by the user, because the
collator would not give the expected memcmp semantics.
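
The decision the check makes can be sketched in a few lines of standalone C.
LocaleRow and count_icu_c_rows() are hypothetical names for this example;
the real check in 0002 queries pg_database and pg_collation through libpq
rather than walking an in-memory array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one pg_database or pg_collation row. */
typedef struct
{
    const char *name;       /* database or collation name */
    char        provider;   /* 'i' = ICU, 'c' = libc, as in the catalogs */
    const char *iculocale;  /* NULL when the provider is not ICU */
} LocaleRow;

/*
 * Count the rows that would block the upgrade: ICU provider combined with
 * the locale "C" or "POSIX". A result greater than zero means "refuse".
 */
size_t
count_icu_c_rows(const LocaleRow *rows, size_t nrows)
{
    size_t      found = 0;

    for (size_t i = 0; i < nrows; i++)
    {
        /* only ICU rows with a recorded locale are of interest */
        if (rows[i].provider != 'i' || rows[i].iculocale == NULL)
            continue;
        if (strcmp(rows[i].iculocale, "C") == 0 ||
            strcmp(rows[i].iculocale, "POSIX") == 0)
            found++;
    }
    return found;
}
```

A libc collation with locale "C" is not counted, since only the ICU
provider gave unexpected (non-memcmp) behavior for that name before v16.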

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v8-0001-ICU-support-locale-C-with-the-same-behavior-as-li.patch (text/x-patch; charset=UTF-8)

From b542c98403b382ad634ea4915057ee170434fcea Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v8 1/4] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv.fsf@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 ++++++----
 src/backend/utils/adt/pg_locale.c             | 83 ++++++++++++++-----
 .../regress/expected/collate.icu.utf8.out     |  6 ++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 +
 5 files changed, 127 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..dd6cd2682f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (strcmp(colliculocale, "C") == 0 ||
+				strcmp(colliculocale, "POSIX") == 0)
 			{
-				char	   *langtag = icu_language_tag(colliculocale,
-													   icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char	   *langtag = icu_language_tag(colliculocale,
+														   icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 99d4080ea9..bfce8dc348 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (strcmp(dbiculocale, "C") == 0 ||
+			strcmp(dbiculocale, "POSIX") == 0)
 		{
-			char	   *langtag = icu_language_tag(dbiculocale,
-												   icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char	   *langtag = icu_language_tag(dbiculocale,
+													   icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16ae0..986dcbd2a7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,13 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (strcmp(iculocstr, "C") == 0 || strcmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1685,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (strcmp("C", collcollate) == 0 || strcmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1706,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2493,13 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (strcmp(loc_str, "C") == 0 || strcmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c658ee1404..bfc28ecfcf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1043,12 +1043,18 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'C', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'C', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'POSIX'); DROP COLLATION testx;
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 7bd0901281..572dc5a50a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,9 +379,13 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'C', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'C', rules = '&V << w <<< W'); -- fails
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'POSIX'); DROP COLLATION testx;
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
-- 
2.34.1

v8-0002-pg_upgrade-check-for-ICU-locale-C-in-versions-15-.patch (text/x-patch; charset=UTF-8)

From 0a02ac57d65a81080eb34d9ba028e032857d1188 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 12:15:33 -0700
Subject: [PATCH v8 2/4] pg_upgrade: check for ICU locale C in versions 15 and
 earlier.

ICU collations with locale "C" (and equivalently "POSIX") were not
supported before version 16, but could still be created. Users with
such collations would not get expected behavior.

Reject upgrading clusters with such collations, so that we can support
"C" and "POSIX" locales for the ICU provider without risk of
corrupting indexes during pg_upgrade.
---
 src/bin/pg_upgrade/check.c | 97 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..17c75cf186 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -26,6 +26,7 @@ static void check_for_tables_with_oids(ClusterInfo *cluster);
 static void check_for_composite_data_type_usage(ClusterInfo *cluster);
 static void check_for_reg_data_type_usage(ClusterInfo *cluster);
 static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
+static void check_icu_c_before_16(ClusterInfo *cluster);
 static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
@@ -164,6 +165,9 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
+		check_icu_c_before_16(&old_cluster);
+
 	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
 		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
 		check_for_jsonb_9_4_usage(&old_cluster);
@@ -1233,6 +1237,99 @@ check_for_aclitem_data_type_usage(ClusterInfo *cluster)
 		check_ok();
 }
 
+/*
+ * check_icu_c_before_16
+ *
+ *  Version 16 adds support for the ICU C locale, but it was possible to
+ *  (incorrectly) create it in prior versions. Check for this invalid ICU
+ *  locale name in version 15 and earlier.
+ */
+static void
+check_icu_c_before_16(ClusterInfo *cluster)
+{
+	PGresult   *dat_res;
+	PGconn	   *template1_conn = connectToServer(cluster, "template1");
+	int			dat_ntups;
+	int			i_datname;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for ICU collations with locale \"C\"");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "icu_c_before_16.txt");
+
+	/* check pg_database */
+	dat_res = executeQueryOrDie(template1_conn,
+								"SELECT datname "
+								"FROM pg_catalog.pg_database "
+								"WHERE datlocprovider='i' "
+								"AND daticulocale IN ('C','POSIX')");
+
+	i_datname = PQfnumber(dat_res, "datname");
+
+	dat_ntups = PQntuples(dat_res);
+
+	for (int rowno = 0; rowno < dat_ntups; rowno++)
+	{
+		if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+			pg_fatal("could not open file \"%s\": %s",
+					 output_path, strerror(errno));
+		fprintf(script, "default collation for database %s\n",
+				PQgetvalue(dat_res, rowno, i_datname));
+	}
+
+	PQclear(dat_res);
+	PQfinish(template1_conn);
+
+	/* check pg_collation in each database */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *db_conn = connectToServer(cluster, active_db->db_name);
+		PGresult   *coll_res;
+		int			coll_ntups;
+		int			i_collid;
+		int			i_collname;
+
+		coll_res = executeQueryOrDie(db_conn,
+									 "SELECT oid as collid, collname "
+									 "FROM pg_catalog.pg_collation "
+									 "WHERE collprovider='i' "
+									 "AND colliculocale IN ('C','POSIX')");
+
+		i_collid = PQfnumber(coll_res, "collid");
+		i_collname = PQfnumber(coll_res, "collname");
+
+		coll_ntups = PQntuples(coll_res);
+
+		for (int rowno = 0; rowno < coll_ntups; rowno++)
+		{
+			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+				pg_fatal("could not open file \"%s\": %s",
+						 output_path, strerror(errno));
+			fprintf(script, "database %s collation %s (oid=%s)\n",
+					active_db->db_name,
+					PQgetvalue(coll_res, rowno, i_collname),
+					PQgetvalue(coll_res, rowno, i_collid));
+		}
+
+		PQclear(coll_res);
+		PQfinish(db_conn);
+	}
+
+	if (script)
+	{
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains ICU collations with the locale \"C\" or \"POSIX\",\n"
+				 "which are not supported until version 16. Earlier versions using ICU \n"
+				 "collations with the \"C\" or \"POSIX\" locales cannot be upgraded.");
+	}
+
+	check_ok();
+}
+
 /*
  * check_for_jsonb_9_4_usage()
  *
-- 
2.34.1

v8-0003-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 4959605b5695e03491fc355aa3e13b522103f988 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v8 3/4] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 +++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++---
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++++----
 src/bin/initdb/initdb.c                       | 11 ++++++---
 src/bin/scripts/createdb.c                    |  5 ++++
 src/bin/scripts/t/020_createdb.pl             |  4 ++--
 src/test/icu/t/010_database.pl                | 23 ++++++++++++-------
 .../regress/expected/collate.icu.utf8.out     | 22 +++++++++---------
 10 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index dd6cd2682f..c165922121 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index bfce8dc348..a478a2287f 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 09a5c98cc0..c7696dda10 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2163,7 +2163,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2376,7 +2380,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2392,6 +2396,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = pg_strdup(locale);
 	}
 
 	/*
@@ -3267,7 +3273,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..58e98ebb9c 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -219,6 +219,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index d0830a4a1d..c3cd674440 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -137,7 +137,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -145,7 +145,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index d3901f5d3f..ea2be008af 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,17 +51,24 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
 );
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index bfc28ecfcf..fdf07db36e 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1200,9 +1200,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1210,7 +1210,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1227,12 +1227,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1241,7 +1241,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1269,13 +1269,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1325,9 +1325,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

v8-0004-Use-database-default-collation-s-provider-as-defa.patch (text/x-patch; charset=UTF-8)

From a02b6d160ddbc4db584b13c842b8ed82644baf9f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 09:53:02 -0700
Subject: [PATCH v8 4/4] Use database default collation's provider as default
 for CREATE COLLATION.

---
 doc/src/sgml/ref/create_collation.sgml           |  9 ++++++---
 src/backend/commands/collationcmds.c             |  7 ++++++-
 src/test/regress/expected/collate.linux.utf8.out | 10 +++++-----
 src/test/regress/sql/collate.linux.utf8.sql      | 10 +++++-----
 4 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..a6927a7d1d 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,9 +121,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
       <para>
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.  If
+       <replaceable>lc_collate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the default is
+       <literal>libc</literal>; otherwise, the default is the same as the
+       database default collation's provider.  See <xref
        linkend="locale-providers"/> for details.
       </para>
      </listitem>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c165922121..8fc0ff1903 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -226,7 +226,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = default_locale.provider;
+		}
 
 		if (localeEl)
 		{
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 6d34667ceb..6b0cc95ae8 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1026,7 +1026,7 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
@@ -1034,7 +1034,7 @@ CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 ERROR:  collation "test0" already exists
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
 NOTICE:  collation "test0" already exists, skipping
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 NOTICE:  collation "test0" for encoding "UTF8" already exists, skipping
 do $$
 BEGIN
@@ -1046,7 +1046,7 @@ END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
 ERROR:  parameter "lc_ctype" must be specified
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 ERROR:  could not create locale "nonsense": No such file or directory
 DETAIL:  The operating system could not find any locale data for the locale name "nonsense".
 CREATE COLLATION test4 FROM nonsense;
@@ -1166,8 +1166,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 
 -- nondeterministic collations
 -- (not supported with libc provider)
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 ERROR:  nondeterministic collations not supported with this provider
 -- cleanup
 SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.linux.utf8.sql b/src/test/regress/sql/collate.linux.utf8.sql
index 2b787507c5..cc25f95ac3 100644
--- a/src/test/regress/sql/collate.linux.utf8.sql
+++ b/src/test/regress/sql/collate.linux.utf8.sql
@@ -358,13 +358,13 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
@@ -374,7 +374,7 @@ BEGIN
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
@@ -455,8 +455,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 -- nondeterministic collations
 -- (not supported with libc provider)
 
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 
 
 -- cleanup
-- 
2.34.1

#81

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#80)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

New patch series attached. I plan to commit 0001 and 0002 soon, unless
there are objections.

0001 causes the "C" and "POSIX" locales to be treated with
memcmp/pg_ascii semantics in ICU, just like in libc. We also
considered a new "none" provider, but it's more invasive, and we can
always reconsider that in the v17 cycle.

FWIW I don't quite see how 0001 improves things or what problem it's
trying to solve.

0001 creates exceptions throughout the code so that when an ICU
collation has a locale name "C" or "POSIX" then it does not behave
like an ICU collation, even though pg_collation.collprovider='i'.
To me it's neither desirable nor necessary that a collation that
has collprovider='i' is diverted to non-ICU semantics.

Also in the current state, this diversion does not apply to initdb.

"initdb --icu-locale=C" with 0001 applied reports this:

Using language tag "en-US-u-va-posix" for ICU locale "C".
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: en-US-u-va-posix
LC_COLLATE: fr_FR.UTF-8
[...]

and "initdb --locale=C" reports this:

Using default ICU locale "fr_FR".
Using language tag "fr-FR" for ICU locale "fr_FR".
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: fr-FR
LC_COLLATE: C
[...]

Could you elaborate a bit more on what 0001 is meant to achieve, from
the point of view of the user?

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#82

mail@joeconway.com

over 2 years ago

In reply to: Daniel Verite (#81)

Re: Order changes in PG16 since ICU introduction

On 6/6/23 09:09, Daniel Verite wrote:

Jeff Davis wrote:

New patch series attached. I plan to commit 0001 and 0002 soon, unless
there are objections.

0001 causes the "C" and "POSIX" locales to be treated with
memcmp/pg_ascii semantics in ICU, just like in libc. We also
considered a new "none" provider, but it's more invasive, and we can
always reconsider that in the v17 cycle.

0001 creates exceptions throughout the code so that when an ICU
collation has a locale name "C" or "POSIX" then it does not behave
like an ICU collation, even though pg_collation.collprovider='i'.
To me it's neither desirable nor necessary that a collation that
has collprovider='i' is diverted to non-ICU semantics.

This discussion makes me wonder (though probably too late for the v16
cycle) if we shouldn't treat the "C" and "POSIX" locales as a third
provider, something like "internal".

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#83

pgsql@j-davis.com

over 2 years ago

In reply to: Joe Conway (#82)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-06-06 at 14:11 -0400, Joe Conway wrote:

This discussion makes me wonder (though probably too late for the v16
cycle) if we shouldn't treat the "C" and "POSIX" locales as a third
provider, something like "internal".

That's exactly what I did in v6 of this series: I created a "none"
provider, and when someone specified provider=icu iculocale=C, it would
change the provider to "none":

/messages/by-id/5f9bf4a0b040428c5db2dc1f23cc3ad96acb5672.camel@j-davis.com

I'm fine with either approach.
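
To make the v6 idea concrete, here is a minimal sketch of the provider substitution described above. It is not the v6 patch: the enum, the function names, and the exact (case-sensitive) matching of "C" and "POSIX" are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider codes; PostgreSQL itself uses single-char codes. */
typedef enum
{
	SKETCH_PROVIDER_LIBC,
	SKETCH_PROVIDER_ICU,
	SKETCH_PROVIDER_NONE		/* the "none"/"internal" provider idea */
} SketchProvider;

/* Assumption: only the exact names "C" and "POSIX" count as non-locales. */
static bool
sketch_is_c_or_posix(const char *name)
{
	assert(name != NULL);
	return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/*
 * If the user asked for ICU but named the C/POSIX locale, switch to the
 * "none" provider; otherwise keep the requested provider unchanged.
 */
static SketchProvider
sketch_effective_provider(SketchProvider requested, const char *icu_locale)
{
	if (requested == SKETCH_PROVIDER_ICU &&
		icu_locale != NULL &&
		sketch_is_c_or_posix(icu_locale))
		return SKETCH_PROVIDER_NONE;

	return requested;
}
```

The alternative in the v8/v9 series keeps the provider as ICU and special-cases the C/POSIX locale names inside the ICU code paths instead.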

Regards,
Jeff Davis

#84

mail@joeconway.com

over 2 years ago

In reply to: Jeff Davis (#83)

Re: Order changes in PG16 since ICU introduction

On 6/6/23 15:15, Jeff Davis wrote:

On Tue, 2023-06-06 at 14:11 -0400, Joe Conway wrote:

This discussion makes me wonder (though probably too late for the v16
cycle) if we shouldn't treat the "C" and "POSIX" locales as a third
provider, something like "internal".

That's exactly what I did in v6 of this series: I created a "none"
provider, and when someone specified provider=icu iculocale=C, it would
change the provider to "none":

/messages/by-id/5f9bf4a0b040428c5db2dc1f23cc3ad96acb5672.camel@j-davis.com

I'm fine with either approach.

Ha!

Well it seems like I am +1 on that then ;-)

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#85

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#81)

4 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-06-06 at 15:09 +0200, Daniel Verite wrote:

FWIW I don't quite see how 0001 improves things or what problem it's
trying to solve.

The word "locale" is generic, so we need to make LOCALE/--locale apply
to whatever provider is being used. If "locale" only applies to libc,
using ICU will always be confusing and never be on the same level as
libc, let alone the preferred provider.

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).

Please let me know if you disagree with the goal or the reasoning here.
If so, please explain where you think we should end up, because the
status quo does not seem great to me.

0001 creates exceptions throughout the code so that when an ICU
collation has a locale name "C" or "POSIX" then it does not behave
like an ICU collation, even though pg_collation.collprovider='i'.
To me it's neither desirable nor necessary that a collation that
has collprovider='i' is diverted to non-ICU semantics.

It's not very principled, but it matches what libc does.

Also in the current state, this diversion does not apply to initdb.

"initdb --icu-locale=C" with 0001 applied reports this:

Using language tag "en-US-u-va-posix" for ICU locale "C".

Thank you. I fixed it by skipping the canonicalization for C/POSIX
locales in initdb.

Could you elaborate a bit more on what 0001 is meant to achieve, from
the point of view of the user?

It makes it so the user consistently (regardless of the provider) gets
the "no locale" behavior (as documented and historically expected) when
they specify the C or POSIX locales.
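
As an illustration of what "no locale" behavior means for ordering, here is a minimal sketch of a bytewise comparison. The helper name is hypothetical and this is not the PostgreSQL implementation; it only shows the memcmp-plus-length rule that the C locale is expected to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two NUL-terminated strings as raw bytes.
 * Returns a negative value, zero, or a positive value.
 */
static int
sketch_c_locale_cmp(const char *a, const char *b)
{
	size_t		alen;
	size_t		blen;
	int			result;

	assert(a != NULL && b != NULL);

	alen = strlen(a);
	blen = strlen(b);

	/* Compare the common prefix byte by byte. */
	result = memcmp(a, b, alen < blen ? alen : blen);

	/* If the common prefix is equal, the shorter string sorts first. */
	if (result == 0 && alen != blen)
		result = (alen < blen) ? -1 : 1;

	return result;
}
```

With this rule, 'RM(25)/nodes|+|21|1' sorts before 'RM(25)/nodes|-|21|' because the byte for '+' (0x2B) is smaller than the byte for '-' (0x2D), which matches the "C" result reported earlier in the thread.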

Then that enables us to change LOCALE/--locale to apply to ICU, which
means that a simple command like "initdb --locale=en_US" does a
sensible thing regardless of the default provider.
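
A minimal sketch of the resolution rule this implies, loosely modeled on the initdb change in 0003 (the generic locale fills in the ICU locale only when the provider is ICU and no ICU locale was given explicitly). The enum and function names are hypothetical, and the real code also handles template inheritance and canonicalization, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider codes for this sketch only. */
typedef enum
{
	SKETCH_LIBC,
	SKETCH_ICU
} SketchLocProvider;

/*
 * Decide which ICU locale string applies.
 *   - libc provider: no ICU locale at all (NULL).
 *   - explicit ICU locale (e.g. --icu-locale): it wins.
 *   - otherwise: fall back to the generic locale (e.g. --locale),
 *     which may itself be NULL if nothing was specified.
 */
static const char *
sketch_resolve_icu_locale(SketchLocProvider provider,
						  const char *locale,
						  const char *icu_locale)
{
	assert(provider == SKETCH_LIBC || provider == SKETCH_ICU);

	if (provider != SKETCH_ICU)
		return NULL;
	if (icu_locale != NULL)
		return icu_locale;
	return locale;
}
```

Under these assumptions, "--locale=en_US" alone yields an ICU locale of "en_US" when ICU is the provider, while an explicit ICU locale still takes precedence.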

I understand you are skeptical of trying to apply an arbitrary locale
name to ICU, but if they don't specify the provider, what do you expect
to happen?

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v9-0002-pg_upgrade-check-for-ICU-locale-C-in-versions-15-.patch (text/x-patch; charset=UTF-8)

From 879f68b254f2e3f531b0c92c08bebf20298376bc Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 12:15:33 -0700
Subject: [PATCH v9 2/4] pg_upgrade: check for ICU locale C in versions 15 and
 earlier.

ICU collations with locale "C" (and equivalently "POSIX") were not
supported before version 16, but could still be created. Users with
such collations would not get expected behavior.

Reject upgrading clusters with such collations, so that we can support
"C" and "POSIX" locales for the ICU provider without risk of
corrupting indexes during pg_upgrade.
---
 src/bin/pg_upgrade/check.c | 97 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 64024e3b9e..17c75cf186 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -26,6 +26,7 @@ static void check_for_tables_with_oids(ClusterInfo *cluster);
 static void check_for_composite_data_type_usage(ClusterInfo *cluster);
 static void check_for_reg_data_type_usage(ClusterInfo *cluster);
 static void check_for_aclitem_data_type_usage(ClusterInfo *cluster);
+static void check_icu_c_before_16(ClusterInfo *cluster);
 static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
@@ -164,6 +165,9 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
 		check_for_pg_role_prefix(&old_cluster);
 
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1500)
+		check_icu_c_before_16(&old_cluster);
+
 	if (GET_MAJOR_VERSION(old_cluster.major_version) == 904 &&
 		old_cluster.controldata.cat_ver < JSONB_FORMAT_CHANGE_CAT_VER)
 		check_for_jsonb_9_4_usage(&old_cluster);
@@ -1233,6 +1237,99 @@ check_for_aclitem_data_type_usage(ClusterInfo *cluster)
 		check_ok();
 }
 
+/*
+ * check_icu_c_before_16
+ *
+ *  Version 16 adds support for the ICU C locale, but it was possible to
+ *  (incorrectly) create it in prior versions. Check for this invalid ICU
+ *  locale name in version 15 and earlier.
+ */
+static void
+check_icu_c_before_16(ClusterInfo *cluster)
+{
+	PGresult   *dat_res;
+	PGconn	   *template1_conn = connectToServer(cluster, "template1");
+	int			dat_ntups;
+	int			i_datname;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for ICU collations with locale \"C\"");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "icu_c_before_16.txt");
+
+	/* check pg_database */
+	dat_res = executeQueryOrDie(template1_conn,
+								"SELECT datname "
+								"FROM pg_catalog.pg_database "
+								"WHERE datlocprovider='i' "
+								"AND daticulocale IN ('C','POSIX')");
+
+	i_datname = PQfnumber(dat_res, "datname");
+
+	dat_ntups = PQntuples(dat_res);
+
+	for (int rowno = 0; rowno < dat_ntups; rowno++)
+	{
+		if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+			pg_fatal("could not open file \"%s\": %s",
+					 output_path, strerror(errno));
+		fprintf(script, "default collation for database %s\n",
+				PQgetvalue(dat_res, rowno, i_datname));
+	}
+
+	PQclear(dat_res);
+	PQfinish(template1_conn);
+
+	/* check pg_collation in each database */
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *active_db = &cluster->dbarr.dbs[dbnum];
+		PGconn	   *db_conn = connectToServer(cluster, active_db->db_name);
+		PGresult   *coll_res;
+		int			coll_ntups;
+		int			i_collid;
+		int			i_collname;
+
+		coll_res = executeQueryOrDie(db_conn,
+									 "SELECT oid as collid, collname "
+									 "FROM pg_catalog.pg_collation "
+									 "WHERE collprovider='i' "
+									 "AND colliculocale IN ('C','POSIX')");
+
+		i_collid = PQfnumber(coll_res, "collid");
+		i_collname = PQfnumber(coll_res, "collname");
+
+		coll_ntups = PQntuples(coll_res);
+
+		for (int rowno = 0; rowno < coll_ntups; rowno++)
+		{
+			if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
+				pg_fatal("could not open file \"%s\": %s",
+						 output_path, strerror(errno));
+			fprintf(script, "database %s collation %s (oid=%s)\n",
+					active_db->db_name,
+					PQgetvalue(coll_res, rowno, i_collname),
+					PQgetvalue(coll_res, rowno, i_collid));
+		}
+
+		PQclear(coll_res);
+		PQfinish(db_conn);
+	}
+
+	if (script)
+	{
+		pg_log(PG_REPORT, "fatal");
+		pg_fatal("Your installation contains ICU collations with the locale \"C\" or \"POSIX\",\n"
+				 "which are not supported until version 16. Earlier versions using ICU \n"
+				 "collations with the \"C\" or \"POSIX\" locales cannot be upgraded.");
+	}
+
+	check_ok();
+}
+
 /*
  * check_for_jsonb_9_4_usage()
  *
-- 
2.34.1

v9-0003-Make-LOCALE-apply-to-ICU_LOCALE-for-CREATE-DATABA.patch (text/x-patch; charset=UTF-8)

From 361b81f1ea5153745b51d8215b3dcaeac4fdeedc Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v9 3/4] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 +++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++---
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++++----
 src/bin/initdb/initdb.c                       | 11 ++++++---
 src/bin/scripts/createdb.c                    |  5 ++++
 src/bin/scripts/t/020_createdb.pl             |  4 ++--
 src/test/icu/t/010_database.pl                | 23 ++++++++++++-------
 .../regress/expected/collate.icu.utf8.out     | 22 +++++++++---------
 10 files changed, 65 insertions(+), 35 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..844773ff44 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index dd6cd2682f..c165922121 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -288,7 +288,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					if (langtag && strcmp(colliculocale, langtag) != 0)
 					{
 						ereport(NOTICE,
-								(errmsg("using standard form \"%s\" for locale \"%s\"",
+								(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 										langtag, colliculocale)));
 
 						colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index bfce8dc348..a478a2287f 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1080,7 +1087,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				if (langtag && strcmp(dbiculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, dbiculocale)));
 
 					dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 7aa2d871e3..93b19b952b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2163,7 +2163,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2376,7 +2380,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2392,6 +2396,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = pg_strdup(locale);
 	}
 
 	/*
@@ -3276,7 +3282,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..58e98ebb9c 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -219,6 +219,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index d0830a4a1d..c3cd674440 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -137,7 +137,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -145,7 +145,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index d3901f5d3f..ea2be008af 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,17 +51,24 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
 );
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index bfc28ecfcf..fdf07db36e 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1200,9 +1200,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1210,7 +1210,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1227,12 +1227,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1241,7 +1241,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1269,13 +1269,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1325,9 +1325,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

Attachment: v9-0004-Use-database-default-collation-s-provider-as-defa.patch (text/x-patch; charset=UTF-8)

From 6a1b9ca2a247f3e5111988292e148ba86d802005 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 09:53:02 -0700
Subject: [PATCH v9 4/4] Use database default collation's provider as default
 for CREATE COLLATION.

---
 doc/src/sgml/ref/create_collation.sgml           |  9 ++++++---
 src/backend/commands/collationcmds.c             |  7 ++++++-
 src/test/regress/expected/collate.linux.utf8.out | 10 +++++-----
 src/test/regress/sql/collate.linux.utf8.sql      | 10 +++++-----
 4 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..a6927a7d1d 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,9 +121,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
       <para>
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.  If
+       <replaceable>lc_collate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the default is
+       <literal>libc</literal>; otherwise, the default is the same as the
+       database default collation's provider.  See <xref
        linkend="locale-providers"/> for details.
       </para>
      </listitem>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c165922121..8fc0ff1903 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -226,7 +226,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = default_locale.provider;
+		}
 
 		if (localeEl)
 		{
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 6d34667ceb..6b0cc95ae8 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1026,7 +1026,7 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
@@ -1034,7 +1034,7 @@ CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 ERROR:  collation "test0" already exists
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
 NOTICE:  collation "test0" already exists, skipping
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 NOTICE:  collation "test0" for encoding "UTF8" already exists, skipping
 do $$
 BEGIN
@@ -1046,7 +1046,7 @@ END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
 ERROR:  parameter "lc_ctype" must be specified
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 ERROR:  could not create locale "nonsense": No such file or directory
 DETAIL:  The operating system could not find any locale data for the locale name "nonsense".
 CREATE COLLATION test4 FROM nonsense;
@@ -1166,8 +1166,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 
 -- nondeterministic collations
 -- (not supported with libc provider)
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 ERROR:  nondeterministic collations not supported with this provider
 -- cleanup
 SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.linux.utf8.sql b/src/test/regress/sql/collate.linux.utf8.sql
index 2b787507c5..cc25f95ac3 100644
--- a/src/test/regress/sql/collate.linux.utf8.sql
+++ b/src/test/regress/sql/collate.linux.utf8.sql
@@ -358,13 +358,13 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal(current_setting('lc_collate')) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
@@ -374,7 +374,7 @@ BEGIN
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
@@ -455,8 +455,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 -- nondeterministic collations
 -- (not supported with libc provider)
 
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 
 
 -- cleanup
-- 
2.34.1
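
The provider-defaulting rule that patch 0004 adds to DefineCollation() can be restated as a small standalone C sketch. This is only an illustration under assumptions: the function name and the SKETCH_* macros are hypothetical and not part of PostgreSQL; the character codes follow the pg_collation.collprovider values mentioned in this thread ('i' for ICU, 'c' for libc).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for COLLPROVIDER_ICU / COLLPROVIDER_LIBC. */
#define SKETCH_PROVIDER_ICU  'i'
#define SKETCH_PROVIDER_LIBC 'c'

/*
 * Hypothetical helper: choose the provider for CREATE COLLATION when the
 * PROVIDER option is omitted.  lc_collate/lc_ctype are libc concepts, so
 * specifying either one selects libc; otherwise the database default
 * collation's provider is inherited.
 */
static char
sketch_default_collprovider(bool has_lc_collate, bool has_lc_ctype,
							char db_default_provider)
{
	if (has_lc_collate || has_lc_ctype)
		return SKETCH_PROVIDER_LIBC;

	return db_default_provider;
}
```

With this rule, a bare `locale = '...'` clause in an ICU-default database would be interpreted by ICU, which is presumably why the libc-specific regression tests in the patch now spell out `provider = libc`.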

Attachment: v9-0001-ICU-support-locale-C-with-the-same-behavior-as-li.patch (text/x-patch; charset=UTF-8)

From 54a171f643504d0f8beb2a1f8f7e29dc9639654e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH v9 1/4] ICU: support locale "C" with the same behavior as
 libc.

The "C" locale doesn't actually use a provider at all, it's a special
locale that uses memcmp() and built-in character classification. Make
it behave the same in ICU as libc (even though it doesn't actually
make use of either provider).

Discussion: https://postgr.es/m/87v8hoexdv.fsf@news-spur.riddles.org.uk
---
 src/backend/commands/collationcmds.c          | 43 ++++++----
 src/backend/commands/dbcommands.c             | 42 ++++++----
 src/backend/utils/adt/pg_locale.c             | 83 ++++++++++++++-----
 src/bin/initdb/initdb.c                       | 25 ++++--
 .../regress/expected/collate.icu.utf8.out     |  6 ++
 src/test/regress/sql/collate.icu.utf8.sql     |  4 +
 6 files changed, 144 insertions(+), 59 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..dd6cd2682f 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,39 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 						 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the locale string. Otherwise,
-			 * canonicalize to a language tag.
-			 */
-			if (!IsBinaryUpgrade)
+			if (strcmp(colliculocale, "C") == 0 ||
+				strcmp(colliculocale, "POSIX") == 0)
 			{
-				char	   *langtag = icu_language_tag(colliculocale,
-													   icu_validation_level);
-
-				if (langtag && strcmp(colliculocale, langtag) != 0)
+				if (!collisdeterministic)
+					ereport(ERROR,
+							(errmsg("nondeterministic collations not supported for C or POSIX locale")));
+				if (collicurules != NULL)
+					ereport(ERROR,
+							(errmsg("RULES not supported for C or POSIX locale")));
+			}
+			else
+			{
+				/*
+				 * During binary upgrade, preserve the locale
+				 * string. Otherwise, canonicalize to a language tag.
+				 */
+				if (!IsBinaryUpgrade)
 				{
-					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
-									langtag, colliculocale)));
+					char	   *langtag = icu_language_tag(colliculocale,
+														   icu_validation_level);
+
+					if (langtag && strcmp(colliculocale, langtag) != 0)
+					{
+						ereport(NOTICE,
+								(errmsg("using standard form \"%s\" for locale \"%s\"",
+										langtag, colliculocale)));
 
-					colliculocale = langtag;
+						colliculocale = langtag;
+					}
 				}
-			}
 
-			icu_validate_locale(colliculocale);
+				icu_validate_locale(colliculocale);
+			}
 		}
 
 		/*
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 99d4080ea9..bfce8dc348 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,27 +1058,37 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
-		/*
-		 * During binary upgrade, or when the locale came from the template
-		 * database, preserve locale string. Otherwise, canonicalize to a
-		 * language tag.
-		 */
-		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		if (strcmp(dbiculocale, "C") == 0 ||
+			strcmp(dbiculocale, "POSIX") == 0)
 		{
-			char	   *langtag = icu_language_tag(dbiculocale,
-												   icu_validation_level);
-
-			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			if (dbicurules != NULL)
+				ereport(ERROR,
+						(errmsg("ICU_RULES not supported for C or POSIX locale")));
+		}
+		else
+		{
+			/*
+			 * During binary upgrade, or when the locale came from the
+			 * template database, preserve locale string. Otherwise,
+			 * canonicalize to a language tag.
+			 */
+			if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
 			{
-				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
-								langtag, dbiculocale)));
+				char	   *langtag = icu_language_tag(dbiculocale,
+													   icu_validation_level);
+
+				if (langtag && strcmp(dbiculocale, langtag) != 0)
+				{
+					ereport(NOTICE,
+							(errmsg("using standard form \"%s\" for locale \"%s\"",
+									langtag, dbiculocale)));
 
-				dbiculocale = langtag;
+					dbiculocale = langtag;
+				}
 			}
-		}
 
-		icu_validate_locale(dbiculocale);
+			icu_validate_locale(dbiculocale);
+		}
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16ae0..986dcbd2a7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1246,8 +1246,15 @@ lookup_collation_cache(Oid collation, bool set_flags)
 		}
 		else
 		{
-			cache_entry->collate_is_c = false;
-			cache_entry->ctype_is_c = false;
+			Datum		datum;
+			const char *colliculocale;
+
+			datum = SysCacheGetAttrNotNull(COLLOID, tp, Anum_pg_collation_colliculocale);
+			colliculocale = TextDatumGetCString(datum);
+
+			cache_entry->collate_is_c = ((strcmp(colliculocale, "C") == 0) ||
+										 (strcmp(colliculocale, "POSIX") == 0));
+			cache_entry->ctype_is_c = cache_entry->collate_is_c;
 		}
 
 		cache_entry->flags_valid = true;
@@ -1279,16 +1286,27 @@ lc_collate_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_COLLATE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_COLLATE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_COLLATE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_COLLATE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1332,16 +1350,27 @@ lc_ctype_is_c(Oid collation)
 	if (collation == DEFAULT_COLLATION_OID)
 	{
 		static int	result = -1;
-		char	   *localeptr;
-
-		if (default_locale.provider == COLLPROVIDER_ICU)
-			return false;
+		const char *localeptr;
 
 		if (result >= 0)
 			return (bool) result;
-		localeptr = setlocale(LC_CTYPE, NULL);
-		if (!localeptr)
-			elog(ERROR, "invalid LC_CTYPE setting");
+
+		if (default_locale.provider == COLLPROVIDER_ICU)
+		{
+#ifdef USE_ICU
+			localeptr = default_locale.info.icu.locale;
+#else
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("ICU is not supported in this build")));
+#endif
+		}
+		else
+		{
+			localeptr = setlocale(LC_CTYPE, NULL);
+			if (!localeptr)
+				elog(ERROR, "invalid LC_CTYPE setting");
+		}
 
 		if (strcmp(localeptr, "C") == 0)
 			result = true;
@@ -1375,7 +1404,13 @@ make_icu_collator(const char *iculocstr,
 #ifdef USE_ICU
 	UCollator  *collator;
 
-	collator = pg_ucol_open(iculocstr);
+	if (strcmp(iculocstr, "C") == 0 || strcmp(iculocstr, "POSIX") == 0)
+	{
+		Assert(icurules == NULL);
+		collator = NULL;
+	}
+	else
+		collator = pg_ucol_open(iculocstr);
 
 	/*
 	 * If rules are specified, we extract the rules of the standard collation,
@@ -1650,6 +1685,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (strcmp("C", collcollate) == 0 || strcmp("POSIX", collcollate) == 0)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
@@ -1668,9 +1706,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	else
 #endif
 		if (collprovider == COLLPROVIDER_LIBC &&
-			pg_strcasecmp("C", collcollate) != 0 &&
-			pg_strncasecmp("C.", collcollate, 2) != 0 &&
-			pg_strcasecmp("POSIX", collcollate) != 0)
+			pg_strncasecmp("C.", collcollate, 2) != 0)
 	{
 #if defined(__GLIBC__)
 		/* Use the glibc version because we don't have anything better. */
@@ -2457,6 +2493,13 @@ pg_ucol_open(const char *loc_str)
 	if (loc_str == NULL)
 		elog(ERROR, "opening default collator is not supported");
 
+	/*
+	 * Must never open special values C or POSIX, which are treated specially
+	 * and not passed to the provider.
+	 */
+	if (strcmp(loc_str, "C") == 0 || strcmp(loc_str, "POSIX") == 0)
+		elog(ERROR, "unexpected ICU locale string: %s", loc_str);
+
 	/*
 	 * In ICU versions 54 and earlier, "und" is not a recognized spelling of
 	 * the root locale. If the first component of the locale is "und", replace
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 09a5c98cc0..7aa2d871e3 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2428,14 +2428,23 @@ setlocales(void)
 			printf(_("Using default ICU locale \"%s\".\n"), icu_locale);
 		}
 
-		/* canonicalize to a language tag */
-		langtag = icu_language_tag(icu_locale);
-		printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
-			   langtag, icu_locale);
-		pg_free(icu_locale);
-		icu_locale = langtag;
-
-		icu_validate_locale(icu_locale);
+		if (strcmp(icu_locale, "C") == 0 ||
+			strcmp(icu_locale, "POSIX") == 0)
+		{
+			if (icu_rules != NULL)
+				pg_fatal("RULES not supported for C or POSIX locale");
+		}
+		else
+		{
+			/* canonicalize to a language tag */
+			langtag = icu_language_tag(icu_locale);
+			printf(_("Using language tag \"%s\" for ICU locale \"%s\".\n"),
+				   langtag, icu_locale);
+			pg_free(icu_locale);
+			icu_locale = langtag;
+
+			icu_validate_locale(icu_locale);
+		}
 
 		/*
 		 * In supported builds, the ICU locale ID will be opened during
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c658ee1404..bfc28ecfcf 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1043,12 +1043,18 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = 'C', deterministic = false); -- fails
+ERROR:  nondeterministic collations not supported for C or POSIX locale
+CREATE COLLATION testx (provider = icu, locale = 'C', rules = '&V << w <<< W'); -- fails
+ERROR:  RULES not supported for C or POSIX locale
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'POSIX'); DROP COLLATION testx;
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 7bd0901281..572dc5a50a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -379,9 +379,13 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'C', deterministic = false); -- fails
+CREATE COLLATION testx (provider = icu, locale = 'C', rules = '&V << w <<< W'); -- fails
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = 'POSIX'); DROP COLLATION testx;
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
-- 
2.34.1
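
The special-casing in patch 0001 rests on two ideas: an exact string match on "C" or "POSIX", and byte-wise comparison with memcmp() for such locales. A minimal standalone sketch follows; the function names are hypothetical, and the length tie-break is an assumption about how a memcmp-based text comparison is usually completed, not a copy of the PostgreSQL code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the same case-sensitive test the patch repeats. */
static bool
sketch_locale_is_c(const char *locale)
{
	return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}

/*
 * Hypothetical helper: compare two byte strings the way a "C" collation
 * would.  Bytes are compared with memcmp(); if one string is a prefix of
 * the other, the shorter one sorts first (assumed tie-break).
 */
static int
sketch_c_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		minlen = (alen < blen) ? alen : blen;
	int			cmp = memcmp(a, b, minlen);

	if (cmp != 0)
		return cmp;
	if (alen != blen)
		return (alen < blen) ? -1 : 1;
	return 0;
}
```

Applied to the strings from earlier in the thread, 'RM(25)/nodes|+|21|1' compares lower than 'RM(25)/nodes|-|21|' because '+' (0x2B) precedes '-' (0x2D) at the first differing byte, which matches the "C" result reported there.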

#86

mail@joeconway.com

over 2 years ago

In reply to: Jeff Davis (#85)

Re: Order changes in PG16 since ICU introduction

On 6/6/23 15:18, Jeff Davis wrote:

On Tue, 2023-06-06 at 15:09 +0200, Daniel Verite wrote:

FWIW I don't quite see how 0001 improve things or what problem it's
trying to solve.

The word "locale" is generic, so we need to make LOCALE/--locale apply
to whatever provider is being used. If "locale" only applies to libc,
using ICU will always be confusing and never be on the same level as
libc, let alone the preferred provider.

Agree 100%

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).

+1 to the latter approach

Please let me know if you disagree with the goal or the reasoning here.
If so, please explain where you think we should end up, because the
status quo does not seem great to me.

also +1

0001 creates exceptions throughout the code so that when an ICU
collation has a locale name "C" or "POSIX" then it does not behave
like an ICU collation, even though pg_collation.collprovider='i'
To me it's neither desirable nor necessary that a collation that
has collprovider='i' is diverted to non-ICU semantics.

It's not very principled, but it matches what libc does.

Makes sense to me

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#87

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Joe Conway (#86)

Re: Order changes in PG16 since ICU introduction

Joe Conway <mail@joeconway.com> writes:

On 6/6/23 15:18, Jeff Davis wrote:

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).

+1 to the latter approach

Also +1, except that I find "none" a rather confusing choice of name.
There *is* a provider, it's just PG itself not either libc or ICU.
I thought Joe's suggestion of "internal" made more sense.

regards, tom lane

#88

Robert Haas

robertmhaas@gmail.com

over 2 years ago

In reply to: Tom Lane (#87)

Re: Order changes in PG16 since ICU introduction

On Tue, Jun 6, 2023 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joe Conway <mail@joeconway.com> writes:

On 6/6/23 15:18, Jeff Davis wrote:

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).

+1 to the latter approach

Also +1, except that I find "none" a rather confusing choice of name.
There *is* a provider, it's just PG itself not either libc or ICU.
I thought Joe's suggestion of "internal" made more sense.

Or perhaps "builtin" or "postgresql".

I'm just thinking that "internal" as a type name kind of means "you
shouldn't be touching this from SQL" and we don't want to give people
the idea that the "C" locale isn't something you should use.

--
Robert Haas
EDB: http://www.enterprisedb.com

#89

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Robert Haas (#88)

Re: Order changes in PG16 since ICU introduction

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 6, 2023 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also +1, except that I find "none" a rather confusing choice of name.
There *is* a provider, it's just PG itself not either libc or ICU.
I thought Joe's suggestion of "internal" made more sense.

Or perhaps "builtin" or "postgresql".

Either OK by me

regards, tom lane

#90

mail@joeconway.com

over 2 years ago

In reply to: Tom Lane (#89)

Re: Order changes in PG16 since ICU introduction

On 6/6/23 15:55, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 6, 2023 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also +1, except that I find "none" a rather confusing choice of name.
There *is* a provider, it's just PG itself not either libc or ICU.
I thought Joe's suggestion of "internal" made more sense.

Or perhaps "builtin" or "postgresql".

Either OK by me

Same here

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#91

jkatz@postgresql.org

over 2 years ago

In reply to: Joe Conway (#90)

Re: Order changes in PG16 since ICU introduction

On 6/6/23 3:56 PM, Joe Conway wrote:

On 6/6/23 15:55, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 6, 2023 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also +1, except that I find "none" a rather confusing choice of name.
There *is* a provider, it's just PG itself not either libc or ICU.
I thought Joe's suggestion of "internal" made more sense.

Or perhaps "builtin" or "postgresql".

Either OK by me

Same here

Since we're bikeshedding, "postgresql" or "builtin" could make it seem
to a (app) developer that these may be recommended options, as we're
trusting PostgreSQL to make the best choices for us. Granted, v16 is
(theoretically) defaulting to ICU, so that choice is made, but the
unsuspecting developer could make a switch based on that naming.

However, I don't have a strong alternative -- I understand the concern
about "internal", so I'd be OK with "postgresql" unless a better name
appears.

Jonathan

#92

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Jonathan S. Katz (#91)

Re: Order changes in PG16 since ICU introduction

"Jonathan S. Katz" <jkatz@postgresql.org> writes:

Since we're bikeshedding, "postgresql" or "builtin" could make it seem
to a (app) developer that these may be recommended options, as we're
trusting PostgreSQL to make the best choices for us. Granted, v16 is
(theoretically) defaulting to ICU, so that choice is made, but the
unsuspecting developer could make a switch based on that naming.

I don't think this is a problem, as long as any locale name other
than C/POSIX fails when combined with that provider name.

regards, tom lane
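
The constraint described above, that a "builtin" (or "postgresql") provider should reject every locale name other than C/POSIX, could be sketched as follows. Everything here is hypothetical: no such provider exists in the patches shown in this thread, and the enum and function names are invented for illustration. Validation for libc and ICU is deliberately not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical provider tags; not PostgreSQL identifiers. */
typedef enum
{
	SKETCH_PROV_BUILTIN,
	SKETCH_PROV_LIBC,
	SKETCH_PROV_ICU
} SketchProvider;

/*
 * Hypothetical check: the builtin provider accepts only "C" or "POSIX".
 * For libc and ICU this sketch returns true unconditionally, because their
 * real validation depends on the operating system or the ICU library.
 */
static bool
sketch_locale_allowed(SketchProvider provider, const char *locale)
{
	bool		is_c = strcmp(locale, "C") == 0 ||
		strcmp(locale, "POSIX") == 0;

	if (provider == SKETCH_PROV_BUILTIN)
		return is_c;

	return true;
}
```

Under such a rule, a developer who picked the provider because of its name would get an error as soon as a real language locale was requested.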

#93

andrew@tao11.riddles.org.uk

over 2 years ago

In reply to: Joe Conway (#90)

Re: Order changes in PG16 since ICU introduction

"Joe" == Joe Conway <mail@joeconway.com> writes:

On 6/6/23 15:55, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jun 6, 2023 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also +1, except that I find "none" a rather confusing choice of name.
There *is* a provider, it's just PG itself not either libc or ICU.
I thought Joe's suggestion of "internal" made more sense.

Or perhaps "builtin" or "postgresql".

Either OK by me

Joe> Same here

I like either "internal" or "builtin" because they correctly identify
that no external resources are used. I'm not keen on "postgresql".

--
Andrew.

#94

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#71)

Re: Order changes in PG16 since ICU introduction

On 22.05.23 19:35, Jeff Davis wrote:

On Thu, 2023-05-11 at 13:07 +0200, Peter Eisentraut wrote:

Here is my proposed patch for this.

The commit message makes it sound like lc_collate/ctype are completely
obsolete, and I don't think that's quite right: they still represent
the server environment, which does still matter in some cases.

I'd just say that they are too confusing (likely to be misused), and
becoming obsolete (or less relevant), or something along those lines.

Otherwise, this is fine with me. I didn't do a detailed review because
it's just mechanical.

I have committed this with some tuning of the commit message.

#95

[1]: /messages/by-id/360c90b9-7c20-4cec-aade-38e6e3351c05@manitou-mail.org

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#85)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).

Yes, it's a special case, but when doing initdb --locale=C, a user does
not need or want an ICU locale. They want the same thing as what v15
does with the same arguments: a template0 database with
datlocprovider='c', datcollate='C', datctype='C', daticulocale=NULL.

The simplest way to obtain that in v16 is to teach initdb that
--locale=C without the --locale-provider option implies that
--locale-provider=libc ([1])

OTOH what you're proposing with the 0001 patch is much more involved
in terms of tweaking the ICU code so that daticulocale='C' and
datlocprovider='i' becomes a combination that provides the C semantics
that ICU doesn't have natively.

I don't agree with the reasoning that to make progress with
the other uses of --locale, we need to start by either making ICU
support C/POSIX (0001/0002), or adding a new "none/builtin" provider
(previous set of patches).
v16 should not need it any more than v15 did, provided v16 does the same
as v15 for locale='C', that is, does not involve ICU at all.

Then that enables us to change LOCALE/--locale to apply to ICU, which
means that a simple command like "initdb --locale=en_US" does a
sensible thing regardless of the default provider.

I understand you are skeptical of trying to apply an arbitrary locale
name to ICU, but if they don't specify the provider, what do you expect
to happen?

It's a hard question because it depends on what people have in their
locale environment combined with what they try to do.
I think that initdb without any locale option should work well in
the majority of environments, but specifying a locale alone will not work
well in a number of cases, so users might end up concluding that they
need to specify not only the provider but lc_collate/lc_ctype.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#96

peter@eisentraut.org

over 2 years ago

In reply to: Jeff Davis (#80)

Re: Order changes in PG16 since ICU introduction

On 05.06.23 19:54, Jeff Davis wrote:

New patch series attached. I plan to commit 0001 and 0002 soon, unless
there are objections.

0001 causes the "C" and "POSIX" locales to be treated with
memcmp/pg_ascii semantics in ICU, just like in libc. We also considered
a new "none" provider, but it's more invasive, and we can always
reconsider that in the v17 cycle.

0002 introduces an upgrade check for users who have explicitly
requested provider=icu and iculocale=C on older versions, and rejects
upgrading from v15 in that case to avoid index corruption. Having such
a collation is almost certainly a mistake by the user, because the
collator would not give the expected memcmp semantics.

I'm dubious about these two.

0003 seems like the correct direction. In createdb.c, the change you
add makes sense, but you should also remove the existing use of the
locale variable:

- if (locale)
- {
- if (!lc_ctype)
- lc_ctype = locale;
- if (!lc_collate)
- lc_collate = locale;
- }
-

#97

peter@eisentraut.org

over 2 years ago

In reply to: Jeff Davis (#80)

Re: Order changes in PG16 since ICU introduction

On 05.06.23 19:54, Jeff Davis wrote:

New patch series attached.

Could you clarify what here is intended for 16 and what is for later?
This patch set keeps expanding and changing in each iteration.

There is a PG16 open item linked to this thread

* The rules for choosing default ICU locale seem pretty unfriendly

which I think would be addressed by an appropriately fixed up variant of
your patch 0003.

(Or if not, what is the actual issue?)

Everything else appears to be either new feature work or fixing
pre-existing behavior, so is not in scope for PG16 and should be dealt
with elsewhere, so we can focus here on closing out this release.

#98

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#95)

Re: Order changes in PG16 since ICU introduction

On Wed, 2023-06-07 at 23:50 +0200, Daniel Verite wrote:

The simplest way to obtain that in v16 is to teach initdb that
--locale=C without the --locale-provider option implies that
--locale-provider=libc ([1])

As I replied in that subthread, that creates a worse problem: if you
only change the provider when the locale is C, then what about when the
locale is *not* C?

export LANG=en_US.UTF-8
initdb -D data --locale=fr_FR.UTF-8
...
provider: icu
ICU locale: en-US

I believe that case is an order of magnitude worse than the other cases
you brought up in that subthread.

It also leaves the fundamental problem in place that LOCALE only
applies to the libc provider, which multiple people have agreed is not
acceptable.

Regards,
Jeff Davis

#99

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#97)

Re: Order changes in PG16 since ICU introduction

On Thu, 2023-06-08 at 00:11 +0200, Peter Eisentraut wrote:

On 05.06.23 19:54, Jeff Davis wrote:

New patch series attached.

Could you clarify what here is intended for 16 and what is for later?

I apologize about the patch churn here. I implemented several
approaches to see what feedback I get, and now it looks like we're
returning to a previous idea (the "builtin" provider).

In v16:

1. We need LOCALE to apply to all providers.

2. We need LOCALE=C to give the memcmp/pg_ascii behavior in all cases
(unless overridden by a separate LC_COLLATE or LC_CTYPE parameter).

Those are the biggest problems raised in this thread, and the patches
to accomplish those things are in scope for v16.
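
To make "memcmp behavior" in point 2 concrete, here is a minimal sketch of bytewise ordering. The function name bytewise_less is hypothetical; the shorter-string tie-break is an assumption of this sketch and is not a copy of the server's comparison code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Illustrative only: returns true if a sorts strictly before b when the
 * strings are compared byte by byte, with the shorter string sorting
 * first when one is a prefix of the other.
 */
bool
bytewise_less(const char *a, const char *b)
{
    size_t      alen;
    size_t      blen;
    size_t      minlen;
    int         cmp;

    assert(a != NULL && b != NULL);

    alen = strlen(a);
    blen = strlen(b);
    minlen = (alen < blen) ? alen : blen;

    cmp = memcmp(a, b, minlen);
    if (cmp != 0)
        return cmp < 0;

    return alen < blen;
}
```

With the strings from the start of this thread, 'RM(25)/nodes|+|21|1' sorts before 'RM(25)/nodes|-|21|' because the byte for '+' (0x2B) is smaller than the byte for '-' (0x2D); no linguistic rules are consulted.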

After we sort those out, there are two loose ends:

* What do we do in the case where the environment has LANG=C.UTF-8 (as
some buildfarm members do)? Is an error acceptable in that case?

* Do we move icu_validation_level back to ERROR?

Regards,
Jeff Davis

#100

Tatsuo Ishii

ishii@sraoss.co.jp

over 2 years ago

In reply to: Jeff Davis (#98)

Re: Order changes in PG16 since ICU introduction

Hi,

On Wed, 2023-06-07 at 23:50 +0200, Daniel Verite wrote:

The simplest way to obtain that in v16 is to teach initdb that
--locale=C without the --locale-provider option implies that
--locale-provider=libc ([1])

As I replied in that subthread, that creates a worse problem: if you
only change the provider when the locale is C, then what about when the
locale is *not* C?

export LANG=en_US.UTF-8
initdb -D data --locale=fr_FR.UTF-8
...
provider: icu
ICU locale: en-US

I believe that case is an order of magnitude worse than the other cases
you brought up in that subthread.

It also leaves the fundamental problem in place that LOCALE only
applies to the libc provider, which multiple people have agreed is not
acceptable.

Daniel's comment:

Yes, it's a special case, but when doing initdb --locale=C, a user does
not need or want an ICU locale. They want the same thing as what v15
does with the same arguments: a template0 database with
datlocprovider='c', datcollate='C', datctype='C', daticulocale=NULL.

So in this case, is the only way to keep the v15 behavior of "initdb
--locale=C" (--no-locale) in v16 to build PostgreSQL --without-icu?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

#101

mail@joeconway.com

over 2 years ago

In reply to: Jeff Davis (#99)

Re: Order changes in PG16 since ICU introduction

On 6/7/23 19:26, Jeff Davis wrote:

* What do we do in the case where the environment has LANG=C.UTF-8 (as
some buildfarm members do)? Is an error acceptable in that case?

If I understand the discussion so far correctly, I think that case
should fall to the provider.

If it supports "C.UTF-8" explicitly as some distributions do, then use it.

If the provider has no such thing, throw an error.

Somewhere we should document that "C.UTF-8" from the provider might not
be as stable or working as they expect.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#102

Tatsuo Ishii

ishii@sraoss.co.jp

over 2 years ago

In reply to: Tatsuo Ishii (#100)

Re: Order changes in PG16 since ICU introduction

As I replied in that subthread, that creates a worse problem: if you
only change the provider when the locale is C, then what about when the
locale is *not* C?

export LANG=en_US.UTF-8
initdb -D data --locale=fr_FR.UTF-8
...
provider: icu
ICU locale: en-US

I believe that case is an order of magnitude worse than the other cases
you brought up in that subthread.

It also leaves the fundamental problem in place that LOCALE only
applies to the libc provider, which multiple people have agreed is not
acceptable.

Note that most PostgreSQL users in Japan run initdb --no-locale. They
almost never use anything other than the C locale, because they do not
rely on system collation. Most databases have an extra column which
represents the pronunciation in Hiragana or Katakana.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

#103

daniel@manitou-mail.org

over 2 years ago

In reply to: Tatsuo Ishii (#100)

Re: Order changes in PG16 since ICU introduction

Tatsuo Ishii wrote:

Yes, it's a special case, but when doing initdb --locale=C, a user does
not need or want an ICU locale. They want the same thing as what v15
does with the same arguments: a template0 database with
datlocprovider='c', datcollate='C', datctype='C', daticulocale=NULL.

So in this case, is the only way to keep the v15 behavior of "initdb
--locale=C" (--no-locale) in v16 to build PostgreSQL --without-icu?

AFAIK the --no-locale case in v16 is fixed since:

commit 5cd1a5af4d17496a58678c8eb7ab792119c2d723
Author: Jeff Davis <jdavis@postgresql.org>
Date: Fri Apr 21 13:11:18 2023 -0700

Fix initdb --no-locale.

Discussion: /messages/by-id/878relf7cb.fsf@news-spur.riddles.org.uk
Reported-by: Andrew Gierth

The --locale=C case is still being discussed. To me it should
produce the same result as --no-locale and --locale=C in v15, that is,
"ICU is the default" does not apply to that case, but currently
it initializes the cluster with an ICU locale.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#104

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#98)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

As I replied in that subthread, that creates a worse problem: if you
only change the provider when the locale is C, then what about when the
locale is *not* C?

export LANG=en_US.UTF-8
initdb -D data --locale=fr_FR.UTF-8
...
provider: icu
ICU locale: en-US

What you're proposing with the 0003 patch still applies.

In the above case I think we would end up with:

provider=icu
ICU locale=fr-FR
lc_collate=fr_FR.UTF-8
lc_ctype=fr_FR.UTF-8

which is reasonable.

In the following cases we would initialize a libc cluster instead of an
ICU cluster:

- initdb --locale=C
- initdb --locale=POSIX
- LANG=C initdb
- LANG=C.UTF-8 initdb
- LANG=POSIX initdb
- ... possibly other locales that we find are unsuitable for ICU

That is, the rule "ICU by default" really means "ICU unless the locale
that we're being passed or getting from the environment
has semantics that ICU does not provide but we know libc provides,
in which case we fall back to libc".

A user who specifically wants ICU should invoke
--icu-locale=something or --locale=something --locale-provider=icu
in which case we should not fall back to libc.
We still have to determine lc_collate and lc_ctype either from the
environment or from the locale argument (I think we should
favor the environment), except if the user specifies
--lc-collate=... --lc-ctype=...
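
The fallback rule described above can be sketched as a small decision function. This is a sketch under stated assumptions, not code from any patch: the type and function names are hypothetical, and the set of locales treated as unsuitable for ICU is limited to the ones listed above (C, POSIX, C.UTF-8), leaving out the "possibly other locales".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names, used only for this illustration. */
typedef enum
{
    SKETCH_PROVIDER_LIBC,
    SKETCH_PROVIDER_ICU
} SketchProvider;

/* Locales whose semantics ICU does not provide but libc does. */
bool
locale_unsuitable_for_icu(const char *locale)
{
    assert(locale != NULL);

    return strcmp(locale, "C") == 0 ||
        strcmp(locale, "POSIX") == 0 ||
        strcmp(locale, "C.UTF-8") == 0;
}

/*
 * "ICU by default" with a libc fallback. icu_requested is true when the
 * user passed --icu-locale or --locale-provider=icu, in which case there
 * is no fallback.
 */
SketchProvider
choose_default_provider(const char *locale, bool icu_requested)
{
    if (icu_requested)
        return SKETCH_PROVIDER_ICU;

    if (locale_unsuitable_for_icu(locale))
        return SKETCH_PROVIDER_LIBC;

    return SKETCH_PROVIDER_ICU;
}
```

Here locale stands for whatever was passed with --locale or, failing that, picked up from the environment.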

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#105

pgsql@j-davis.com

over 2 years ago

In reply to: Joe Conway (#101)

Re: Order changes in PG16 since ICU introduction

On Wed, 2023-06-07 at 20:52 -0400, Joe Conway wrote:

If the provider has no such thing, throw an error.

Just to be clear, that implies that users (and buildfarm members) with
LANG=C.UTF-8 in their environment would not be able to run a plain
"initdb -D data"; they'd get an error. It's hard for me to estimate how
many users might be inconvenienced by that, but it sounds like a risk.

Perhaps for this specific case, and only in initdb, we change
C.anything and POSIX.anything to the builtin provider? CREATE DATABASE
and CREATE COLLATION could still reject such locales.

Regards,
Jeff Davis

#106

mail@joeconway.com

over 2 years ago

In reply to: Jeff Davis (#105)

Re: Order changes in PG16 since ICU introduction

On 6/8/23 17:15, Jeff Davis wrote:

On Wed, 2023-06-07 at 20:52 -0400, Joe Conway wrote:

If the provider has no such thing, throw an error.

Just to be clear, that implies that users (and buildfarm members) with
LANG=C.UTF-8 in their environment would not be able to run a plain
"initdb -D data"; they'd get an error. It's hard for me to estimate how
many users might be inconvenienced by that, but it sounds like a risk.

Well, but only if their libc provider does not have C.UTF-8, correct?

I see
----------------
Linux Mint 21.1: /usr/lib/locale/C.utf8
RHEL 8: /usr/lib/locale/C.utf8
RHEL 9: /usr/lib/locale/C.utf8
AL2: /usr/lib/locale/C.utf8

However I do not see it on RHEL 7 :-(

Perhaps for this specific case, and only in initdb, we change
C.anything and POSIX.anything to the builtin provider?

Might be best, with some kind of warning perhaps?

CREATE DATABASE and CREATE COLLATION could still reject such
locales.

Seems to make sense.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#107

pgsql@j-davis.com

over 2 years ago

In reply to: Andrew Gierth (#93)

4 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Tue, 2023-06-06 at 21:37 +0100, Andrew Gierth wrote:

I like either "internal" or "builtin" because they correctly identify
that no external resources are used. I'm not keen on "postgresql".

"builtin" seems to be the winner. New patch series attached with doc
and test updates.

This has been a long discussion (it's a messy problem), but I think
I've addressed the most important concerns raised. If you disagree with
something, please indicate whether it's an objection, or a more minor
difference of opinion that I can weigh against other opinions. Also
please indicate if you think something is out of scope for 16.

Patches 0001, 0002:

These patches implement the built-in provider and automatically change
provider=icu to provider=builtin when the locale is C. Other approaches
were considered:
* Pretend that ICU can support the C locale, and use similar checks
throughout the code like the libc provider does: This was somewhat of a
hack, and had potential issues with upgraded clusters, and several
people seemed to reject it.
* Switch to the libc provider for the C locale: would make the libc
provider even more complicated and had some potential for confusion,
and also has catalog representation problems when --locale is specified
along with --lc-ctype.

Ultimately we need to choose one approach, and the built-in provider
seems the nicest (though most invasive). It reflects the reality that
we don't actually use libc or icu for the C locale, and it's nicer to
document. The builtin provider seemed to get the most support.

Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this patch
is that "locale" now means the collation locale, and
LC_COLLATE/LC_CTYPE are for the server environment. When using libc,
LC_COLLATE and LC_CTYPE still work as they did before, but their
relationship to database collation feels more like a special case of
the libc provider. I believe most people favor this patch and I haven't
seen recent objections.

I didn't find any surprising behaviors, but there are a few that I'd
like to draw attention to:

0. If you initdb with --locale-provider=libc, and don't specify ICU at
any later point, then none of these changes should affect you and
you'll remain on libc. If someone notices otherwise, please let me
know.

1. If you specify --locale-provider=builtin at initdb time, you *must*
specify --locale=C/POSIX, otherwise you get an error.

2. Patch 0004 is possibly out of scope for 16, but it felt consistent
with the other UI changes and low risk. Please try with/without before
objecting.

3. Daniel Verite felt that we should only change the provider from ICU
to "builtin" for the C locale if the provider is defaulting to ICU; not
if it's specified as ICU. I did not differentiate between specifying
ICU and defaulting to ICU because:
a. "libc" unconditionally uses the built-in memcmp() logic for C, it
never actually uses libc
b. If a user really wants the root locale or the en-US-u-va-posix
locale, they can specify those directly
c. I don't see any plausible case where it helps a user to keep
provider=icu when locale=C.

4. Joe Conway and Peter Eisentraut both felt that C.UTF-8 with
provider=icu should not be changed to use the builtin provider, and
instead passed on to ICU. I implemented a compromise where initdb will
change C.UTF-8 to the built-in provider; but CREATE DATABASE/COLLATION
will pass it along to ICU (which may support it as en-US-u-va-posix in
some versions, or may throw an error in other versions). My reasoning
is that initdb is pulling from the environment, and we should try
harder to succeed on any reasonable environmental settings (otherwise
initdb with default settings could fail); whereas we can be more strict
with CREATE DATABASE/COLLATION.

5. For the built-in provider, initdb defaults to UTF-8 rather than
SQL_ASCII. Otherwise, you would be unable to use ICU at all later,
because ICU doesn't support SQL_ASCII.
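
Items 3 and 4 above can be summarized as a decision function for the case where the provider is ICU (whether specified or defaulted). This is a simplified sketch, not the patch code: all names are hypothetical, and only the exact string "C.UTF-8" is handled for item 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names, used only for this illustration. */
typedef enum
{
    SKETCH_CTX_INITDB,
    SKETCH_CTX_CREATE_DATABASE  /* also stands in for CREATE COLLATION */
} SketchContext;

typedef enum
{
    SKETCH_BUILTIN,
    SKETCH_ICU
} SketchResolvedProvider;

/* Decide which provider ends up handling a request that names ICU. */
SketchResolvedProvider
resolve_icu_request(SketchContext ctx, const char *locale)
{
    assert(locale != NULL);

    /* Item 3: C/POSIX always goes to the built-in provider. */
    if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
        return SKETCH_BUILTIN;

    /* Item 4: only initdb is lenient about C.UTF-8. */
    if (ctx == SKETCH_CTX_INITDB && strcmp(locale, "C.UTF-8") == 0)
        return SKETCH_BUILTIN;

    /* Everything else is passed on to ICU, which may accept or reject it. */
    return SKETCH_ICU;
}
```

The asymmetry is deliberate: initdb takes its locale from the environment and should succeed with default settings, while CREATE DATABASE/COLLATION can afford to be strict.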

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments:

v10-0001-Introduce-collation-provider-builtin.patch (text/x-patch; charset=UTF-8)

From b9a27c9536c4cc2a748a293fb095e985eec9de7b Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v10 1/4] Introduce collation provider "builtin".

Only supports locale "C" (or equivalently, "POSIX"). Provides
locale-unaware semantics that are implemented as fast byte operations
in Postgres, independent of the operating system or any provider
libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "builtin" for built-in collation "ucs_basic" instead of
libc.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
---
 doc/src/sgml/charset.sgml              |  89 +++++++++++++++++----
 doc/src/sgml/ref/create_collation.sgml |  11 ++-
 doc/src/sgml/ref/create_database.sgml  |   8 +-
 doc/src/sgml/ref/createdb.sgml         |   2 +-
 doc/src/sgml/ref/initdb.sgml           |   7 +-
 src/backend/catalog/pg_collation.c     |   7 +-
 src/backend/commands/collationcmds.c   | 103 +++++++++++++++++++++----
 src/backend/commands/dbcommands.c      |  69 ++++++++++++++---
 src/backend/utils/adt/pg_locale.c      |  27 ++++++-
 src/backend/utils/init/postinit.c      |  10 ++-
 src/bin/initdb/initdb.c                |  24 ++++--
 src/bin/initdb/t/001_initdb.pl         |  47 +++++++++++
 src/bin/pg_dump/pg_dump.c              |  49 +++++++++---
 src/bin/pg_upgrade/t/002_pg_upgrade.pl |  35 +++++++--
 src/bin/psql/describe.c                |   4 +-
 src/bin/scripts/createdb.c             |   2 +-
 src/bin/scripts/t/020_createdb.pl      |  56 ++++++++++++++
 src/include/catalog/catversion.h       |   2 +-
 src/include/catalog/pg_collation.dat   |   3 +-
 src/include/catalog/pg_collation.h     |   3 +
 src/test/icu/t/010_database.pl         |   9 +++
 src/test/regress/expected/collate.out  |  25 +++++-
 src/test/regress/sql/collate.sql       |  10 +++
 23 files changed, 519 insertions(+), 83 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index ed84465996..b38cf82f83 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
    <title>Locale Providers</title>
 
    <para>
-    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
-    providers</firstterm>.  This specifies which library supplies the locale
-    data.  One standard provider name is <literal>libc</literal>, which uses
-    the locales provided by the operating system C library.  These are the
-    locales used by most tools provided by the operating system.  Another
-    provider is <literal>icu</literal>, which uses the external
-    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can
-    only be used if support for ICU was configured when PostgreSQL was built.
+    A locale provider specifies which library defines the locale behavior for
+    collations and character classifications.
    </para>
 
    <para>
     The commands and tools that select the locale settings, as described
-    above, each have an option to select the locale provider.  The examples
-    shown earlier all use the <literal>libc</literal> provider, which is the
-    default.  Here is an example to initialize a database cluster using the
-    ICU provider:
+    above, each have an option to select the locale provider. Here is an
+    example to initialize a database cluster using the ICU provider:
 <programlisting>
 initdb --locale-provider=icu --icu-locale=en
 </programlisting>
@@ -370,12 +362,75 @@ initdb --locale-provider=icu --icu-locale=en
    </para>
 
    <para>
-    Which locale provider to use depends on individual requirements.  For most
-    basic uses, either provider will give adequate results.  For the libc
-    provider, it depends on what the operating system offers; some operating
-    systems are better than others.  For advanced uses, ICU offers more locale
-    variants and customization options.
+    Regardless of the locale provider, the operating system is still used to
+    provide some locale-aware behavior, such as messages (see <xref
+    linkend="guc-lc-messages"/>).
    </para>
+
+   <para>
+    The available locale providers are listed below.
+   </para>
+
+   <sect3 id="locale-provider-builtin">
+    <title>Builtin</title>
+    <para>
+     The <literal>builtin</literal> provider uses simple built-in operations
+     which are not locale-aware. Only the <literal>C</literal> (or
+     equivalently, <literal>POSIX</literal>) locales are supported for this
+     provider.
+    </para>
+    <para>
+     The collation and character classification behavior is equivalent to
+     using the <literal>libc</literal> provider with locale
+     <literal>C</literal>, except that <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently.
+    </para>
+    <note>
+     <para>
+      When using the <literal>builtin</literal> locale provider, behavior may
+      depend on the database encoding.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-icu">
+    <title>ICU</title>
+    <para>
+     The <literal>icu</literal> provider uses the external
+     ICU<indexterm><primary>ICU</primary></indexterm>
+     library. <productname>PostgreSQL</productname> must have been configured
+     with support.
+    </para>
+    <para>
+     ICU provides collation and character classification behavior that is
+     independent of the operating system and database encoding, which is
+     preferable if you expect to transition to other platforms without any
+     change in results. <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
+    </para>
+    <note>
+     <para>
+      For the ICU provider, results may depend on the version of the ICU
+      library used, as it is updated to reflect changes in natural language
+      over time.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-libc">
+    <title>libc</title>
+    <para>
+     The <literal>libc</literal> provider uses the operating system's C
+     library. The collation and character classification behavior is
+     controlled by the settings <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal>, so they cannot be set independently.
+    </para>
+    <note>
+     <para>
+      The same locale name may have different behavior on different platforms
+      when using the libc provider.
+     </para>
+    </note>
+   </sect3>
+
   </sect2>
   <sect2 id="icu-locales">
    <title>ICU Locales</title>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..c63a350c1e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -89,6 +89,11 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        and <symbol>LC_CTYPE</symbol> at once.  If you specify this,
        you cannot specify either of those parameters.
       </para>
+      <para>
+       If <replaceable>provider</replaceable> is <literal>builtin</literal>,
+       then <replaceable>locale</replaceable> must be specified and set to
+       either <literal>C</literal> or <literal>POSIX</literal>.
+      </para>
      </listitem>
     </varlistentry>
 
@@ -120,9 +125,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      <listitem>
       <para>
        Specifies the provider to use for locale services associated with this
-       collation.  Possible values are
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
+       collation.  Possible values are <literal>builtin</literal>,
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.
        <literal>libc</literal> is the default.  See <xref
        linkend="locale-providers"/> for details.
       </para>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..5655d6c823 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -148,6 +148,12 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
         This is a shortcut for setting <symbol>LC_COLLATE</symbol>
         and <symbol>LC_CTYPE</symbol> at once.
        </para>
+       <para>
+        If <xref linkend="create-database-locale-provider"/> is
+        <literal>builtin</literal>, then <replaceable>locale</replaceable>
+        must be specified and set to either <literal>C</literal> or
+        <literal>POSIX</literal>.
+       </para>
        <tip>
         <para>
          The other locale settings <xref linkend="guc-lc-messages"/>, <xref
@@ -212,7 +218,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <listitem>
        <para>
         Specifies the provider to use for the default collation in this
-        database.  Possible values are
+        database.  Possible values are <literal>builtin</literal>,
         <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
         (if the server was built with ICU support) or <literal>libc</literal>.
         By default, the provider is the same as that of the <xref
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..ee4644818d 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -168,7 +168,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         Specifies the locale provider for the database's default collation.
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..96f84dc8ca 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -294,6 +294,11 @@ PostgreSQL documentation
         environment that <command>initdb</command> runs in. Locale
         support is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If <option>--locale-provider</option> is <literal>builtin</literal>,
+        <option>--locale</option> must be specified and set to
+        <literal>C</literal> (or equivalently, <literal>POSIX</literal>).
+       </para>
       </listitem>
      </varlistentry>
 
@@ -323,7 +328,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry id="app-initdb-option-locale-provider">
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         This option sets the locale provider for databases created in the new
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index fd022e6fc2..0b1ba359b6 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -68,7 +68,12 @@ CollationCreate(const char *collname, Oid collnamespace,
 	Assert(collname);
 	Assert(collnamespace);
 	Assert(collowner);
-	Assert((collcollate && collctype) || colliculocale);
+	Assert((collprovider == COLLPROVIDER_BUILTIN &&
+			!collcollate && !collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_LIBC &&
+			 collcollate &&  collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_ICU &&
+			!collcollate && !collctype &&  colliculocale));
 
 	/*
 	 * Make sure there is no existing collation of same name & encoding.
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..4748946499 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -66,6 +66,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	DefElem    *deterministicEl = NULL;
 	DefElem    *rulesEl = NULL;
 	DefElem    *versionEl = NULL;
+	char	   *builtin_locale = NULL;
 	char	   *collcollate;
 	char	   *collctype;
 	char	   *colliculocale;
@@ -215,7 +216,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 
 		if (collproviderstr)
 		{
-			if (pg_strcasecmp(collproviderstr, "icu") == 0)
+			if (pg_strcasecmp(collproviderstr, "builtin") == 0)
+				collprovider = COLLPROVIDER_BUILTIN;
+			else if (pg_strcasecmp(collproviderstr, "icu") == 0)
 				collprovider = COLLPROVIDER_ICU;
 			else if (pg_strcasecmp(collproviderstr, "libc") == 0)
 				collprovider = COLLPROVIDER_LIBC;
@@ -230,7 +233,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 
 		if (localeEl)
 		{
-			if (collprovider == COLLPROVIDER_LIBC)
+			if (collprovider == COLLPROVIDER_BUILTIN)
+			{
+				builtin_locale = defGetString(localeEl);
+			}
+			else if (collprovider == COLLPROVIDER_LIBC)
 			{
 				collcollate = defGetString(localeEl);
 				collctype = defGetString(localeEl);
@@ -245,7 +252,22 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (lcctypeEl)
 			collctype = defGetString(lcctypeEl);
 
-		if (collprovider == COLLPROVIDER_LIBC)
+		if (collprovider == COLLPROVIDER_BUILTIN)
+		{
+			if (!builtin_locale)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+						 errmsg("parameter \"locale\" must be specified")));
+
+			if (strcmp(builtin_locale, "C") != 0 &&
+				strcmp(builtin_locale, "POSIX") != 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("collation provider \"builtin\" does not support locale \"%s\"",
+								builtin_locale),
+						 errhint("The built-in collation provider only supports the \"C\" and \"POSIX\" locales.")));
+		}
+		else if (collprovider == COLLPROVIDER_LIBC)
 		{
 			if (!collcollate)
 				ereport(ERROR,
@@ -302,7 +324,17 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
 
-		if (collprovider == COLLPROVIDER_ICU)
+		if (collprovider == COLLPROVIDER_BUILTIN)
+		{
+			/*
+			 * Behavior may be different in different encodings, so set
+			 * collencoding to the current database encoding. No validation is
+			 * required, because the "builtin" provider is compatible with any
+			 * encoding.
+			 */
+			collencoding = GetDatabaseEncoding();
+		}
+		else if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
 			/*
@@ -331,7 +363,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	}
 
 	if (!collversion)
-		collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate);
+	{
+		char *locale;
+
+		if (collprovider == COLLPROVIDER_ICU)
+			locale = colliculocale;
+		else if (collprovider == COLLPROVIDER_LIBC)
+			locale = collcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+		collversion = get_collation_actual_version(collprovider, locale);
+	}
 
 	newoid = CollationCreate(collName,
 							 collNamespace,
@@ -406,6 +449,7 @@ AlterCollation(AlterCollationStmt *stmt)
 	Form_pg_collation collForm;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 	ObjectAddress address;
@@ -430,8 +474,20 @@ AlterCollation(AlterCollationStmt *stmt)
 	datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
-	newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
+	if (collForm->collprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colliculocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (collForm->collprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+	newversion = get_collation_actual_version(collForm->collprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -495,11 +551,18 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
 
-		datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_daticulocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_BUILTIN */
 
 		ReleaseSysCache(dbtup);
 	}
@@ -516,11 +579,19 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
 		Assert(provider != COLLPROVIDER_DEFAULT);
-		datum = SysCacheGetAttrNotNull(COLLOID, colltp,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colliculocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_BUILTIN */
 
 		ReleaseSysCache(colltp);
 	}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 99d4080ea9..016852644f 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -909,7 +909,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
 
-		if (pg_strcasecmp(locproviderstr, "icu") == 0)
+		if (pg_strcasecmp(locproviderstr, "builtin") == 0)
+			dblocprovider = COLLPROVIDER_BUILTIN;
+		else if (pg_strcasecmp(locproviderstr, "icu") == 0)
 			dblocprovider = COLLPROVIDER_ICU;
 		else if (pg_strcasecmp(locproviderstr, "libc") == 0)
 			dblocprovider = COLLPROVIDER_LIBC;
@@ -1177,9 +1179,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 */
 	if (src_collversion && !dcollversion)
 	{
-		char	   *actual_versionstr;
+		char	*actual_versionstr;
+		char	*locale;
 
-		actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+		actual_versionstr = get_collation_actual_version(dblocprovider, locale);
 		if (!actual_versionstr)
 			ereport(ERROR,
 					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
@@ -1207,7 +1217,18 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * collation version, which is normally only the case for template0.
 	 */
 	if (dbcollversion == NULL)
-		dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+	{
+		char *locale;
+
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+		dbcollversion = get_collation_actual_version(dblocprovider, locale);
+	}
 
 	/* Resolve default tablespace for new database */
 	if (dtablespacename && dtablespacename->arg)
@@ -2403,6 +2424,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	ObjectAddress address;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 
@@ -2429,10 +2451,24 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
-	if (isnull)
-		elog(ERROR, "unexpected null in pg_database");
-	newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum));
+	if (datForm->datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_daticulocale, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datForm->datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+	newversion = get_collation_actual_version(datForm->datlocprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -2617,6 +2653,7 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 	HeapTuple	tp;
 	char		datlocprovider;
 	Datum		datum;
+	char	   *locale;
 	char	   *version;
 
 	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
@@ -2627,8 +2664,20 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 
 	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;
 
-	datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-	version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum));
+	if (datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_daticulocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+	version = get_collation_actual_version(datlocprovider, locale);
 
 	ReleaseSysCache(tp);
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 31e3b16ae0..d4d7affba6 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1228,7 +1228,12 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			elog(ERROR, "cache lookup failed for collation %u", collation);
 		collform = (Form_pg_collation) GETSTRUCT(tp);
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+		{
+			cache_entry->collate_is_c = true;
+			cache_entry->ctype_is_c = true;
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 			Datum		datum;
 			const char *collcollate;
@@ -1281,6 +1286,9 @@ lc_collate_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_BUILTIN)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1334,6 +1342,9 @@ lc_ctype_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_BUILTIN)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1487,8 +1498,10 @@ pg_newlocale_from_collation(Oid collid)
 	{
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return &default_locale;
-		else
+		else if (default_locale.provider == COLLPROVIDER_LIBC)
 			return (pg_locale_t) 0;
+		else
+			elog(ERROR, "cannot open collation with provider \"builtin\"");
 	}
 
 	cache_entry = lookup_collation_cache(collid, false);
@@ -1513,7 +1526,11 @@ pg_newlocale_from_collation(Oid collid)
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_BUILTIN)
+		{
+			elog(ERROR, "cannot open collation with provider \"builtin\"");
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 #ifdef HAVE_LOCALE_T
 			const char *collcollate;
@@ -1599,6 +1616,7 @@ pg_newlocale_from_collation(Oid collid)
 
 			collversionstr = TextDatumGetCString(datum);
 
+			Assert(collform->collprovider != COLLPROVIDER_BUILTIN);
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, collform->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
 			actual_versionstr = get_collation_actual_version(collform->collprovider,
@@ -1650,6 +1668,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collprovider == COLLPROVIDER_BUILTIN)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 561bd13ed2..12c36d12e6 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -461,10 +461,18 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	{
 		char	   *actual_versionstr;
 		char	   *collversionstr;
+		char	   *locale;
 
 		collversionstr = TextDatumGetCString(datum);
 
-		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, dbform->datlocprovider == COLLPROVIDER_ICU ? iculocale : collate);
+		if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			locale = iculocale;
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			locale = collate;
+		else
+			locale = NULL; /* COLLPROVIDER_BUILTIN */
+
+		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, locale);
 		if (!actual_versionstr)
 			/* should not happen */
 			elog(WARNING,
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 09a5c98cc0..6fc19c8d64 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2472,7 +2472,7 @@ usage(const char *progname)
 			 "                            set default locale in the respective category for\n"
 			 "                            new databases (default taken from environment)\n"));
 	printf(_("      --no-locale           equivalent to --locale=C\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={builtin|libc|icu}\n"
 			 "                            set default locale provider for new databases\n"));
 	printf(_("      --pwfile=FILE         read password for the new superuser from file\n"));
 	printf(_("  -T, --text-search-config=CFG\n"
@@ -2622,7 +2622,15 @@ setup_locale_encoding(void)
 {
 	setlocales();
 
-	if (locale_provider == COLLPROVIDER_LIBC &&
+	if (locale_provider == COLLPROVIDER_BUILTIN &&
+		strcmp(lc_ctype, "C") == 0 &&
+		strcmp(lc_collate, "C") == 0 &&
+		strcmp(lc_time, "C") == 0 &&
+		strcmp(lc_numeric, "C") == 0 &&
+		strcmp(lc_monetary, "C") == 0 &&
+		strcmp(lc_messages, "C") == 0)
+		printf(_("The database cluster will be initialized with no locale.\n"));
+	else if (locale_provider == COLLPROVIDER_LIBC &&
 		strcmp(lc_ctype, lc_collate) == 0 &&
 		strcmp(lc_ctype, lc_time) == 0 &&
 		strcmp(lc_ctype, lc_numeric) == 0 &&
@@ -2633,9 +2641,11 @@ setup_locale_encoding(void)
 	else
 	{
 		printf(_("The database cluster will be initialized with this locale configuration:\n"));
-		printf(_("  provider:    %s\n"), collprovider_name(locale_provider));
-		if (icu_locale)
-			printf(_("  ICU locale:  %s\n"), icu_locale);
+		printf(_("  default collation provider:  %s\n"), collprovider_name(locale_provider));
+		if (locale_provider == COLLPROVIDER_BUILTIN)
+			printf(_("  default collation locale:    %s\n"), "C");
+		else if (locale_provider == COLLPROVIDER_ICU)
+			printf(_("  default collation locale:    %s\n"), icu_locale);
 		printf(_("  LC_COLLATE:  %s\n"
 				 "  LC_CTYPE:    %s\n"
 				 "  LC_MESSAGES: %s\n"
@@ -3296,7 +3306,9 @@ main(int argc, char *argv[])
 										 "-c debug_discard_caches=1");
 				break;
 			case 15:
-				if (strcmp(optarg, "icu") == 0)
+				if (strcmp(optarg, "builtin") == 0)
+					locale_provider = COLLPROVIDER_BUILTIN;
+				else if (strcmp(optarg, "icu") == 0)
 					locale_provider = COLLPROVIDER_ICU;
 				else if (strcmp(optarg, "libc") == 0)
 					locale_provider = COLLPROVIDER_LIBC;
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index fa00bb3dab..157d6acfd4 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -154,6 +154,53 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', "$tempdir/data6"
+	],
+	'locale provider builtin'
+);
+
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=C',
+		"$tempdir/data7"
+	],
+	'locale provider builtin with --locale'
+);
+
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--lc-collate=C',
+		"$tempdir/data8"
+	],
+	'locale provider builtin with --lc-collate'
+);
+
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--lc-ctype=C',
+		"$tempdir/data9"
+	],
+	'locale provider builtin with --lc-ctype'
+);
+
+command_fails(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--icu-locale=en',
+		"$tempdir/dataX"
+	],
+	'fails for locale provider builtin with ICU locale'
+);
+
+command_fails(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--icu-rules=""',
+		"$tempdir/dataX"
+	],
+	'fails for locale provider builtin with ICU rules'
+);
+
 command_fails(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
 	'fails for invalid locale provider');
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5dab1ba9ea..c75818c3a4 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3070,7 +3070,9 @@ dumpDatabase(Archive *fout)
 	}
 
 	appendPQExpBufferStr(creaQry, " LOCALE_PROVIDER = ");
-	if (datlocprovider[0] == 'c')
+	if (datlocprovider[0] == 'b')
+		appendPQExpBufferStr(creaQry, "builtin");
+	else if (datlocprovider[0] == 'c')
 		appendPQExpBufferStr(creaQry, "libc");
 	else if (datlocprovider[0] == 'i')
 		appendPQExpBufferStr(creaQry, "icu");
@@ -13432,7 +13434,9 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 					  fmtQualifiedDumpable(collinfo));
 
 	appendPQExpBufferStr(q, "provider = ");
-	if (collprovider[0] == 'c')
+	if (collprovider[0] == 'b')
+		appendPQExpBufferStr(q, "builtin");
+	else if (collprovider[0] == 'c')
 		appendPQExpBufferStr(q, "libc");
 	else if (collprovider[0] == 'i')
 		appendPQExpBufferStr(q, "icu");
@@ -13446,13 +13450,42 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 	if (strcmp(PQgetvalue(res, 0, i_collisdeterministic), "f") == 0)
 		appendPQExpBufferStr(q, ", deterministic = false");
 
-	if (colliculocale != NULL)
+	if (collprovider[0] == 'd')
 	{
+		Assert(colliculocale == NULL);
+		Assert(collicurules == NULL);
+		Assert(collcollate == NULL);
+		Assert(collctype == NULL);
+
+		/* no locale -- cannot be reloaded anyway */
+	}
+	else if (collprovider[0] == 'b')
+	{
+		Assert(colliculocale == NULL);
+		Assert(collicurules == NULL);
+		Assert(collcollate == NULL);
+		Assert(collctype == NULL);
+		appendPQExpBufferStr(q, ", locale = 'C'");
+	}
+	else if (collprovider[0] == 'i')
+	{
+		Assert(colliculocale != NULL);
+		Assert(collcollate == NULL);
+		Assert(collctype == NULL);
+
 		appendPQExpBufferStr(q, ", locale = ");
 		appendStringLiteralAH(q, colliculocale, fout);
+
+		if (collicurules)
+		{
+			appendPQExpBufferStr(q, ", rules = ");
+			appendStringLiteralAH(q, collicurules, fout);
+		}
 	}
-	else
+	else if (collprovider[0] == 'c')
 	{
+		Assert(colliculocale == NULL);
+		Assert(collicurules == NULL);
 		Assert(collcollate != NULL);
 		Assert(collctype != NULL);
 
@@ -13469,12 +13502,8 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 			appendStringLiteralAH(q, collctype, fout);
 		}
 	}
-
-	if (collicurules)
-	{
-		appendPQExpBufferStr(q, ", rules = ");
-		appendStringLiteralAH(q, collicurules, fout);
-	}
+	else
+		pg_fatal("unrecognized collation provider '%c'", collprovider[0]);
 
 	/*
 	 * For binary upgrade, carry over the collation version.  For normal
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 41fce089d6..22d9fc10a0 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -114,22 +114,45 @@ my $original_locale = "C";
 my $original_iculocale = "";
 my $provider_field = "'c' AS datlocprovider";
 my $iculocale_field = "NULL AS daticulocale";
-if ($oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes')
+if ($oldnode->pg_version >= 15)
 {
 	$provider_field = "datlocprovider";
 	$iculocale_field = "daticulocale";
-	$original_provider = "i";
-	$original_iculocale = "fr-CA";
+
+	if ($ENV{with_icu} eq 'yes')
+	{
+		$original_provider = "i";
+		$original_iculocale = "fr-CA";
+	}
+}
+
+# use builtin provider instead of libc, if supported
+if ($oldnode->pg_version >= 16 && $ENV{with_icu} ne 'yes')
+{
+	$original_provider = "b";
 }
 
 my @initdb_params = @custom_opts;
 
 push @initdb_params, ('--encoding', 'UTF-8');
 push @initdb_params, ('--locale', $original_locale);
-if ($original_provider eq "i")
+
+# add --locale-provider, if supported
+if ($oldnode->pg_version >= 15)
 {
-	push @initdb_params, ('--locale-provider', 'icu');
-	push @initdb_params, ('--icu-locale', 'fr-CA');
+	if ($original_provider eq "b")
+	{
+		push @initdb_params, ('--locale-provider', 'builtin');
+	}
+	elsif ($original_provider eq "i")
+	{
+		push @initdb_params, ('--locale-provider', 'icu');
+		push @initdb_params, ('--icu-locale', 'fr-CA');
+	}
+	elsif ($original_provider eq "c")
+	{
+		push @initdb_params, ('--locale-provider', 'libc');
+	}
 }
 
 $node_params{extra} = \@initdb_params;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 9325a46b8f..5642638dfb 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -932,7 +932,7 @@ listAllDbs(const char *pattern, bool verbose)
 					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  "  CASE d.datlocprovider WHEN 'b' THEN 'builtin' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
@@ -4873,7 +4873,7 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 
 	if (pset.sversion >= 100000)
 		appendPQExpBuffer(&buf,
-						  "  CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  "  CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'b' THEN 'builtin' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Provider"));
 	else
 		appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..41a8de659e 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -299,7 +299,7 @@ help(const char *progname)
 	printf(_("      --lc-ctype=LOCALE        LC_CTYPE setting for the database\n"));
 	printf(_("      --icu-locale=LOCALE      ICU locale setting for the database\n"));
 	printf(_("      --icu-rules=RULES        ICU rules setting for the database\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={builtin|libc|icu}\n"
 			 "                               locale provider for the database's default collation\n"));
 	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
 	printf(_("  -S, --strategy=STRATEGY      database creation strategy wal_log or file_copy\n"));
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index d0830a4a1d..f1d6db0f48 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -94,6 +94,62 @@ else
 		'create database with ICU fails since no ICU support');
 }
 
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'tbuiltin1'
+	],
+	'create database with provider "builtin"'
+);
+
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--locale=C', 'tbuiltin2'
+	],
+	'create database with provider "builtin" and locale "C"'
+);
+
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--lc-collate=C', 'tbuiltin3'
+	],
+	'create database with provider "builtin" and LC_COLLATE=C'
+);
+
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--lc-ctype=C', 'tbuiltin4'
+	],
+	'create database with provider "builtin" and LC_CTYPE=C'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--icu-locale=en', 'tbuiltin5'
+	],
+	'create database with provider "builtin" and ICU_LOCALE="en"'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--icu-rules=""', 'tbuiltin6'
+	],
+	'create database with provider "builtin" and ICU_RULES=""'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template1', '--locale-provider=builtin',
+		'--locale=C', 'tbuiltin7'
+	],
+	'create database with provider "builtin" not matching template'
+);
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index c784937a0e..52e17baafa 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202305211
+#define CATALOG_VERSION_NO	202306081
 
 #endif
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index b6a69d1d42..cfb53807ed 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -24,8 +24,7 @@
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point',
-  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
-  collcollate => 'C', collctype => 'C' },
+  collname => 'ucs_basic', collprovider => 'b', collencoding => '6' },
 { oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index bfa3568451..4009c4ec93 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -65,6 +65,7 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_collation_oid_index, 3085, CollationOidIndexId, on
 #ifdef EXPOSE_TO_CLIENT_CODE
 
 #define COLLPROVIDER_DEFAULT	'd'
+#define COLLPROVIDER_BUILTIN	'b'
 #define COLLPROVIDER_ICU		'i'
 #define COLLPROVIDER_LIBC		'c'
 
@@ -73,6 +74,8 @@ collprovider_name(char c)
 {
 	switch (c)
 	{
+		case COLLPROVIDER_BUILTIN:
+			return "builtin";
 		case COLLPROVIDER_ICU:
 			return "icu";
 		case COLLPROVIDER_LIBC:
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index d3901f5d3f..26f71e1155 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -63,5 +63,14 @@ like(
 	qr/ERROR:  ICU locale must be specified/,
 	"ICU locale must be specified for ICU provider: error message");
 
+my ($ret, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu LOCALE_PROVIDER builtin LOCALE 'C' TEMPLATE dbicu}
+);
+isnt($ret, 0,
+	"locale provider must match template: exit code not 0");
+like(
+	$stderr,
+	qr/ERROR:  new locale provider \(builtin\) does not match locale provider of the template database \(icu\)/,
+	"locale provider must match template: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0649564485..5b28de1b47 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -650,6 +650,27 @@ EXPLAIN (COSTS OFF)
 (3 rows)
 
 -- CREATE/DROP COLLATION
+CREATE COLLATION builtin_c ( PROVIDER = builtin, LOCALE = "C" );
+CREATE COLLATION builtin_posix ( PROVIDER = builtin, LOCALE = "POSIX" );
+SELECT b FROM collate_test1 ORDER BY b COLLATE builtin_c;
+  b  
+-----
+ ABD
+ Abc
+ abc
+ bbc
+(4 rows)
+
+CREATE COLLATION builtin2 ( PROVIDER = builtin ); -- fails
+ERROR:  parameter "locale" must be specified
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "en_US" ); -- fails
+ERROR:  collation provider "builtin" does not support locale "en_US"
+HINT:  The built-in collation provider only supports the "C" and "POSIX" locales.
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LC_CTYPE = "C", LC_COLLATE = "C" ); -- fails
+ERROR:  parameter "locale" must be specified
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "POSIX", LC_CTYPE = "POSIX" ); -- fails
+ERROR:  conflicting or redundant options
+DETAIL:  LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE.
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
@@ -754,7 +775,7 @@ DETAIL:  FROM cannot be specified together with any other options.
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 19 other objects
+NOTICE:  drop cascades to 21 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -771,6 +792,8 @@ drop cascades to function dup(anyelement)
 drop cascades to table collate_test20
 drop cascades to table collate_test21
 drop cascades to table collate_test22
+drop cascades to collation builtin_c
+drop cascades to collation builtin_posix
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
 drop cascades to view collate_on_int
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index c3d40fc195..01d5c69fe4 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -244,6 +244,16 @@ EXPLAIN (COSTS OFF)
 
 -- CREATE/DROP COLLATION
 
+CREATE COLLATION builtin_c ( PROVIDER = builtin, LOCALE = "C" );
+CREATE COLLATION builtin_posix ( PROVIDER = builtin, LOCALE = "POSIX" );
+
+SELECT b FROM collate_test1 ORDER BY b COLLATE builtin_c;
+
+CREATE COLLATION builtin2 ( PROVIDER = builtin ); -- fails
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "en_US" ); -- fails
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LC_CTYPE = "C", LC_COLLATE = "C" ); -- fails
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "POSIX", LC_CTYPE = "POSIX" ); -- fails
+
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
-- 
2.34.1
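
The patch above repeats the same three-way branch wherever a locale
string is handed to get_collation_actual_version(): ICU uses the ICU
locale, libc uses LC_COLLATE, and builtin passes NULL. As a rough
sketch only, that pattern and the C/POSIX check from DefineCollation()
could be written as the two helpers below. The helper names
locale_for_version_check() and builtin_locale_is_supported() are
hypothetical and do not appear in the patch; only the COLLPROVIDER_*
values are taken from pg_collation.h as modified above.

```c
#include <stddef.h>
#include <string.h>

/* Provider codes, as defined in pg_collation.h by the patch above. */
#define COLLPROVIDER_DEFAULT	'd'
#define COLLPROVIDER_BUILTIN	'b'
#define COLLPROVIDER_ICU		'i'
#define COLLPROVIDER_LIBC		'c'

/*
 * Hypothetical helper (not in the patch): pick the locale string that the
 * patch passes to get_collation_actual_version() for a given provider.
 * The builtin provider has no external library behind it, so it gets NULL.
 */
const char *
locale_for_version_check(char provider, const char *iculocale,
						 const char *collate)
{
	if (provider == COLLPROVIDER_ICU)
		return iculocale;
	else if (provider == COLLPROVIDER_LIBC)
		return collate;
	else
		return NULL;			/* COLLPROVIDER_BUILTIN */
}

/*
 * Hypothetical helper (not in the patch): mirror of the DefineCollation()
 * check that the builtin provider accepts only "C" and "POSIX".  The patch
 * reports a missing locale with a separate error; here a NULL locale is
 * simply treated as unsupported.
 */
int
builtin_locale_is_supported(const char *locale)
{
	if (locale == NULL)
		return 0;

	return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}
```

With helpers of this shape, each call site in the patch would reduce
to a single call, at the cost of one more function to maintain. The
patch itself keeps the branches inline.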

Attachment: v10-0002-ICU-for-locale-C-automatically-use-builtin-provi.patch (text/x-patch; charset=UTF-8)

From c48d91ab80e151093dbae0ada6a9691a2f112c5a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 8 May 2023 13:48:01 -0700
Subject: [PATCH v10 2/4] ICU: for locale "C", automatically use "builtin"
 provider instead.

Postgres expects locale C to be optimizable to simple locale-unaware
byte operations, but ICU does not recognize the locale "C" at all and
falls back to the root locale.

If the user specifies locale "C" when creating a new collation or a
new database with the ICU provider, automatically switch it to the
"builtin" provider.

If provider is libc, behavior is unchanged.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
---
 doc/src/sgml/charset.sgml                     |  6 +++
 doc/src/sgml/ref/create_collation.sgml        |  6 +++
 doc/src/sgml/ref/create_database.sgml         |  5 +++
 doc/src/sgml/ref/createdb.sgml                |  5 +++
 doc/src/sgml/ref/initdb.sgml                  | 23 ++++++-----
 src/backend/commands/collationcmds.c          | 20 ++++++++++
 src/backend/commands/dbcommands.c             | 21 ++++++++++
 src/bin/initdb/initdb.c                       | 17 ++++++++
 src/bin/initdb/t/001_initdb.pl                | 39 +++++++++++++++++++
 src/bin/scripts/t/020_createdb.pl             | 18 +++++++++
 .../regress/expected/collate.icu.utf8.out     | 14 +++++--
 src/test/regress/sql/collate.icu.utf8.sql     |  6 +++
 12 files changed, 168 insertions(+), 12 deletions(-)
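
For readers skimming the hunks below, the redirect rule described in the commit message can be condensed into a small standalone sketch. This is not code from the patch: the enum and the function names (provider_t, is_c_locale, is_c_locale_initdb, effective_provider) are hypothetical stand-ins for the COLLPROVIDER_* constants and the inline checks, and the CREATE DATABASE exception for a locale inherited unchanged from the template is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the COLLPROVIDER_* constants. */
typedef enum
{
    PROVIDER_LIBC,
    PROVIDER_ICU,
    PROVIDER_BUILTIN
} provider_t;

/* Backend-style check: only the exact names "C" and "POSIX" match. */
static bool
is_c_locale(const char *name)
{
    return name != NULL &&
        (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* initdb-style check: also matches names with a "C." or "POSIX." prefix. */
static bool
is_c_locale_initdb(const char *name)
{
    return is_c_locale(name) ||
        (name != NULL &&
         (strncmp(name, "C.", 2) == 0 || strncmp(name, "POSIX.", 6) == 0));
}

/*
 * An ICU request for locale C/POSIX is redirected to the builtin provider,
 * except during binary upgrade. Any other request is returned unchanged.
 */
static provider_t
effective_provider(provider_t requested, const char *icu_locale,
                   bool binary_upgrade)
{
    if (!binary_upgrade && requested == PROVIDER_ICU &&
        is_c_locale(icu_locale))
        return PROVIDER_BUILTIN;
    return requested;
}
```

Under these assumptions, an ICU request with locale "C" resolves to the builtin provider, while the same request during binary upgrade, or a libc request, is left alone. Note that only the initdb hunk accepts suffixed names such as "C.UTF-8"; the backend hunks compare the name exactly.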

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index b38cf82f83..9a7f9263ec 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -407,6 +407,12 @@ initdb --locale-provider=icu --icu-locale=en
      change in results. <literal>LC_COLLATE</literal> and
      <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
     </para>
+    <para>
+     The ICU provider does not accept the <literal>C</literal>
+     locale. Commands that create collations or databases with the
+     <literal>icu</literal> provider and ICU locale <literal>C</literal> use
+     the provider <literal>builtin</literal> instead.
+    </para>
     <note>
      <para>
       For the ICU provider, results may depend on the version of the ICU
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index c63a350c1e..67ea247b39 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -131,6 +131,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        <literal>libc</literal> is the default.  See <xref
        linkend="locale-providers"/> for details.
       </para>
+      <para>
+       If the provider is <literal>icu</literal> and the locale is
+       <literal>C</literal> or <literal>POSIX</literal>, the provider is
+       automatically set to <literal>builtin</literal>, as the ICU provider
+       doesn't support an ICU locale of <literal>C</literal>.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 5655d6c823..17550c19bb 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -196,6 +196,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
        <para>
         Specifies the ICU locale ID if the ICU locale provider is used.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>builtin</literal>, as the
+        ICU provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index ee4644818d..5352cce744 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -154,6 +154,11 @@ PostgreSQL documentation
         Specifies the ICU locale ID to be used in this database, if the
         ICU locale provider is selected.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>builtin</literal>, as the
+        ICU provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 96f84dc8ca..d9620e5931 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -94,13 +94,14 @@ PostgreSQL documentation
 
   <para>
    By default, <command>initdb</command> uses the ICU library to provide
-   locale services if the server was built with ICU support; otherwise it uses
-   the <literal>libc</literal> locale provider (see <xref
-   linkend="locale-providers"/>). To choose the specific ICU locale ID to
-   apply, use the option <option>--icu-locale</option>.  Note that for
-   implementation reasons and to support legacy code,
-   <command>initdb</command> will still select and initialize libc locale
-   settings when the ICU locale provider is used.
+   locale services if the server was built with ICU support. To choose the
+   specific ICU locale ID to apply, use the option
+   <option>--icu-locale</option>.  Note that for implementation reasons and to
+   support legacy code, <command>initdb</command> will still select and
+   initialize libc locale settings when the ICU locale provider is used. If
+   <literal>--icu-locale</literal> is <literal>C</literal> (or equivalently
+   <literal>POSIX</literal>), the <literal>builtin</literal> provider will be
+   selected instead of ICU.
   </para>
 
   <para>
@@ -109,8 +110,7 @@ PostgreSQL documentation
    <literal>--locale-provider=libc</literal>, or build the server without ICU
    support. The <literal>libc</literal> locale provider takes the locale
    settings from the environment, and determines the encoding from the locale
-   settings.  This is almost always sufficient, unless there are special
-   requirements.
+   settings.
   </para>
 
   <para>
@@ -250,6 +250,11 @@ PostgreSQL documentation
         Specifies the ICU locale when the ICU provider is used. Locale support
         is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>builtin</literal>, as the
+        ICU provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
        <para>
         If this option is not specified, the locale is inherited from the
         environment in which <command>initdb</command> runs. The environment's
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 4748946499..3b2ba38250 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -252,6 +252,26 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (lcctypeEl)
 			collctype = defGetString(lcctypeEl);
 
+		/*
+		 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+		 * optimizable to byte operations (memcmp(), pg_ascii_tolower(),
+		 * etc.); use the builtin provider instead of ICU if the locale is C.
+		 *
+		 * Don't change the provider during binary upgrade.
+		 */
+		if (!IsBinaryUpgrade && collprovider == COLLPROVIDER_ICU &&
+			colliculocale &&
+			(strcmp(colliculocale, "C") == 0 ||
+			 strcmp(colliculocale, "POSIX") == 0))
+		{
+			ereport(NOTICE,
+					(errmsg("using locale provider \"builtin\" for ICU locale \"%s\"",
+							colliculocale)));
+			builtin_locale = colliculocale;
+			colliculocale = NULL;
+			collprovider = COLLPROVIDER_BUILTIN;
+		}
+
 		if (collprovider == COLLPROVIDER_BUILTIN)
 		{
 			if (!builtin_locale)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 016852644f..5c8caeac2d 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1043,6 +1043,27 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
 
+	/*
+	 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+	 * optimizable to byte operations (memcmp(), pg_ascii_tolower(), etc.);
+	 * use the builtin provider instead of ICU if the locale is C.
+	 *
+	 * Don't change the provider during binary upgrade or when both the
+	 * provider and ICU locale are unchanged from the template.
+	 */
+	if (!IsBinaryUpgrade && dblocprovider == COLLPROVIDER_ICU &&
+		dbiculocale &&
+		(dblocprovider != src_locprovider ||
+		 strcmp(dbiculocale, src_iculocale) != 0) &&
+		(strcmp(dbiculocale, "C") == 0 || strcmp(dbiculocale, "POSIX") == 0))
+	{
+		ereport(NOTICE,
+				(errmsg("using locale provider \"builtin\" for ICU locale \"%s\"",
+						dbiculocale)));
+		dbiculocale = NULL;
+		dblocprovider = COLLPROVIDER_BUILTIN;
+	}
+
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
 		if (!(is_encoding_supported_by_icu(encoding)))
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 6fc19c8d64..897521cf35 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2394,6 +2394,23 @@ setlocales(void)
 			lc_messages = locale;
 	}
 
+	/*
+	 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+	 * optimizable to byte operations (memcmp(), pg_ascii_tolower(), etc.);
+	 * use the builtin provider instead of ICU if the locale is C.
+	 */
+	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
+		(strcmp(icu_locale, "C") == 0 ||
+		 strcmp(icu_locale, "POSIX") == 0 ||
+		 strncmp(icu_locale, "C.", 2) == 0 ||
+		 strncmp(icu_locale, "POSIX.", 6) == 0))
+	{
+		pg_log_info("using locale provider \"builtin\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = COLLPROVIDER_BUILTIN;
+	}
+
 	/*
 	 * canonicalize locale names, and obtain any missing values from our
 	 * current environment
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 157d6acfd4..f3f832a413 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -111,6 +111,45 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'option --icu-locale');
 
+	# transformed to provider=builtin
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			"$tempdir/data4a"
+		],
+		'option --icu-locale=C');
+
+	# transformed to provider=builtin
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--locale=C',
+			"$tempdir/data4b"
+		],
+		'option --icu-locale=C --locale=C');
+
+	# transformed to provider=builtin
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-collate=C',
+			"$tempdir/data4c"
+		],
+		'option --icu-locale=C --lc-collate=C');
+
+	# transformed to provider=builtin
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-ctype=C',
+			"$tempdir/data4d"
+		],
+		'option --icu-locale=C --lc-ctype=C');
+
 	command_fails_like(
 		[
 			'initdb', '--no-sync',
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index f1d6db0f48..121ad5112b 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -86,6 +86,24 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'create database with icu locale from template database with icu provider'
 	);
+
+	# transformed into provider "builtin"
+	$node->command_ok(
+		[
+			'createdb', '-T', 'template0', '--locale-provider=icu',
+			'--icu-locale=C', 'test_builtin_icu1'
+		],
+		'create database with provider "icu" and ICU_LOCALE="C"'
+	);
+
+	# transformed into provider "builtin"
+	$node->command_ok(
+		[
+			'createdb', '-T', 'template0', '--locale-provider=icu',
+			'--icu-locale=C', '--lc-ctype=C', 'test_builtin_icu2'
+		],
+		'create database with provider "icu" and ICU_LOCALE="C" and LC_CTYPE=C'
+	);
 }
 else
 {
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 00dee24549..5ba8f75558 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1035,6 +1035,9 @@ BEGIN
 END
 $$;
 RESET client_min_messages;
+-- uses "builtin" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+NOTICE:  using locale provider "builtin" for ICU locale "C"
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
@@ -1058,8 +1061,11 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
  test0
  test1
  test5
-(3 rows)
+ testc
+(4 rows)
 
+DROP COLLATION test1;
+CREATE COLLATION test1 (provider = icu, locale = 'und');
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
 ERROR:  collation "test11" already exists in schema "collate_tests"
@@ -1079,7 +1085,8 @@ SELECT collname, nspname, obj_description(pg_collation.oid, 'pg_collation')
  test0    | collate_tests | US English
  test11   | test_schema   | 
  test5    | collate_tests | 
-(3 rows)
+ testc    | collate_tests | 
+(4 rows)
 
 DROP COLLATION test0, test_schema.test11, test5;
 DROP COLLATION test0; -- fail
@@ -1089,7 +1096,8 @@ NOTICE:  collation "test0" does not exist, skipping
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%';
  collname 
 ----------
-(0 rows)
+ testc
+(1 row)
 
 DROP SCHEMA test_schema;
 DROP ROLE regress_test_role;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index a8001b4b8e..122c166966 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -375,6 +375,9 @@ $$;
 
 RESET client_min_messages;
 
+-- uses "builtin" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
@@ -388,6 +391,9 @@ CREATE COLLATION test5 FROM test0;
 
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
 
+DROP COLLATION test1;
+CREATE COLLATION test1 (provider = icu, locale = 'und');
+
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
 ALTER COLLATION test1 RENAME TO test22; -- fail
-- 
2.34.1

v10-0003-CREATE-DATABASE-make-LOCALE-apply-to-all-collati.patch (text/x-patch; charset=UTF-8)

From 7ee9bfc7ded5906f5b27390061513bbab3d87555 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v10 3/4] CREATE DATABASE: make LOCALE apply to all collation
 providers.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE. That could
lead to confusion when the provider is implicit, such as when it is
inherited from the template database, or when ICU was made default at
initdb time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_collation.sgml        | 23 +++++---
 doc/src/sgml/ref/create_database.sgml         | 57 +++++++++++++++----
 doc/src/sgml/ref/createdb.sgml                |  5 +-
 doc/src/sgml/ref/initdb.sgml                  |  7 ++-
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 51 +++++++++++++++--
 src/bin/initdb/initdb.c                       | 27 +++++++--
 src/bin/initdb/t/001_initdb.pl                | 28 +++++----
 src/bin/pg_dump/pg_dump.c                     | 55 ++++++++++++++----
 src/bin/scripts/createdb.c                    | 13 ++---
 src/bin/scripts/t/020_createdb.pl             | 28 +++++++--
 src/test/icu/t/010_database.pl                | 22 ++++---
 .../regress/expected/collate.icu.utf8.out     | 22 +++----
 13 files changed, 251 insertions(+), 89 deletions(-)
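
As a reading aid for the dbcommands.c hunks below, here is a minimal sketch of the precedence this patch gives the ICU locale in CREATE DATABASE, plus the validation applied when the provider is builtin. The function names (resolve_icu_locale, builtin_locale_ok) are hypothetical and do not appear in the patch; NULL stands for "option not given".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Pick the ICU locale for a new database when the provider is ICU:
 * an explicit ICU_LOCALE wins, then LOCALE, then the template's setting.
 */
static const char *
resolve_icu_locale(const char *icu_locale_opt, const char *locale_opt,
                   const char *template_icu_locale)
{
    if (icu_locale_opt != NULL)
        return icu_locale_opt;
    if (locale_opt != NULL)
        return locale_opt;
    return template_icu_locale;
}

/*
 * The builtin provider requires a locale, and accepts only "C" or "POSIX".
 * The patch reports these as two separate errors; this sketch folds them
 * into one boolean.
 */
static bool
builtin_locale_ok(const char *builtin_locale)
{
    return builtin_locale != NULL &&
        (strcmp(builtin_locale, "C") == 0 ||
         strcmp(builtin_locale, "POSIX") == 0);
}
```

With this ordering, a bare LOCALE now reaches the ICU provider as well, which is the point of the patch; before it, LOCALE only set LC_COLLATE and LC_CTYPE, and the ICU locale was silently inherited from the template.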

diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 67ea247b39..48b62f77cb 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -85,9 +85,16 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
 
      <listitem>
       <para>
-       This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-       and <symbol>LC_CTYPE</symbol> at once.  If you specify this,
-       you cannot specify either of those parameters.
+       The locale name for this collation. See <xref
+       linkend="collation-managing-create-libc"/> and <xref
+       linkend="collation-managing-create-icu"/> for details.
+      </para>
+      <para>
+       If <replaceable>provider</replaceable> is <literal>libc</literal>, this
+       is a shortcut for setting <symbol>LC_COLLATE</symbol> and
+       <symbol>LC_CTYPE</symbol> at once.  If you specify
+       <replaceable>locale</replaceable>, you cannot specify either of those
+       parameters.
       </para>
       <para>
        If <replaceable>provider</replaceable> is <literal>builtin</literal>,
@@ -102,8 +109,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
 
      <listitem>
       <para>
-       Use the specified operating system locale for
-       the <symbol>LC_COLLATE</symbol> locale category.
+       If <replaceable>provider</replaceable> is <literal>libc</literal>, use
+       the specified operating system locale for the
+       <symbol>LC_COLLATE</symbol> locale category.
       </para>
      </listitem>
     </varlistentry>
@@ -113,8 +121,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
 
      <listitem>
       <para>
-       Use the specified operating system locale for
-       the <symbol>LC_CTYPE</symbol> locale category.
+       If <replaceable>provider</replaceable> is <literal>libc</literal>, use
+       the specified operating system locale for the <symbol>LC_CTYPE</symbol>
+       locale category.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 17550c19bb..9f20c93267 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,22 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        Sets the default collation order and character classification in the
+        new database.  Collation affects the sort order applied to strings,
+        e.g., in queries with ORDER BY, as well as the order used in indexes
+        on text columns.  Character classification affects the categorization
+        of characters, e.g., lower, upper and digit.  Also sets the
+        associated aspects of the operating system environment,
+        <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal>.  The
+        default is the same setting as the template database.  See <xref
+        linkend="collation-managing-create-libc"/> and <xref
+        linkend="collation-managing-create-icu"/> for details.
+       </para>
+       <para>
+        Can be overridden by setting <xref
+        linkend="create-database-lc-collate"/>, <xref
+        linkend="create-database-lc-ctype"/>, or <xref
+        linkend="create-database-icu-locale"/> individually.
        </para>
        <para>
         If <xref linkend="create-database-locale-provider"/> is
@@ -170,11 +184,17 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">lc_collate</replaceable></term>
       <listitem>
        <para>
-        Collation order (<literal>LC_COLLATE</literal>) to use in the new database.
-        This affects the sort order applied to strings, e.g., in queries with
-        ORDER BY, as well as the order used in indexes on text columns.
-        The default is to use the collation order of the template database.
-        See below for additional restrictions.
+        Sets <literal>LC_COLLATE</literal> in the database server's operating
+        system environment.  The default is the setting of <xref
+        linkend="create-database-locale"/> if specified; otherwise the same
+        setting as the template database.  See below for additional
+        restrictions.
+       </para>
+       <para>
+        If <xref linkend="create-database-locale-provider"/> is
+        <literal>libc</literal>, also sets the default collation order to use
+        in the new database, overriding the setting <xref
+        linkend="create-database-locale"/>.
        </para>
       </listitem>
      </varlistentry>
@@ -182,10 +202,17 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">lc_ctype</replaceable></term>
       <listitem>
        <para>
-        Character classification (<literal>LC_CTYPE</literal>) to use in the new
-        database. This affects the categorization of characters, e.g., lower,
-        upper and digit. The default is to use the character classification of
-        the template database. See below for additional restrictions.
+        Sets <literal>LC_CTYPE</literal> in the database server's operating
+        system environment.  The default is the setting of <xref
+        linkend="create-database-locale"/> if specified; otherwise the same
+        setting as the template database.  See below for additional
+        restrictions.
+       </para>
+       <para>
+        If <xref linkend="create-database-locale-provider"/> is
+        <literal>libc</literal>, also sets the default character
+        classification to use in the new database, overriding the setting
+        <xref linkend="create-database-locale"/>.
        </para>
       </listitem>
      </varlistentry>
@@ -194,7 +221,13 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">icu_locale</replaceable></term>
       <listitem>
        <para>
-        Specifies the ICU locale ID if the ICU locale provider is used.
+        Specifies the ICU locale (see <xref
+        linkend="collation-managing-create-icu"/>) for the database default
+        collation order and character classification, overriding the setting
+        <xref linkend="create-database-locale"/>.  The <xref
+        linkend="create-database-locale-provider"/> must be ICU.  The default
+        is the setting of <xref linkend="create-database-locale"/> if
+        specified; otherwise the same setting as the template database.
        </para>
        <para>
         If specified as <literal>C</literal> or <literal>POSIX</literal>, the
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 5352cce744..94937e1d6d 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index d9620e5931..5eabaaec3a 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 3b2ba38250..58316ec19e 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -318,7 +318,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				if (langtag && strcmp(colliculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, colliculocale)));
 
 					colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 5c8caeac2d..3d53e29c7e 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -710,6 +710,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	char	   *dbname = stmt->dbname;
 	char	   *dbowner = NULL;
 	const char *dbtemplate = NULL;
+	char	   *builtin_locale = NULL;
 	char	   *dbcollate = NULL;
 	char	   *dbctype = NULL;
 	char	   *dbiculocale = NULL;
@@ -894,6 +895,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	}
 	if (dlocale && dlocale->arg)
 	{
+		builtin_locale = defGetString(dlocale);
 		dbcollate = defGetString(dlocale);
 		dbctype = defGetString(dlocale);
 	}
@@ -1018,8 +1020,16 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		dbctype = src_ctype;
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
+	if (builtin_locale == NULL && dblocprovider == COLLPROVIDER_BUILTIN &&
+		src_locprovider == COLLPROVIDER_BUILTIN)
+		builtin_locale = "C";
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1033,12 +1043,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1060,11 +1072,38 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		ereport(NOTICE,
 				(errmsg("using locale provider \"builtin\" for ICU locale \"%s\"",
 						dbiculocale)));
+		builtin_locale = dbiculocale;
 		dbiculocale = NULL;
 		dblocprovider = COLLPROVIDER_BUILTIN;
 	}
 
-	if (dblocprovider == COLLPROVIDER_ICU)
+	if (dblocprovider == COLLPROVIDER_BUILTIN)
+	{
+		/* can happen if template is a different provider */
+		if (builtin_locale == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("LOCALE must be specified")));
+
+		if (dbiculocale)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					 errmsg("ICU locale cannot be specified unless locale provider is ICU")));
+
+		if (dbicurules)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
+
+		if (strcmp(builtin_locale, "C") != 0 &&
+			strcmp(builtin_locale, "POSIX") != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("locale provider \"builtin\" does not support locale \"%s\"",
+							builtin_locale),
+					 errhint("The built-in locale provider only supports the \"C\" and \"POSIX\" locales.")));
+	}
+	else if (dblocprovider == COLLPROVIDER_ICU)
 	{
 		if (!(is_encoding_supported_by_icu(encoding)))
 			ereport(ERROR,
@@ -1079,7 +1118,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		if (!dbiculocale)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("ICU locale must be specified")));
+					 errmsg("LOCALE or ICU_LOCALE must be specified")));
 
 		/*
 		 * During binary upgrade, or when the locale came from the template
@@ -1094,7 +1133,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			if (langtag && strcmp(dbiculocale, langtag) != 0)
 			{
 				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
+						(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 								langtag, dbiculocale)));
 
 				dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 897521cf35..0a9307df2b 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2163,7 +2163,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2375,8 +2379,9 @@ static void
 setlocales(void)
 {
 	char	   *canonname;
+	char	   *builtin_locale = locale;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2392,6 +2397,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	/*
@@ -2407,6 +2414,7 @@ setlocales(void)
 	{
 		pg_log_info("using locale provider \"builtin\" for ICU locale \"%s\"",
 					 icu_locale);
+		builtin_locale = "C";
 		icu_locale = NULL;
 		locale_provider = COLLPROVIDER_BUILTIN;
 	}
@@ -2434,7 +2442,15 @@ setlocales(void)
 	lc_messages = canonname;
 #endif
 
-	if (locale_provider == COLLPROVIDER_ICU)
+	if (locale_provider == COLLPROVIDER_BUILTIN)
+	{
+		if (!builtin_locale)
+			pg_fatal("The parameter --locale must be specified for the built-in provider.");
+		if (strcmp(builtin_locale, "C") != 0 &&
+			strcmp(builtin_locale, "POSIX") != 0)
+			pg_fatal("The built-in locale provider only supports the \"C\" and \"POSIX\" locales.");
+	}
+	else if (locale_provider == COLLPROVIDER_ICU)
 	{
 		char	   *langtag;
 
@@ -2687,7 +2703,9 @@ setup_locale_encoding(void)
 		 * If ctype_enc=SQL_ASCII, it's compatible with any encoding. ICU does
 		 * not support SQL_ASCII, so select UTF-8 instead.
 		 */
-		if (locale_provider == COLLPROVIDER_ICU && ctype_enc == PG_SQL_ASCII)
+		if ((locale_provider == COLLPROVIDER_BUILTIN ||
+			 locale_provider == COLLPROVIDER_ICU)
+			&& ctype_enc == PG_SQL_ASCII)
 			ctype_enc = PG_UTF8;
 
 		if (ctype_enc == -1)
@@ -3294,7 +3312,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index f3f832a413..9b0c2202a3 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -193,11 +193,19 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
-command_ok(
+command_fails(
 	[
 		'initdb', '--no-sync', '--locale-provider=builtin', "$tempdir/data6"
 	],
-	'locale provider builtin'
+	'locale provider builtin without --locale'
+);
+
+command_fails(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=en',
+		"$tempdir/data6"
+	],
+	'locale provider builtin with invalid --locale'
 );
 
 command_ok(
@@ -210,32 +218,32 @@ command_ok(
 
 command_ok(
 	[
-		'initdb', '--no-sync', '--locale-provider=builtin', '--lc-collate=C',
-		"$tempdir/data8"
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=C',
+		'--lc-collate=C', "$tempdir/data8"
 	],
 	'locale provider builtin with --lc-collate'
 );
 
 command_ok(
 	[
-		'initdb', '--no-sync', '--locale-provider=builtin', '--lc-ctype=C',
-		"$tempdir/data9"
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=C',
+		'--lc-ctype=C', "$tempdir/data9"
 	],
 	'locale provider builtin with --lc-ctype'
 );
 
 command_fails(
 	[
-		'initdb', '--no-sync', '--locale-provider=builtin', '--icu-locale=en',
-		"$tempdir/dataX"
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=C',
+		'--icu-locale=en', "$tempdir/dataX"
 	],
 	'fails for locale provider builtin with ICU locale'
 );
 
 command_fails(
 	[
-		'initdb', '--no-sync', '--locale-provider=builtin', '--icu-rules=""',
-		"$tempdir/dataX"
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=C',
+		'--icu-rules=""', "$tempdir/dataX"
 	],
 	'fails for locale provider builtin with ICU rules'
 );
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c75818c3a4..8dbbd87610 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3080,13 +3080,10 @@ dumpDatabase(Archive *fout)
 		pg_fatal("unrecognized locale provider: %s",
 				 datlocprovider);
 
-	if (strlen(collate) > 0 && strcmp(collate, ctype) == 0)
-	{
-		appendPQExpBufferStr(creaQry, " LOCALE = ");
-		appendStringLiteralAH(creaQry, collate, fout);
-	}
-	else
+	if (datlocprovider[0] == 'b')
 	{
+		appendPQExpBufferStr(creaQry, " LOCALE = 'C'");
+
 		if (strlen(collate) > 0)
 		{
 			appendPQExpBufferStr(creaQry, " LC_COLLATE = ");
@@ -3098,15 +3095,51 @@ dumpDatabase(Archive *fout)
 			appendStringLiteralAH(creaQry, ctype, fout);
 		}
 	}
-	if (iculocale)
+	else if (datlocprovider[0] == 'i')
 	{
-		appendPQExpBufferStr(creaQry, " ICU_LOCALE = ");
+		Assert(iculocale);
+
+		appendPQExpBufferStr(creaQry, " LOCALE = ");
 		appendStringLiteralAH(creaQry, iculocale, fout);
+
+		if (strlen(collate) > 0)
+		{
+			appendPQExpBufferStr(creaQry, " LC_COLLATE = ");
+			appendStringLiteralAH(creaQry, collate, fout);
+		}
+		if (strlen(ctype) > 0)
+		{
+			appendPQExpBufferStr(creaQry, " LC_CTYPE = ");
+			appendStringLiteralAH(creaQry, ctype, fout);
+		}
+		if (icurules)
+		{
+			appendPQExpBufferStr(creaQry, " ICU_RULES = ");
+			appendStringLiteralAH(creaQry, icurules, fout);
+		}
 	}
-	if (icurules)
+	else if (datlocprovider[0] == 'c')
 	{
-		appendPQExpBufferStr(creaQry, " ICU_RULES = ");
-		appendStringLiteralAH(creaQry, icurules, fout);
+		Assert(!iculocale);
+
+		if (strlen(collate) > 0 && strcmp(collate, ctype) == 0)
+		{
+			appendPQExpBufferStr(creaQry, " LOCALE = ");
+			appendStringLiteralAH(creaQry, collate, fout);
+		}
+		else
+		{
+			if (strlen(collate) > 0)
+			{
+				appendPQExpBufferStr(creaQry, " LC_COLLATE = ");
+				appendStringLiteralAH(creaQry, collate, fout);
+			}
+			if (strlen(ctype) > 0)
+			{
+				appendPQExpBufferStr(creaQry, " LC_CTYPE = ");
+				appendStringLiteralAH(creaQry, ctype, fout);
+			}
+		}
 	}
 
 	/*
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 41a8de659e..8f8995964c 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
@@ -219,6 +211,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 121ad5112b..3f5719c913 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -115,7 +115,7 @@ else
 $node->command_ok(
 	[
 		'createdb', '-T', 'template0', '--locale-provider=builtin',
-		'tbuiltin1'
+		'--locale=C', 'tbuiltin1'
 	],
 	'create database with provider "builtin"'
 );
@@ -131,7 +131,7 @@ $node->command_ok(
 $node->command_ok(
 	[
 		'createdb', '-T', 'template0', '--locale-provider=builtin',
-		'--lc-collate=C', 'tbuiltin3'
+		'--locale=C', '--lc-collate=C', 'tbuiltin3'
 	],
 	'create database with provider "builtin" and LC_COLLATE=C'
 );
@@ -139,7 +139,7 @@ $node->command_ok(
 $node->command_ok(
 	[
 		'createdb', '-T', 'template0', '--locale-provider=builtin',
-		'--lc-ctype=C', 'tbuiltin4'
+		'--locale=C', '--lc-ctype=C', 'tbuiltin4'
 	],
 	'create database with provider "builtin" and LC_CTYPE=C'
 );
@@ -168,6 +168,22 @@ $node->command_fails(
 	'create database with provider "builtin" not matching template'
 );
 
+$node->command_fails(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'tbuiltin8'
+	],
+	'create database with provider "builtin" and locale unspecified'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--locale=en', 'tbuiltin8'
+	],
+	'create database with provider "builtin" and locale=en'
+);
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
@@ -184,7 +200,7 @@ ALTER TABLE tab_foobar owner to role_foobar;
 CREATE POLICY pol_foobar ON tab_foobar FOR ALL TO role_foobar;');
 $node->issues_sql_like(
 	[ 'createdb', '-l', 'C', '-T', 'foobar2', 'foobar3' ],
-	qr/statement: CREATE DATABASE foobar3 TEMPLATE foobar2/,
+	qr/statement: CREATE DATABASE foobar3 TEMPLATE foobar2 LOCALE 'C'/,
 	'create database with template');
 ($ret, $stdout, $stderr) = $node->psql(
 	'foobar3',
@@ -211,7 +227,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -219,7 +235,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 26f71e1155..6d19488afc 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,17 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
-
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+# Test that LOCALE='C' works for ICU
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
 );
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 my ($ret, $stdout, $stderr) = $node1->psql('postgres',
 	q{CREATE DATABASE dbicu LOCALE_PROVIDER builtin LOCALE 'C' TEMPLATE dbicu}
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 5ba8f75558..9e49c0d9a7 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1202,9 +1202,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1212,7 +1212,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1229,12 +1229,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1243,7 +1243,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1271,13 +1271,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1327,9 +1327,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

Attachment: v10-0004-CREATE-COLLATION-default-provider.patch (text/x-patch; charset=UTF-8)

From ea72d9f8baae263af0f7ce1b80bf624e0496a571 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 24 May 2023 09:53:02 -0700
Subject: [PATCH v10 4/4] CREATE COLLATION default provider.

If the LC_COLLATE or LC_CTYPE are specified for a new collation, the
default provider is libc. Otherwise, the default provider is the same
as the provider for the database default collation.

Previously, the default provider was always libc.
---
 contrib/citext/expected/create_index_acl.out     |  2 +-
 contrib/citext/sql/create_index_acl.sql          |  2 +-
 doc/src/sgml/ref/create_collation.sgml           | 12 +++++++++---
 src/backend/commands/collationcmds.c             |  7 ++++++-
 src/test/regress/expected/collate.linux.utf8.out | 10 +++++-----
 src/test/regress/sql/collate.linux.utf8.sql      | 10 +++++-----
 6 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/contrib/citext/expected/create_index_acl.out b/contrib/citext/expected/create_index_acl.out
index 33be13a92d..765867d36d 100644
--- a/contrib/citext/expected/create_index_acl.out
+++ b/contrib/citext/expected/create_index_acl.out
@@ -43,7 +43,7 @@ REVOKE ALL ON FUNCTION s.index_row_if FROM PUBLIC;
 -- Even for an empty table, CREATE INDEX checks ii_Predicate permissions.
 GRANT EXECUTE ON FUNCTION s.index_row_if TO regress_minimal;
 -- Non-extension, non-function objects.
-CREATE COLLATION s.coll (LOCALE="C");
+CREATE COLLATION s.coll (PROVIDER=libc, LOCALE="C");
 CREATE TABLE s.x (y s.citext);
 ALTER TABLE s.x OWNER TO regress_minimal;
 -- Empty-table DefineIndex()
diff --git a/contrib/citext/sql/create_index_acl.sql b/contrib/citext/sql/create_index_acl.sql
index 10b5225569..e338ac8799 100644
--- a/contrib/citext/sql/create_index_acl.sql
+++ b/contrib/citext/sql/create_index_acl.sql
@@ -45,7 +45,7 @@ REVOKE ALL ON FUNCTION s.index_row_if FROM PUBLIC;
 -- Even for an empty table, CREATE INDEX checks ii_Predicate permissions.
 GRANT EXECUTE ON FUNCTION s.index_row_if TO regress_minimal;
 -- Non-extension, non-function objects.
-CREATE COLLATION s.coll (LOCALE="C");
+CREATE COLLATION s.coll (PROVIDER=libc, LOCALE="C");
 CREATE TABLE s.x (y s.citext);
 ALTER TABLE s.x OWNER TO regress_minimal;
 -- Empty-table DefineIndex()
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 48b62f77cb..2b25eff44e 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -136,9 +136,15 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are <literal>builtin</literal>,
        <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
-       the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
-       linkend="locale-providers"/> for details.
+       the server was built with ICU support) or <literal>libc</literal>. See
+       <xref linkend="locale-providers"/> for details.
+      </para>
+      <para>
+       If <replaceable>lc_collate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the default is
+       <literal>libc</literal>; otherwise, the default is the same as the
+       provider for the database default collation (see <xref
+       linkend="sql-createdatabase"/>).
       </para>
       <para>
        If the provider is <literal>icu</literal> and the locale is
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 58316ec19e..f2264000cb 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -229,7 +229,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = default_locale.provider;
+		}
 
 		if (localeEl)
 		{
diff --git a/src/test/regress/expected/collate.linux.utf8.out b/src/test/regress/expected/collate.linux.utf8.out
index 01664f7c1b..588198d13e 100644
--- a/src/test/regress/expected/collate.linux.utf8.out
+++ b/src/test/regress/expected/collate.linux.utf8.out
@@ -1026,7 +1026,7 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
@@ -1034,7 +1034,7 @@ CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 ERROR:  collation "test0" already exists
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
 NOTICE:  collation "test0" already exists, skipping
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 NOTICE:  collation "test0" for encoding "UTF8" already exists, skipping
 do $$
 BEGIN
@@ -1046,7 +1046,7 @@ END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
 ERROR:  parameter "lc_ctype" must be specified
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 ERROR:  could not create locale "nonsense": No such file or directory
 DETAIL:  The operating system could not find any locale data for the locale name "nonsense".
 CREATE COLLATION test4 FROM nonsense;
@@ -1166,8 +1166,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 
 -- nondeterministic collations
 -- (not supported with libc provider)
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 ERROR:  nondeterministic collations not supported with this provider
 -- cleanup
 SET client_min_messages TO warning;
diff --git a/src/test/regress/sql/collate.linux.utf8.sql b/src/test/regress/sql/collate.linux.utf8.sql
index 132d13af0a..2d031293d1 100644
--- a/src/test/regress/sql/collate.linux.utf8.sql
+++ b/src/test/regress/sql/collate.linux.utf8.sql
@@ -358,13 +358,13 @@ CREATE SCHEMA test_schema;
 -- We need to do this this way to cope with varying names for encodings:
 do $$
 BEGIN
-  EXECUTE 'CREATE COLLATION test0 (locale = ' ||
+  EXECUTE 'CREATE COLLATION test0 (provider = libc, locale = ' ||
           quote_literal((SELECT datcollate FROM pg_database WHERE datname = current_database())) || ');';
 END
 $$;
 CREATE COLLATION test0 FROM "C"; -- fail, duplicate name
 CREATE COLLATION IF NOT EXISTS test0 FROM "C"; -- ok, skipped
-CREATE COLLATION IF NOT EXISTS test0 (locale = 'foo'); -- ok, skipped
+CREATE COLLATION IF NOT EXISTS test0 (provider = libc, locale = 'foo'); -- ok, skipped
 do $$
 BEGIN
   EXECUTE 'CREATE COLLATION test1 (lc_collate = ' ||
@@ -374,7 +374,7 @@ BEGIN
 END
 $$;
 CREATE COLLATION test3 (lc_collate = 'en_US.utf8'); -- fail, need lc_ctype
-CREATE COLLATION testx (locale = 'nonsense'); -- fail
+CREATE COLLATION testx (provider = libc, locale = 'nonsense'); -- fail
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
@@ -455,8 +455,8 @@ SELECT * FROM collate_test2 ORDER BY b COLLATE UCS_BASIC;
 -- nondeterministic collations
 -- (not supported with libc provider)
 
-CREATE COLLATION ctest_det (locale = 'en_US.utf8', deterministic = true);
-CREATE COLLATION ctest_nondet (locale = 'en_US.utf8', deterministic = false);
+CREATE COLLATION ctest_det (provider = libc, locale = 'en_US.utf8', deterministic = true);
+CREATE COLLATION ctest_nondet (provider = libc, locale = 'en_US.utf8', deterministic = false);
 
 
 -- cleanup
-- 
2.34.1

#108

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#107)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

I implemented a compromise where initdb will
change C.UTF-8 to the built-in provider

This handling of C.UTF-8 would be felt by users as simply broken.

With the v10 patches:

$ initdb --locale=C.UTF-8

initdb: using locale provider "builtin" for ICU locale "C.UTF-8"
The database cluster will be initialized with this locale configuration:
default collation provider: builtin
default collation locale: C
LC_COLLATE: C.UTF-8
LC_CTYPE: C.UTF-8

This setup is not what the user has asked for and leads to that kind of
wrong results:

$ psql -c "select upper('é')"
?column?
----------
é

whereas in v15 we would get the correct result 'É'.

Then once inside that cluster, trying to create a database:

postgres=# create database test locale='C.UTF-8';
ERROR: locale provider "builtin" does not support locale "C.UTF-8"
HINT: The built-in locale provider only supports the "C" and "POSIX"
locales.

That hardly makes sense considering that initdb stated the opposite,
that the "built-in provider" was adequate for C.UTF-8.

In general about the evolution of the patchset, your interpretation
of "defaulting to ICU" seems to be "avoid libc at any cost", which IMV
is unreasonably user-hostile.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#109

[1]: /messages/by-id/7de2dc15-5211-45b3-afcb-71dcaf7a08bb@manitou-mail.org

pgsql@j-davis.com

over 2 years ago

In reply to: Daniel Verite (#108)

Re: Order changes in PG16 since ICU introduction

On Fri, 2023-06-09 at 14:12 +0200, Daniel Verite wrote:

I implemented a compromise where initdb will
change C.UTF-8 to the built-in provider

$ initdb --locale=C.UTF-8

...

This setup is not what the user has asked for and leads to that kind of
wrong results:

$ psql -c "select upper('é')"
?column?
----------
é

whereas in v15 we would get the correct result 'É'.

I guess where I'm confused is: why would a user actually want their
database collation to be C.UTF-8? It's slower than C, our
implementation doesn't properly version it (as you pointed out), and
the semantics don't seem great ('Z' < 'a').
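
The 'Z' < 'a' ordering mentioned above follows directly from byte-wise
comparison. A minimal sketch, assuming plain memcmp() semantics as described
earlier in the thread for locale C; c_collation_cmp is a hypothetical name
used only for illustration and is not a function from PostgreSQL or the
patches:

```c
#include <string.h>

/*
 * Compare two NUL-terminated strings in plain byte order via memcmp().
 * When one string is a prefix of the other, the shorter one sorts first.
 * c_collation_cmp is a hypothetical name for this illustration only.
 */
int
c_collation_cmp(const char *a, const char *b)
{
	size_t		alen = strlen(a);
	size_t		blen = strlen(b);
	size_t		minlen = (alen < blen) ? alen : blen;
	int			cmp = memcmp(a, b, minlen);

	if (cmp != 0)
		return cmp;
	if (alen == blen)
		return 0;
	return (alen < blen) ? -1 : 1;
}
```

Under this ordering "Z" (0x5A) sorts before "a" (0x61), and
'RM(25)/nodes|+|21|1' sorts before 'RM(25)/nodes|-|21|' because '+' (0x2B)
precedes '-' (0x2D), which matches the "C" result reported at the start of
the thread.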

If the user specifies provider=libc, then of course we should honor
that and C.UTF-8 is a valid locale for libc.

But if they don't specify the provider, isn't it much more likely they
just don't care much about the locale, and would be happier with C?

Perhaps there's some better compromise here than the one I picked, but
I see this as a fairly small problem in comparison to the big problems
that we're solving.

In general about the evolution of the patchset, your interpretation
of "defaulting to ICU" seems to be "avoid libc at any cost", which IMV
is unreasonably user-hostile.

The user can easily get libc behavior by specifying
--locale-provider=libc, so I don't see how you reached this conclusion.

Let me try to understand and address the points you raised here[1] in
more detail:

It looks like you are fine with 0003 applying LOCALE to whatever
provider is chosen, but you'd like to be smarter about choosing the
provider and to choose libc in at least some cases.

That is actually very much like option #2 in the list I presented
here[2], and has the same problems. How should the following behave?

initdb --locale=C --lc-collate=fr_FR.utf8
initdb --locale=en --lc-collate=fr_FR.utf8

If we switch to libc in the first case, then --locale will be ignored
and the collation will be fr_FR.utf8. But we will leave the second case
as ICU and the collation will be "en". I'm sure we can come up with
something there, but it feels like there's more room for confusion
along this path, and the builtin provider seems cleaner.

You also suggested that we consider switching the provider to libc any
time ICU doesn't support something. I'm not sure whether you meant a
static list (C, C.UTF-8, POSIX, ...?) or some kind of dynamic test. I'm
skeptical of being too smart here, but I'd like to hear what you mean.
I'm also not clear whether you think we should abandon the built-in
provider, or still select it for C/POSIX.

Regards,
Jeff Davis

[2]: /messages/by-id/daa9f060aa2349ebc84444515efece49e7b32c5d.camel@j-davis.com

#110

daniel@manitou-mail.org

over 2 years ago

In reply to: Jeff Davis (#109)

Re: Order changes in PG16 since ICU introduction

Jeff Davis wrote:

I guess where I'm confused is: why would a user actually want their
database collation to be C.UTF-8? It's slower than C, our
implementation doesn't properly version it (as you pointed out), and
the semantics don't seem great ('Z' < 'a').

Because when LC_CTYPE=C, characters outside of US ASCII are not
categorized properly. upper/lower/regexp matching/... produce
incorrect results.
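
To illustrate why upper('é') comes back unchanged when character
classification only knows about US ASCII, here is a rough sketch. It assumes
UTF-8 input and an ASCII-only case mapping; ascii_only_upper is a
hypothetical helper written for this example, not the code PostgreSQL
actually runs:

```c
/*
 * Uppercase only the ASCII letters a-z in place.  Every other byte,
 * including the bytes of multi-byte UTF-8 sequences such as the
 * 0xC3 0xA9 encoding of 'é', is left untouched.
 * ascii_only_upper is a hypothetical helper, not a PostgreSQL function.
 */
void
ascii_only_upper(char *s)
{
	for (; *s != '\0'; s++)
	{
		if (*s >= 'a' && *s <= 'z')
			*s = (char) (*s - 'a' + 'A');
	}
}
```

Applied to the UTF-8 string "café", this yields "CAFé": the ASCII letters are
converted and the two bytes of 'é' pass through as they are.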

But if they don't specify the provider, isn't it much more likely they
just don't care much about the locale, and would be happier with C?

Consider a pre-existing script doing initdb --locale=C.UTF-8
Surely it does care about the locale, otherwise it would not specify
it.
Assuming that it would be better off with C is assuming that a
non-Unicode aware locale is better than the Unicode-aware locale
they're asking for. I don't think it's reasonable.

The user can easily get libc behavior by specifying
--locale-provider=libc, so I don't see how you reached this conclusion.

What would be user hostile is forcing users that don't need an ICU
locale to change their invocations of initdb/createdb to avoid
regressions with v16. Most people would discover this after
it breaks their apps.

It looks like you are fine with 0003 applying LOCALE to whatever
provider is chosen, but you'd like to be smarter about choosing the
provider and to choose libc in at least some cases.

That is actually very much like option #2 in the list I presented
here[2], and has the same problems. How should the following behave?

initdb --locale=C --lc-collate=fr_FR.utf8
initdb --locale=en --lc-collate=fr_FR.utf8

The same as v15.

If we switch to libc in the first case, then --locale will be ignored
and the collation will be fr_FR.utf8.

$ initdb --locale=C --lc-collate=fr_FR.utf8
v15 does that:

The database cluster will be initialized with this locale configuration:
provider: libc
LC_COLLATE: fr_FR.utf8
LC_CTYPE: C
LC_MESSAGES: C
LC_MONETARY: C
LC_NUMERIC: C
LC_TIME: C
The default database encoding has accordingly been set to "SQL_ASCII".

--locale is not ignored, it's overridden for LC_COLLATE only.
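
The override rule can be sketched as follows. This is a simplified
illustration modeled on the "set empty lc_* values to locale" step of
setlocales() in initdb.c; the LocaleSettings struct and the
apply_locale_default name are assumptions made for the example, and only two
of the lc_* categories are shown:

```c
#include <stddef.h>

/* Hypothetical container for the settings discussed here. */
typedef struct LocaleSettings
{
	const char *locale;			/* --locale, or NULL if not given */
	const char *lc_collate;		/* --lc-collate, or NULL if not given */
	const char *lc_ctype;		/* --lc-ctype, or NULL if not given */
} LocaleSettings;

/*
 * Fill in categories that were not given explicitly from --locale.
 * A category that was given explicitly keeps its own value.
 */
void
apply_locale_default(LocaleSettings *s)
{
	if (s->locale == NULL)
		return;
	if (s->lc_collate == NULL)
		s->lc_collate = s->locale;
	if (s->lc_ctype == NULL)
		s->lc_ctype = s->locale;
}
```

With locale "C" and lc_collate "fr_FR.utf8", lc_collate stays "fr_FR.utf8"
while lc_ctype becomes "C", which corresponds to the LC_COLLATE and LC_CTYPE
lines of the v15 output above.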

But we will leave the second case as ICU and the collation will be
"en".

Yes. To me the rule for "ICU is the default" in v16 should be: if the
--locale argument points to a locale that we know ICU does not provide,
we fall back to the v15 behavior down to every detail, otherwise we let
ICU be the provider.

You also suggested that we consider switching the provider to libc any
time ICU doesn't support something. I'm not sure whether you meant a
static list (C, C.UTF-8, POSIX, ...?) or some kind of dynamic test.

C, C.*, POSIX. I'm not sure if there are other cases.
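
As a static list, that check might look like the sketch below. The function
name is hypothetical and no such helper exists in the posted patches; the
"C.*" pattern is read here as "starts with C.", which is an assumption:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true for the locale names in the static list: "C", "POSIX",
 * and anything starting with "C." (for example "C.UTF-8").
 * locale_falls_back_to_libc is a hypothetical name for illustration.
 */
bool
locale_falls_back_to_libc(const char *locale)
{
	if (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0)
		return true;
	return strncmp(locale, "C.", 2) == 0;
}
```

Any other name, such as "en" or "fr_FR.utf8", would be left to ICU under the
rule described above.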

I'm also not clear whether you think we should abandon the built-in
provider, or still select it for C/POSIX.

I see it as going in v17, because it came after feature freeze and
is not strictly necessary in v16.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#111

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#107)

Re: Order changes in PG16 since ICU introduction

On 09.06.23 02:36, Jeff Davis wrote:

Patches 0001, 0002:

These patches implement the built-in provider and automatically change
provider=icu to provider=builtin when the locale is C.

I object to adding a new provider for PG16 (patch 0001). This is
clearly a new feature, which wasn't even contemplated a few weeks ago.

* Switch to the libc provider for the C locale: would make the libc
provider even more complicated and had some potential for confusion,
and also has catalog representation problems when --locale is specified
along with --lc-ctype.

I don't follow this concern. This could be done entirely within initdb.
I mean, just change the default for --locale-provider if --locale=C is
given. That's like 10 lines of code in initdb.c.
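
A rough sketch of what such a default change could look like, assuming an
ICU-enabled build and the single-letter provider codes seen in the pg_dump
hunk ('c' for libc, 'i' for ICU). pick_initdb_provider and the '\0'
"not specified" convention are invented for this example and are not taken
from initdb.c:

```c
#include <stddef.h>
#include <string.h>

/* Single-letter provider codes, as in the pg_dump hunk. */
#define PROVIDER_LIBC 'c'
#define PROVIDER_ICU  'i'

/*
 * Pick the provider for initdb.  requested_provider is '\0' when the
 * user did not pass --locale-provider (a convention assumed for this
 * sketch).  An explicit request is returned unchanged; only the default
 * depends on --locale.
 */
char
pick_initdb_provider(char requested_provider, const char *locale)
{
	if (requested_provider != '\0')
		return requested_provider;
	if (locale != NULL && strcmp(locale, "C") == 0)
		return PROVIDER_LIBC;
	return PROVIDER_ICU;
}
```

Here `initdb --locale=C` would end up on libc, `initdb --locale=en` on ICU,
and `initdb --locale-provider=icu --locale=C` would stay on ICU because the
provider was given explicitly.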

I don't think I want CREATE DATABASE or CREATE COLLATION to have that
logic, nor do they really need it.

Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this patch
is that "locale" now means the collation locale, and
LC_COLLATE/LC_CTYPE are for the server environment. When using libc,
LC_COLLATE and LC_CTYPE still work as they did before, but their
relationship to database collation feels more like a special case of
the libc provider. I believe most people favor this patch and I haven't
seen recent objections.

This seems reasonable.

1. If you specify --locale-provider=builtin at initdb time, you *must*
specify --locale=C/POSIX, otherwise you get an error.

Shouldn't the --locale option be ignored (or rejected) in that case?
Why insist on it being specified?

2. Patch 0004 is possibly out of scope for 16, but it felt consistent
with the other UI changes and low risk. Please try with/without before
objecting.

Also clearly a new feature. Also the implications of various upgrade,
dump/restore scenarios are not fully explored.

I think it's an interesting idea, to make CREATE DATABASE and CREATE
COLLATION also default to icu if the respective higher scope has been
set to icu. In fact, this makes me wonder now whether changing the
default to icu in *only* initdb is sensible. But again, we'd need to
see the full matrix of upgrade scenarios laid out here.

3. Daniel Verite felt that we should only change the provider from ICU
to "builtin" for the C locale if the provider is defaulting to ICU; not
if it's specified as ICU.

Correct, we shouldn't override what was explicitly specified.

I did not differentiate between specifying
ICU and defaulting to ICU because:
a. "libc" unconditionally uses the built-in memcmp() logic for C, it
never actually uses libc
b. If a user really wants the root locale or the en-US-u-va-posix
locale, they can specify those directly
c. I don't see any plausible case where it helps a user to keep
provider=icu when locale=C.

If the user specifies that, that's up to them to deal with the outcomes.
Just changing it to something different seems wrong.
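
To make the two points above concrete (change only the initdb default when the locale is C, and never override an explicit provider), here is a minimal sketch. It is not taken from initdb.c or from any posted patch: the enum, the function name choose_provider, and the provider_explicit flag are hypothetical, and it assumes that ICU is the default provider and that "C" and "POSIX" are the only names to special-case.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; not the real initdb.c definitions. */
typedef enum { PROVIDER_LIBC, PROVIDER_ICU } provider_t;

/*
 * Pick the locale provider for initdb.
 *
 * provider_explicit: true if the user passed --locale-provider.
 * requested:         the provider the user asked for (used only if explicit).
 * locale:            value of --locale, or NULL if not given.
 */
provider_t
choose_provider(bool provider_explicit, provider_t requested,
                const char *locale)
{
    /* Never override what was explicitly specified. */
    if (provider_explicit)
        return requested;

    /* Only the default changes: C/POSIX falls back to libc. */
    if (locale != NULL &&
        (strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0))
        return PROVIDER_LIBC;

    /* Assumed default for a build with ICU enabled. */
    return PROVIDER_ICU;
}
```

With this shape, --locale-provider=icu --locale=C stays on ICU and the user deals with the outcome, while a bare --locale=C ends up on libc.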

4. Joe Conway and Peter Eisentraut both felt that C.UTF-8 with
provider=icu should not be changed to use the builtin provider, and
instead passed on to ICU. I implemented a compromise where initdb will
change C.UTF-8 to the built-in provider; but CREATE DATABASE/COLLATION
will pass it along to ICU (which may support it as en-US-u-va-posix in
some versions, or may throw an error in other versions). My reasoning
is that initdb is pulling from the environment, and we should try
harder to succeed on any reasonable environmental settings (otherwise
initdb with default settings could fail); whereas we can be more strict
with CREATE DATABASE/COLLATION.

I'm not objecting to changing anything about C.UTF-8, but I am objecting
to changing anything substantial in PG16.

5. For the built-in provider, initdb defaults to UTF-8 rather than
SQL_ASCII. Otherwise, you would be unable to use ICU at all later,
because ICU doesn't support SQL_ASCII.

Also a behavior change that is not appropriate for PG16 at this stage.

#112

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#111)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote:

Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this
patch
is that "locale" now means the collation locale, and
LC_COLLATE/LC_CTYPE are for the server environment. When using
libc,
LC_COLLATE and LC_CTYPE still work as they did before, but their
relationship to database collation feels more like a special case
of
the libc provider. I believe most people favor this patch and I
haven't
seen recent objections.

This seems reasonable.

Attached a clean patch for this.

It seems to have widespread agreement so I plan to commit to v16 soon.

To clarify, this affects both initdb and CREATE DATABASE.
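
As a rough illustration of the precedence this gives for the ICU locale of a new database, here is a simplified, standalone sketch. The function name resolve_icu_locale and its parameters are hypothetical; the attached patch does this inline in createdb() and in initdb's setlocales(), and only when the provider is ICU.

```c
#include <stddef.h>

/*
 * Resolve the ICU locale for a new database when the provider is ICU.
 * Precedence: ICU_LOCALE, then LOCALE, then the template database.
 * Any argument may be NULL if that setting was not given.
 */
const char *
resolve_icu_locale(const char *icu_locale_opt,
                   const char *locale_opt,
                   const char *template_icu_locale)
{
    if (icu_locale_opt != NULL)
        return icu_locale_opt;
    if (locale_opt != NULL)
        return locale_opt;
    return template_icu_locale;
}
```

Note that LC_COLLATE and LC_CTYPE still default from LOCALE as well, so an ICU-only name such as '@colStrength=primary' given as LOCALE has to be paired with explicit LC_COLLATE and LC_CTYPE values, or be passed as ICU_LOCALE instead.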

Regards,
Jeff Davis

Attachments:

v11-0001-CREATE-DATABASE-make-LOCALE-apply-to-all-collati.patch (text/x-patch; charset=UTF-8)

From 4c474b5f0af86611480143dcf89918f9329ba512 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v11] CREATE DATABASE: make LOCALE apply to all collation
 providers.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE. That could
lead to confusion when the provider is ICU.

Discussion: https://postgr.es/m/3391932.1682107209@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_collation.sgml        | 23 +++++---
 doc/src/sgml/ref/create_database.sgml         | 57 +++++++++++++++----
 doc/src/sgml/ref/createdb.sgml                |  5 +-
 doc/src/sgml/ref/initdb.sgml                  |  7 ++-
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 17 ++++--
 src/bin/initdb/initdb.c                       | 10 +++-
 src/bin/initdb/t/001_initdb.pl                | 11 ++++
 src/bin/scripts/createdb.c                    | 13 ++---
 src/bin/scripts/t/020_createdb.pl             | 15 ++++-
 src/test/icu/t/010_database.pl                | 30 +++++++---
 .../regress/expected/collate.icu.utf8.out     | 22 +++----
 12 files changed, 152 insertions(+), 60 deletions(-)

diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..b86a9bbb9c 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -85,9 +85,16 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
 
      <listitem>
       <para>
-       This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-       and <symbol>LC_CTYPE</symbol> at once.  If you specify this,
-       you cannot specify either of those parameters.
+       The locale name for this collation. See <xref
+       linkend="collation-managing-create-libc"/> and <xref
+       linkend="collation-managing-create-icu"/> for details.
+      </para>
+      <para>
+       If <replaceable>provider</replaceable> is <literal>libc</literal>, this
+       is a shortcut for setting <symbol>LC_COLLATE</symbol> and
+       <symbol>LC_CTYPE</symbol> at once.  If you specify
+       <replaceable>locale</replaceable>, you cannot specify either of those
+       parameters.
       </para>
      </listitem>
     </varlistentry>
@@ -97,8 +104,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
 
      <listitem>
       <para>
-       Use the specified operating system locale for
-       the <symbol>LC_COLLATE</symbol> locale category.
+       If <replaceable>provider</replaceable> is <literal>libc</literal>, use
+       the specified operating system locale for the
+       <symbol>LC_COLLATE</symbol> locale category.
       </para>
      </listitem>
     </varlistentry>
@@ -108,8 +116,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
 
      <listitem>
       <para>
-       Use the specified operating system locale for
-       the <symbol>LC_CTYPE</symbol> locale category.
+       If <replaceable>provider</replaceable> is <literal>libc</literal>, use
+       the specified operating system locale for the <symbol>LC_CTYPE</symbol>
+       locale category.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..dab05950ed 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,22 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        Sets the default collation order and character classification in the
+        new database.  Collation affects the sort order applied to strings,
+        e.g., in queries with ORDER BY, as well as the order used in indexes
+        on text columns.  Character classification affects the categorization
+        of characters, e.g., lower, - upper and digit.  Also sets the
+        associated aspects of the operating system environment,
+        <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal>.  The
+        default is the same setting as the template database.  See <xref
+        linkend="collation-managing-create-libc"/> and <xref
+        linkend="collation-managing-create-icu"/> for details.
+       </para>
+       <para>
+        Can be overridden by setting <xref
+        linkend="create-database-lc-collate"/>, <xref
+        linkend="create-database-lc-ctype"/>, or <xref
+        linkend="create-database-icu-locale"/> individually.
        </para>
        <tip>
         <para>
@@ -164,11 +178,17 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">lc_collate</replaceable></term>
       <listitem>
        <para>
-        Collation order (<literal>LC_COLLATE</literal>) to use in the new database.
-        This affects the sort order applied to strings, e.g., in queries with
-        ORDER BY, as well as the order used in indexes on text columns.
-        The default is to use the collation order of the template database.
-        See below for additional restrictions.
+        Sets <literal>LC_COLLATE</literal> in the database server's operating
+        system environment.  The default is the setting of <xref
+        linkend="create-database-locale"/> if specified; otherwise the same
+        setting as the template database.  See below for additional
+        restrictions.
+       </para>
+       <para>
+        If <xref linkend="create-database-locale-provider"/> is
+        <literal>libc</literal>, also sets the default collation order to use
+        in the new database, overriding the setting <xref
+        linkend="create-database-locale"/>.
        </para>
       </listitem>
      </varlistentry>
@@ -176,10 +196,17 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">lc_ctype</replaceable></term>
       <listitem>
        <para>
-        Character classification (<literal>LC_CTYPE</literal>) to use in the new
-        database. This affects the categorization of characters, e.g., lower,
-        upper and digit. The default is to use the character classification of
-        the template database. See below for additional restrictions.
+        Sets <literal>LC_CTYPE</literal> in the database server's operating
+        system environment.  The default is the setting of <xref
+        linkend="create-database-locale"/> if specified; otherwise the same
+        setting as the template database.  See below for additional
+        restrictions.
+       </para>
+       <para>
+        If <xref linkend="create-database-locale-provider"/> is
+        <literal>libc</literal>, also sets the default character
+        classification to use in the new database, overriding the setting
+        <xref linkend="create-database-locale"/>.
        </para>
       </listitem>
      </varlistentry>
@@ -188,7 +215,13 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">icu_locale</replaceable></term>
       <listitem>
        <para>
-        Specifies the ICU locale ID if the ICU locale provider is used.
+        Specifies the ICU locale (see <xref
+        linkend="collation-managing-create-icu"/>) for the database default
+        collation order and character classification, overriding the setting
+        <xref linkend="create-database-locale"/>.  The <xref
+        linkend="create-database-locale-provider"/> must be ICU.  The default
+        is the setting of <xref linkend="create-database-locale"/> if
+        specified; otherwise the same setting as the template database.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..e4647d5ce7 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..f850dc404d 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..efb8b4d289 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -276,7 +276,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				if (langtag && strcmp(colliculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, colliculocale)));
 
 					colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 99d4080ea9..09f1ab41ad 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1017,7 +1017,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1031,12 +1036,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1056,7 +1063,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		if (!dbiculocale)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("ICU locale must be specified")));
+					 errmsg("LOCALE or ICU_LOCALE must be specified")));
 
 		/*
 		 * During binary upgrade, or when the locale came from the template
@@ -1071,7 +1078,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			if (langtag && strcmp(dbiculocale, langtag) != 0)
 			{
 				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
+						(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 								langtag, dbiculocale)));
 
 				dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 09a5c98cc0..71a3d26c37 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2163,7 +2163,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2376,7 +2380,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale)
 	{
@@ -2392,6 +2396,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	/*
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index fa00bb3dab..cf55a84cd1 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -111,6 +111,17 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'option --icu-locale');
 
+	command_like(
+		[
+			'initdb', '--no-sync', '-A', 'trust',
+			'--locale-provider=icu', '--locale=und',
+			'--lc-collate=C', '--lc-ctype=C', '--lc-messages=C',
+			'--lc-numeric=C', '--lc-monetary=C', '--lc-time=C',
+			"$tempdir/data4"
+		],
+		qr/^\s+ICU locale:\s+und\n/ms,
+		'options --locale-provider=icu --locale=und --lc-*=C');
+
 	command_fails_like(
 		[
 			'initdb', '--no-sync',
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..9ca86a3e53 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
@@ -219,6 +211,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index d0830a4a1d..694ec56804 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -86,6 +86,15 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'create database with icu locale from template database with icu provider'
 	);
+
+	$node2->command_ok(
+		[
+			'createdb', '-T', 'template0', '--locale-provider', 'icu',
+			'--locale', 'en', '--lc-collate', 'C', '--lc-ctype', 'C',
+			'foobar57'
+		],
+		'create database with locale as ICU locale'
+	);
 }
 else
 {
@@ -110,7 +119,7 @@ ALTER TABLE tab_foobar owner to role_foobar;
 CREATE POLICY pol_foobar ON tab_foobar FOR ALL TO role_foobar;');
 $node->issues_sql_like(
 	[ 'createdb', '-l', 'C', '-T', 'foobar2', 'foobar3' ],
-	qr/statement: CREATE DATABASE foobar3 TEMPLATE foobar2/,
+	qr/statement: CREATE DATABASE foobar3 TEMPLATE foobar2 LOCALE 'C'/,
 	'create database with template');
 ($ret, $stdout, $stderr) = $node->psql(
 	'foobar3',
@@ -137,7 +146,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -145,7 +154,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index d3901f5d3f..b417b1a409 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,17 +51,33 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+);
+is($ret1, 0,
+	"C locale works for ICU");
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+# Test that LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE
+# are specified
+my $ret2 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
+      LC_COLLATE='C' LC_CTYPE='C' TEMPLATE template0 ENCODING UTF8}
 );
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+is($ret2, 0,
+	"LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE are specified");
+
+# Test that ICU-specific LOCALE without LC_COLLATE and LC_CTYPE must
+# be specified with ICU_LOCALE
+my ($ret3, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
+      TEMPLATE template0 ENCODING UTF8});
+isnt($ret3, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 00dee24549..dc96e590f7 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1194,9 +1194,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1204,7 +1204,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1221,12 +1221,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1235,7 +1235,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1263,13 +1263,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1319,9 +1319,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

#113

pgsql@j-davis.com

over 2 years ago

In reply to: Peter Eisentraut (#111)

Re: Order changes in PG16 since ICU introduction

On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote:

I object to adding a new provider for PG16 (patch 0001).

Added to July CF for 17.

2. Patch 0004 is possibly out of scope for 16

Also clearly a new feature.

Added to July CF for 17.

Regards,
Jeff Davis

#114

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Jeff Davis (#112)

1 attachment(s)

Re: Order changes in PG16 since ICU introduction

On 14.06.23 23:24, Jeff Davis wrote:

On Mon, 2023-06-12 at 23:04 +0200, Peter Eisentraut wrote:

Patch 0003:

Makes LOCALE apply to all providers. The overall feel after this
patch
is that "locale" now means the collation locale, and
LC_COLLATE/LC_CTYPE are for the server environment. When using
libc,
LC_COLLATE and LC_CTYPE still work as they did before, but their
relationship to database collation feels more like a special case
of
the libc provider. I believe most people favor this patch and I
haven't
seen recent objections.

This seems reasonable.

Attached a clean patch for this.

It seems to have widespread agreement so I plan to commit to v16 soon.

To clarify, this affects both initdb and CREATE DATABASE.

This looks good to me.

Attached is a small fixup patch with some documentation tweaks and
simplifying some test code (also includes pgperltidy).

Attachments:

0001-fixup-CREATE-DATABASE-make-LOCALE-apply-to-all-colla.patch (text/plain; charset=UTF-8)

From 0cd2154f364999091aba52136a139df75f58d1b7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 16 Jun 2023 16:46:36 +0200
Subject: [PATCH] fixup! CREATE DATABASE: make LOCALE apply to all collation
 providers.

---
 doc/src/sgml/ref/create_database.sgml | 12 ++++++------
 src/test/icu/t/010_database.pl        | 23 +++++++++++++----------
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index dab05950ed..b2c8aef1ad 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -147,9 +147,9 @@ <title>Parameters</title>
        <para>
         Sets the default collation order and character classification in the
         new database.  Collation affects the sort order applied to strings,
-        e.g., in queries with ORDER BY, as well as the order used in indexes
+        e.g., in queries with <literal>ORDER BY</literal>, as well as the order used in indexes
         on text columns.  Character classification affects the categorization
-        of characters, e.g., lower, - upper and digit.  Also sets the
+        of characters, e.g., lower, upper, and digit.  Also sets the
         associated aspects of the operating system environment,
         <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal>.  The
         default is the same setting as the template database.  See <xref
@@ -180,7 +180,7 @@ <title>Parameters</title>
        <para>
         Sets <literal>LC_COLLATE</literal> in the database server's operating
         system environment.  The default is the setting of <xref
-        linkend="create-database-locale"/> if specified; otherwise the same
+        linkend="create-database-locale"/> if specified, otherwise the same
         setting as the template database.  See below for additional
         restrictions.
        </para>
@@ -198,7 +198,7 @@ <title>Parameters</title>
        <para>
         Sets <literal>LC_CTYPE</literal> in the database server's operating
         system environment.  The default is the setting of <xref
-        linkend="create-database-locale"/> if specified; otherwise the same
+        linkend="create-database-locale"/> if specified, otherwise the same
         setting as the template database.  See below for additional
         restrictions.
        </para>
@@ -218,8 +218,8 @@ <title>Parameters</title>
         Specifies the ICU locale (see <xref
         linkend="collation-managing-create-icu"/>) for the database default
         collation order and character classification, overriding the setting
-        <xref linkend="create-database-locale"/>.  The <xref
-        linkend="create-database-locale-provider"/> must be ICU.  The default
+        <xref linkend="create-database-locale"/>.  The <link
+        linkend="create-database-locale-provider">locale provider</link> must be ICU.  The default
         is the setting of <xref linkend="create-database-locale"/> if
         specified; otherwise the same setting as the template database.
        </para>
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index b417b1a409..cbe5467f3c 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -52,27 +52,30 @@
 
 
 # Test that LOCALE='C' works for ICU
-my $ret1 = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
-);
-is($ret1, 0,
+is( $node1->psql(
+		'postgres',
+		q{CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8}
+	),
+	0,
 	"C locale works for ICU");
 
 # Test that LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE
 # are specified
-my $ret2 = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
+is( $node1->psql(
+		'postgres',
+		q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
       LC_COLLATE='C' LC_CTYPE='C' TEMPLATE template0 ENCODING UTF8}
-);
-is($ret2, 0,
+	),
+	0,
 	"LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE are specified");
 
 # Test that ICU-specific LOCALE without LC_COLLATE and LC_CTYPE must
 # be specified with ICU_LOCALE
-my ($ret3, $stdout, $stderr) = $node1->psql('postgres',
+my ($ret, $stdout, $stderr) = $node1->psql(
+	'postgres',
 	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
       TEMPLATE template0 ENCODING UTF8});
-isnt($ret3, 0,
+isnt($ret, 0,
 	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-- 
2.41.0

#115