pgsql: Handle the "und" locale in ICU versions 54 and older.

Started by Jeff Davis over 3 years ago · 3 messages · committers
#1 Jeff Davis
pgsql@j-davis.com

Handle the "und" locale in ICU versions 54 and older.

The "und" locale is an alternative spelling of the root locale, but it
was not recognized until ICU 55. To maintain common behavior across
all supported ICU versions, check for "und" and replace with "root"
before opening.
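
As a rough illustration of the check described above, here is a minimal
sketch in plain C. It is not the code from the commit: the function name
replace_und_with_root is hypothetical, it works on the raw string without
calling ICU, and it assumes a case-sensitive match on a leading "und"
that is followed by the end of the string or by '-', '_' or '@'.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from pg_locale.c): copy "loc" into "dst",
 * replacing a leading "und" language subtag with "root".
 *
 * Assumption: "und" counts as the language only when it is followed by
 * the end of the string or by one of '-', '_' or '@'. The comparison is
 * case-sensitive to keep the sketch short.
 *
 * Returns 0 on success, -1 if "dst" is too small for the result.
 */
int
replace_und_with_root(const char *loc, char *dst, size_t dstlen)
{
	const size_t undlen = 3;	/* strlen("und") */
	int			needed;

	if (strncmp(loc, "und", undlen) == 0 &&
		(loc[undlen] == '\0' || loc[undlen] == '-' ||
		 loc[undlen] == '_' || loc[undlen] == '@'))
	{
		/* Keep everything after "und" (variants, keywords) unchanged. */
		needed = snprintf(dst, dstlen, "root%s", loc + undlen);
	}
	else
	{
		/* Not the "und" language: pass the locale through untouched. */
		needed = snprintf(dst, dstlen, "%s", loc);
	}

	return (needed >= 0 && (size_t) needed < dstlen) ? 0 : -1;
}
```

A caller would pass the rewritten string to the collator-opening routine
only when building against ICU 54 or older. Newer versions recognize
"und" themselves.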

Previously, the lack of support for "und" was dangerous, because
versions 54 and older fall back to the environment when a locale is
not found. If the user specified "und" for the language (which is
expected and documented), it could not only resolve to the wrong
collator, but it could unexpectedly change (which could lead to
corrupt indexes).

This effectively reverts commit d72900bded, which worked around the
problem for the built-in "unicode" collation, and is no longer
necessary.

Discussion: /messages/by-id/60da0cecfb512a78b8666b31631a636215d8ce73.camel@j-davis.com
Discussion: /messages/by-id/0c6fa66f2753217d2a40480a96bd2ccf023536a1.camel@j-davis.com
Reviewed-by: Peter Eisentraut

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/3b50275b12950280fb07193e24a4f400ed8a9fef

Modified Files
--------------
src/backend/utils/adt/pg_locale.c | 34 ++++++++++++++++++++++++++
src/bin/initdb/initdb.c | 2 +-
src/test/regress/expected/collate.icu.utf8.out | 7 ++++++
src/test/regress/sql/collate.icu.utf8.sql | 2 ++
4 files changed, 44 insertions(+), 1 deletion(-)

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#1)
Re: pgsql: Handle the "und" locale in ICU versions 54 and older.

Jeff Davis <jdavis@postgresql.org> writes:

> Handle the "und" locale in ICU versions 54 and older.
>
> The "und" locale is an alternative spelling of the root locale, but it
> was not recognized until ICU 55. To maintain common behavior across
> all supported ICU versions, check for "und" and replace with "root"
> before opening.
>
> Previously, the lack of support for "und" was dangerous, because
> versions 54 and older fall back to the environment when a locale is
> not found. If the user specified "und" for the language (which is
> expected and documented), it could not only resolve to the wrong
> collator, but it could unexpectedly change (which could lead to
> corrupt indexes).

Hmm, should we back-patch this? Seems like existing branches would
be even more at risk than v16, because more likely to be built with
old ICU. OTOH, we do also run the risk of breaking installations
that weren't broken before.

regards, tom lane

#3 Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#2)
Re: pgsql: Handle the "und" locale in ICU versions 54 and older.

On Thu, 2023-03-23 at 13:25 -0400, Tom Lane wrote:

> Hmm, should we back-patch this?  Seems like existing branches would
> be even more at risk than v16, because more likely to be built with
> old ICU.  OTOH, we do also run the risk of breaking installations
> that weren't broken before.

I wondered the same thing[1] but ultimately figured the risk outweighed
the reward. My reasoning (which I didn't post before) was:

If a user currently has a collation with locale 'und', and ICU <= 54,
they are getting their actual locale from the environment. If we
backpatch, it will silently change their locale to be the root locale,
which could be different, and break their indexes. That seems too
dangerous for a minor release.

For a major release it's more tolerable to put something like that in
the release notes.

Fortunately, I think most users now are probably using the built-in
collations, or using the empty string before an "@" to specify the root
locale, which works in all ICU versions. Users would only specify it as
"und" if they happen to know about language tags.

Regards,
Jeff Davis

[1]: /messages/by-id/9afa6dbe0d31053ad265aeba488fde784fd5b7ab.camel@j-davis.com