PostgreSQL 10 (latest beta) and older ICU

Started by Victor Wagnerover 8 years ago10 messages
#1Victor Wagner
vitus@wagner.pp.ru

Collegues,

PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
(because it searches for libicu using pkgconfig, and pgkconfig support
significantly changed in ICU version 4.6)

However, there are some reasons to build PostgreSQL with older versions
of ICU.

For instance such Linux distributions as RHEL 6 ships ICU 4.2, and SLES
11 also ships older ICU. Sometimes, it is not feasible to compile newer
ICU from sources on the servers with these (and derived) distributions.

It is possible to compile PostgreSQL 10 with these versions of ICU
by specifying ICU_CFLAGS and ICU_LIBS explicitely.

However, backend startup fails with errors on some Spanish and
Singalese locales.

It turns out, that PostgreSQL enumerates collations for all ICU locales
and passes it into uloc_toLanguageTag function with strict argument of
this function set to TRUE. But for some locales
(es*@collation=tradtiional, si*@collation=dictionary) only call with
strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
can be resolved with strict=TRUE.

ICU docs state that if strict=FALSE, some locale fields can be
incomplete.

What solition is better:

1. Just skip locale/collation combinations which cannot be resolved
with strict=TRUE and issue warning instead of error if
uloc_toLanguageTag fails on some locale.

It would cause some ICU collations be missiong from the databases
running on the systems with old ICU.

2. If we see ICU version earlier than 4.6, pass strict=FALSE to
ucol_toLanguageTag. It would cause databases on such systems use
fallback collation sequence for these collations.

Personally I prefer first solition, because I doubt that any of my
users would ever need Singalese collation, and probably they can
survive without traditional Spanish too.

--
Victor Wagner <vitus@wagner.pp.ru>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Victor Wagner (#1)
Re: PostgreSQL 10 (latest beta) and older ICU

Victor Wagner <vitus@wagner.pp.ru> writes:

PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
(because it searches for libicu using pkgconfig, and pgkconfig support
significantly changed in ICU version 4.6)

However, there are some reasons to build PostgreSQL with older versions
of ICU.

No doubt, but making that happen seems like a feature, and IMO we're too
late in the v10 beta test cycle to be taking new features on board,
especially ones with inherently-large portability risks. We could maybe
consider patches for this for v11 ... but will anyone still care by then?

(Concretely, my concern is that the alpha and beta testing that's happened
so far hasn't covered pre-4.6 ICU at all. I'd be surprised if the
incompatibility you found so far is the only one.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Victor Wagner
vitus@wagner.pp.ru
In reply to: Tom Lane (#2)
Re: PostgreSQL 10 (latest beta) and older ICU

On Tue, 25 Jul 2017 16:50:38 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Victor Wagner <vitus@wagner.pp.ru> writes:

PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
(because it searches for libicu using pkgconfig, and pgkconfig
support significantly changed in ICU version 4.6)

However, there are some reasons to build PostgreSQL with older
versions of ICU.

No doubt, but making that happen seems like a feature, and IMO we're
too late in the v10 beta test cycle to be taking new features on

May be PGDG people could integrate it as a patch for RPMS for
particular systems which are affected by the problem.

I'd rather say that it is bugfix. Relaxing too strict checking.
If we choose approach to allow non-strict language tags, it is oneline
patch.

If we choose more safe approach to ignore non-strict collations, it
would take about five lines
1. Replace one ereport(ERROR with ereport(WARINING
2. Return special value (NULL) after issuing this warning
3. Handle this special value in calling function by continuing to next
loop iteration
4,5 - curly braces around 1 and 2. see patch at the end

board, especially ones with inherently-large portability risks. We

I don't think that there are so large portability risks. Patch can be
done such way that it would behave exactly as now if ICU version is 4.6
or above. Moreover, it is easy to make old ICU support separate
configure option (because another configure test is needed anyway).

could maybe consider patches for this for v11 ... but will anyone
still care by then?

Undoubtedly will. v11 is expedted to be released in 2018. And RHEL 6
is supported until 2020, and extended support would end in Nov 2023.

And it is possible that some derived distribution would last longer.

(Concretely, my concern is that the alpha and beta testing that's
happened so far hasn't covered pre-4.6 ICU at all. I'd be surprised
if the incompatibility you found so far is the only one.)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index b6c14c9..9e5da98 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -442,11 +442,13 @@ get_icu_language_tag(const char *localename)
        status = U_ZERO_ERROR;
        uloc_toLanguageTag(localename, buf, sizeof(buf), TRUE, &status);
-       if (U_FAILURE(status))
-               ereport(ERROR,
+       if (U_FAILURE(status)) 
+       {
+               ereport(WARNING,
                                (errmsg("could not convert locale name \"%s\" to language tag: %s",
                                                localename, u_errorName(status))));
-
+               return NULL;
+       }
        return pstrdup(buf);
 }

@@ -733,6 +735,10 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
char *localeid = psprintf("%s@collation=%s", name, val);

                                langtag = get_icu_language_tag(localeid);
+                               if (langtag == NULL) {
+                                       /* No such collation in library, skip */
+                                       continue;
+                               }
                                collcollate = U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : localeid;

/*

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Victor Wagner (#1)
Re: PostgreSQL 10 (latest beta) and older ICU

On 7/25/17 15:20, Victor Wagner wrote:

It turns out, that PostgreSQL enumerates collations for all ICU locales
and passes it into uloc_toLanguageTag function with strict argument of
this function set to TRUE. But for some locales
(es*@collation=tradtiional, si*@collation=dictionary) only call with
strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
can be resolved with strict=TRUE.

We are only calling uloc_toLanguageTag() with keyword/value combinations
that ICU itself previously told us were supported. So just ignoring
errors doesn't seem proper in this case.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Victor Wagner
vitus@wagner.pp.ru
In reply to: Peter Eisentraut (#4)
Re: PostgreSQL 10 (latest beta) and older ICU

On Mon, 31 Jul 2017 19:42:30 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 7/25/17 15:20, Victor Wagner wrote:

It turns out, that PostgreSQL enumerates collations for all ICU
locales and passes it into uloc_toLanguageTag function with strict
argument of this function set to TRUE. But for some locales
(es*@collation=tradtiional, si*@collation=dictionary) only call with
strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all
locales can be resolved with strict=TRUE.

We are only calling uloc_toLanguageTag() with keyword/value
combinations that ICU itself previously told us were supported. So
just ignoring errors doesn't seem proper in this case.

We know that this version of ICU is broken. But what choice we have?
Locale code in the glibc is much more broken.
So we just have to work around bugs in underlaying libraries anyway.
In case of ICU we just need to omit some collations.

It might cause incompatibility with newer systems where these
collations are used, but if we fall back to glibc locale, there would
be much more incompatibilites. And also there is bug in strxfrm, which
prevents use of abbreviated keys.

--

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Victor Wagner (#5)
Re: PostgreSQL 10 (latest beta) and older ICU

On 8/1/17 02:12, Victor Wagner wrote:

We are only calling uloc_toLanguageTag() with keyword/value
combinations that ICU itself previously told us were supported. So
just ignoring errors doesn't seem proper in this case.

We know that this version of ICU is broken. But what choice we have?

I don't know that we can already reach that conclusion. Maybe the APIs
have changed or we are not using them correctly. Are there any bug
reports or changelog entries related to this? I don't think we should
just give up and ignore errors.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Victor Wagner
vitus@wagner.pp.ru
In reply to: Peter Eisentraut (#6)
Re: PostgreSQL 10 (latest beta) and older ICU

On Tue, 1 Aug 2017 08:16:54 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 8/1/17 02:12, Victor Wagner wrote:

We are only calling uloc_toLanguageTag() with keyword/value
combinations that ICU itself previously told us were supported. So
just ignoring errors doesn't seem proper in this case.

We know that this version of ICU is broken. But what choice we
have?

I don't know that we can already reach that conclusion. Maybe the

Because it was fixed in subsequent versions.

And 4.2 is first version where this function appeared.
So, we still have problems with SLES11 which use even older version and
have to be supported at least two more years.

APIs have changed or we are not using them correctly. Are there any
bug reports or changelog entries related to this? I don't think we
should just give up and ignore errors.

We also can provide configuration option or command-line option for
initdb which would restrict list of languages for which collation
sequences would be created. This would help for all but two languages.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Victor Wagner (#7)
Re: PostgreSQL 10 (latest beta) and older ICU

Victor Wagner <vitus@wagner.pp.ru> writes:

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

I don't know that we can already reach that conclusion. Maybe the

Because it was fixed in subsequent versions.

And 4.2 is first version where this function appeared.
So, we still have problems with SLES11 which use even older version and
have to be supported at least two more years.

I think the answer is very clear: we do not need to support old broken
versions of ICU, especially not in our first release. We'll have enough
to do making the code stable and performant with good versions of ICU.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Victor Wagner (#7)
1 attachment(s)
Re: PostgreSQL 10 (latest beta) and older ICU

On 8/1/17 08:28, Victor Wagner wrote:

On Tue, 1 Aug 2017 08:16:54 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 8/1/17 02:12, Victor Wagner wrote:

We are only calling uloc_toLanguageTag() with keyword/value
combinations that ICU itself previously told us were supported. So
just ignoring errors doesn't seem proper in this case.

We know that this version of ICU is broken. But what choice we
have?

I don't know that we can already reach that conclusion. Maybe the

Because it was fixed in subsequent versions.

I propose the attached patch to address this.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Add-support-for-ICU-4.2.patchtext/plain; charset=UTF-8; name=0001-Add-support-for-ICU-4.2.patch; x-mac-creator=0; x-mac-type=0Download
From a9c89b7b1b98f0a0e3dc6d7571dfc7e7f146ed8d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Tue, 1 Aug 2017 10:49:55 -0400
Subject: [PATCH] Add support for ICU 4.2

Supporting ICU 4.2 seems useful because it ships with CentOS 6.

Versions before ICU 4.6 don't support pkg-config, so document an
installation method without using pkg-config.

In ICU 4.2, ucol_getKeywordsForLocale() sometimes returns values that
will not be accepted by uloc_toLanguageTag().  Skip loading keyword
variants in that version.

Reported-by: Victor Wagner <vitus@wagner.pp.ru>
---
 doc/src/sgml/installation.sgml       | 22 +++++++++++++++++++---
 src/backend/commands/collationcmds.c | 11 +++++++++++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index fa0d05efe6..12866b4bf7 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -774,10 +774,26 @@ <title>Configuration</title>
          Build with support for
          the <productname>ICU</productname><indexterm><primary>ICU</></>
          library.  This requires the <productname>ICU4C</productname> package
-         as well
-         as <productname>pkg-config</productname><indexterm><primary>pkg-config</></>
          to be installed.  The minimum required version
-         of <productname>ICU4C</productname> is currently 4.6.
+         of <productname>ICU4C</productname> is currently 4.2.
+        </para>
+
+        <para>
+         By default,
+         <productname>pkg-config</productname><indexterm><primary>pkg-config</></>
+         will be used to find the required compilation options.  This is
+         supported for <productname>ICU4C</productname> version 4.6 and later.
+         For older versions, or if <productname>pkg-config</productname> is
+         not available, the variables <envar>ICU_CFLAGS</envar>
+         and <envar>ICU_LIBS</envar> can be specified
+         to <filename>configure</filename>, like in this example:
+<programlisting>
+./configure ... --with-icu ICU_CFLAGS='-I/some/where/include' ICU_LIBS='-L/some/where/lib -licui18n -licuuc -licudata'
+</programlisting>
+         (If <productname>ICU4C</productname> is in the default search path
+         for the compiler, then you still need to specify a nonempty string in
+         order to avoid use of <productname>pkg-config</productname>, for
+         example, <literal>ICU_CFLAGS=' '</literal>.)
         </para>
        </listitem>
       </varlistentry>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index b6c14c920d..a7b5871466 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -718,7 +718,17 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 
 			/*
 			 * Add keyword variants
+			 *
+			 * In ICU 4.2, ucol_getKeywordsForLocale() sometimes returns
+			 * values that will not be accepted by uloc_toLanguageTag().  Skip
+			 * loading keyword variants in that version.  (Both
+			 * ucol_getKeywordValuesForLocale() and uloc_toLanguageTag() are
+			 * new in ICU 4.2, so older versions are not supported at all.)
+			 *
+			 * XXX We have no information about ICU 4.3 through 4.7, but we
+			 * know the below works with 4.8.
 			 */
+#if U_ICU_VERSION_MAJOR_NUM > 4 || (U_ICU_VERSION_MAJOR_NUM == 4 && U_ICU_VERSION_MINOR_NUM > 2)
 			status = U_ZERO_ERROR;
 			en = ucol_getKeywordValuesForLocale("collation", name, TRUE, &status);
 			if (U_FAILURE(status))
@@ -765,6 +775,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 						(errmsg("could not get keyword values for locale \"%s\": %s",
 								name, u_errorName(status))));
 			uenum_close(en);
+#endif							/* ICU >4.2 */
 		}
 	}
 #endif							/* USE_ICU */
-- 
2.13.3

#10Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#9)
Re: PostgreSQL 10 (latest beta) and older ICU

On 8/1/17 11:28, Peter Eisentraut wrote:

On 8/1/17 08:28, Victor Wagner wrote:

On Tue, 1 Aug 2017 08:16:54 -0400
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 8/1/17 02:12, Victor Wagner wrote:

We are only calling uloc_toLanguageTag() with keyword/value
combinations that ICU itself previously told us were supported. So
just ignoring errors doesn't seem proper in this case.

We know that this version of ICU is broken. But what choice we
have?

I don't know that we can already reach that conclusion. Maybe the

Because it was fixed in subsequent versions.

I propose the attached patch to address this.

committed

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers