Solaris versus our NLS files

Started by Tom Lane | 5 months ago | 16 messages | hackers

tgl@sss.pgh.pa.us

5 months ago

I idly tried the NLS-testing patch at [1] on a Solaris image
(actually OpenIndiana), and was not really astonished to find that
it fails. Everything compiles cleanly, but the test shows that no
message translation happens, and manual checking confirms that.

After some quality time with Google, I learned why: with Solaris's
apparently-locally-hacked version of gettext, it's not good enough
to have $INSTALLATION/share/locale/ subdirectories named like
"es", "fr", etc. They have to be named after the
fully-spelled-out locale names like "es_ES.UTF-8".

At least Solaris is kind enough to let you do that with
symlinks [2], so that after

cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8

translation starts working for that particular value of
lc_messages.

This policy dictates making a rather large number of symlinks
in that directory, which we've never done TTBOMK. It's a
bit sad that nobody has complained about this --- one must
conclude that the non-anglophone population of Solaris PG
users is nearly empty.

Anybody feel like doing something about this? I'm not
super excited about it myself, but if we don't, it's
probably a blocker for adding the test proposed at [1].
We do have Solaris BF animals that would start failing.

regards, tom lane

[1]: /messages/by-id/247596.1765300108@sss.pgh.pa.us
[2]: https://docs.oracle.com/cd/E36784_01/html/E39536/gnkbn.html

Thomas Munro

thomas.munro@gmail.com

5 months ago

In reply to: Tom Lane (#1)

Re: Solaris versus our NLS files

On Wed, Dec 10, 2025 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

After some quality time with Google, I learned why: with Solaris's
apparently-locally-hacked version of gettext, it's not good enough
to have $INSTALLATION/share/locale/ subdirectories named like
"es", "fr", etc. They have to be named after the
fully-spelled-out locale names like "es_ES.UTF-8".

Is it really locally hacked, or is it just Sun's libc[1], which
invented gettext() in the first place, and then later added GNU's
extensions and .mo format after GNU's reimplementation became
widespread? From some (very) limited research on the topic, one thing
that GNU's reimplementation added that Sun's never had is the ability
to open a .mo with the wrong encoding and transcode it. Perhaps that
explains Sun's insistence on finding an exact match, and I guess that
might mean that you could get either mojibake or some kind of error if
you create codesetless symlinks (which I guess it would normally only
use when your locale's name doesn't have the codeset suffix, and then
I guess it would expect Latin-9 or whatever it thinks "es_ES" has)?

[1]: https://github.com/illumos/illumos-gate/tree/master/usr/src/lib/libc/port/i18n

Thomas Munro

thomas.munro@gmail.com

5 months ago

In reply to: Thomas Munro (#2)

Re: Solaris versus our NLS files

On Wed, Dec 10, 2025 at 10:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:

if you create codesetless symlinks

Oops, wrote that too fast... you want to add the suffixes. Well then
it's the other way around, and you'd have to generate new files with
the right encoding and suffixes (which means knowing which
combinations the target system has), instead of making symlinks, and
make sure that the "en_US" one is in the appropriate encoding, maybe?

Nico Williams

nico@cryptonector.com

5 months ago

In reply to: Thomas Munro (#3)

Re: Solaris versus our NLS files

On Wed, Dec 10, 2025 at 11:03:00AM +1300, Thomas Munro wrote:

On Wed, Dec 10, 2025 at 10:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:

if you create codesetless symlinks

Oops, wrote that too fast... you want to add the suffixes. Well then
it's the other way around, and you'd have to generate new files with
the right encoding and suffixes (which means knowing which
combinations the target system has), instead of making symlinks, and
make sure that the "en_US" one is in the appropriate encoding, maybe?

How about supporting only UTF-8 locales?

Nico
--

Tom Lane

tgl@sss.pgh.pa.us

5 months ago

In reply to: Thomas Munro (#2)

Re: Solaris versus our NLS files

Thomas Munro <thomas.munro@gmail.com> writes:

On Wed, Dec 10, 2025 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

After some quality time with Google, I learned why: with Solaris's
apparently-locally-hacked version of gettext, it's not good enough
to have $INSTALLATION/share/locale/ subdirectories named like
"es", "fr", etc. They have to be named after the
fully-spelled-out locale names like "es_ES.UTF-8".

Is it really locally hacked, or is it just Sun's libc[1], which
invented gettext() in the first place, and then later added GNU's
extensions and .mo format after GNU's reimplementation became
widespread?

Sorry, I was imprecise there. This is Solaris' libc implementation:
configure reports

configure:18402: checking for library containing bind_textdomain_codeset
configure:18450: result: none required

and I don't see any libintl listed in "ldd postgres" either.

From some (very) limited research on the topic, one thing
that GNU's reimplementation added that Sun's never had is the ability
to open a .mo with the wrong encoding and transcode it. Perhaps that
explains Sun's insistence on finding an exact match, and I guess that
might mean that you could get either mojibake or some kind of error if
you create codesetless symlinks (which I guess it would normally only
use when your locale's name doesn't have the codeset suffix, and then
I guess it would expect Latin-9 or whatever it thinks "es_ES" has)?

Like some other platforms, it flat out won't accept codeset-less
lc_messages settings:

postgres=# SET lc_messages = 'es_ES';
ERROR: invalid value for parameter "lc_messages": "es_ES"
postgres=# SET lc_messages = 'es_ES.UTF-8';
SET
postgres=# select 1/0;
ERROR: división por cero

This is with the symlink in place. Yes I did try making a symlink
named "es_ES", but apparently there's some central source of truth
about what the valid locale names are.

It apparently is possible to install GNU gettext on top of Solaris,
although you then get into some fun about conflicts between GNU-
and OS-supplied headers. But I've not tried that here.

If you're right about Sun not doing transcoding, then I guess we would
only need to create symlinks matching the encodings used in our .po
files, which'd remove the symlink bloat problem and replace it with
how-do-we-extract-that-encoding-name ... although it looks like all
but one is in UTF-8, so maybe we should just decree they have to be
in UTF-8? The lone exception is src/bin/pg_config/po/nb.po, which
seems not to have been touched since 2013.

regards, tom lane
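
One way to find which .po files declare something other than UTF-8 is to read the charset from each file's header entry. The following is a minimal sketch and was not posted in the thread. The function names po_charset and list_non_utf8_po are hypothetical, and it assumes the usual gettext header line of the form "Content-Type: text/plain; charset=...\n".

```shell
# Print the charset declared in a .po file's header entry (hypothetical helper).
po_charset() {
    # The header is expected to contain a line such as:
    #   "Content-Type: text/plain; charset=UTF-8\n"
    sed -n 's/^"Content-Type:.*charset=\([^\\"]*\).*/\1/p' "$1" | head -n 1
}

# List .po files under directory $1 whose declared charset is not UTF-8,
# one per line, as "path<TAB>charset" (hypothetical helper).
list_non_utf8_po() {
    find "$1" -name '*.po' | sort | while IFS= read -r po; do
        cs=$(po_charset "$po")
        case "$cs" in
            UTF-8|utf-8|UTF8|utf8) ;;
            *) printf '%s\t%s\n' "$po" "${cs:-unknown}" ;;
        esac
    done
}
```

Running list_non_utf8_po over the top of a source tree would print one line per file that is not declared as UTF-8, which is one possible answer to the "how do we extract that encoding name" question raised above.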

Thomas Munro

thomas.munro@gmail.com

5 months ago

In reply to: Tom Lane (#1)

Re: Solaris versus our NLS files

On Wed, Dec 10, 2025 at 11:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If you're right about Sun not doing transcoding, then I guess we would
only need to create symlinks matching the encodings used in our .po
files, which'd remove the symlink bloat problem and replace it with
how-do-we-extract-that-encoding-name ... although it looks like all
but one is in UTF-8, so maybe we should just decree they have to be
in UTF-8? The lone exception is src/bin/pg_config/po/nb.po, which
seems not to have been touched since 2013.

Thomas Munro

thomas.munro@gmail.com

5 months ago

In reply to: Nico Williams (#4)

Re: Solaris versus our NLS files

On Wed, Dec 10, 2025 at 11:22 AM Nico Williams <nico@cryptonector.com> wrote:

On Wed, Dec 10, 2025 at 11:03:00AM +1300, Thomas Munro wrote:

On Wed, Dec 10, 2025 at 10:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:

if you create codesetless symlinks

Oops, wrote that too fast... you want to add the suffixes. Well then
it's the other way around, and you'd have to generate new files with
the right encoding and suffixes (which means knowing which
combinations the target system has), instead of making symlinks, and
make sure that the "en_US" one is in the appropriate encoding, maybe?

How about supporting only UTF-8 locales?

Yeah, if nobody noticed this wasn't working at all, then it makes
sense to defer the generation of .mo files for non-UTF-8 codesets
until someone eventually does notice that it still doesn't work in
legacy locales and feels inclined to do something about it, ie
forever. Tom's goal of having basic tests pass will be satisfied by
UTF-8-only.

Tom Lane

tgl@sss.pgh.pa.us

5 months ago

In reply to: Thomas Munro (#7)

Re: Solaris versus our NLS files

Thomas Munro <thomas.munro@gmail.com> writes:

On Wed, Dec 10, 2025 at 11:22 AM Nico Williams <nico@cryptonector.com> wrote:

How about supporting only UTF-8 locales?

Yeah, if nobody noticed this wasn't working at all, then it makes
sense to defer the generation of .mo files for non-UTF-8 codesets
until someone eventually does notice that it still doesn't work in
legacy locales and feels inclined to do something about it, ie
forever. Tom's goal of having basic tests pass will be satisfied by
UTF-8-only.

Right. For the moment I only care about verifying that (a) some
translation happens and (b) the PRI* macros work as-expected.
Since we've already discovered platform-specific failures on both
points, this seems like a very worthwhile exercise.

Encoding-specific behaviors might be worth testing later, but
I'm not excited about that personally.

regards, tom lane

Peter Eisentraut

peter_e@gmx.net

5 months ago

In reply to: Tom Lane (#1)

Re: Solaris versus our NLS files

On 09.12.25 22:22, Tom Lane wrote:

At least Solaris is kind enough to let you do that with
symlinks [2], so that after

cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8

translation starts working for that particular value of
lc_messages.

This policy dictates making a rather large number of symlinks
in that directory, which we've never done TTBOMK.

How would one know all the country codes to create links for?

#10

Tom Lane

tgl@sss.pgh.pa.us

5 months ago

In reply to: Peter Eisentraut (#9)

Re: Solaris versus our NLS files

Peter Eisentraut <peter@eisentraut.org> writes:

On 09.12.25 22:22, Tom Lane wrote:

At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.

How would one know all the country codes to create links for?

Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that. What's unclear is whether we should do
that on-the-fly to match the build machine, or do it once to
produce a curated list that could be subject to maintenance.
The former is like what we do to populate pg_collation
(although we do that at initdb not build time). But the latter
seems like it might be wiser policy.

regards, tom lane
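
The "locale -a" idea described above could be sketched roughly as follows. This is an illustration only, not the patch posted later in the thread. The function name emit_locale_links is hypothetical, it assumes locale names spelled like "es_ES.UTF-8", and it only prints ln commands rather than running them.

```shell
# Read locale names (as printed by "locale -a") on stdin. For each language
# directory named in the arguments, print the ln command for every
# <lang>_<CC>.UTF-8 locale found. Nothing is executed (hypothetical helper).
emit_locale_links() {
    locales=$(cat)
    for lang in "$@"; do
        printf '%s\n' "$locales" |
            LC_ALL=C grep -E "^${lang}_[A-Z][A-Z]\.UTF-8\$" |
            LC_ALL=C sort -u |
            while IFS= read -r loc; do
                printf 'ln -s %s %s\n' "$lang" "$loc"
            done
    done
}
```

A call such as "locale -a | emit_locale_links es fr de" would print the candidate links for review. A directory that already carries a country code, such as pt_BR, matches nothing under this pattern and so gets no links.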

#11

Nico Williams

nico@cryptonector.com

5 months ago

In reply to: Peter Eisentraut (#9)

Re: Solaris versus our NLS files

On Wed, Dec 10, 2025 at 05:02:14PM +0100, Peter Eisentraut wrote:

On 09.12.25 22:22, Tom Lane wrote:

At least Solaris is kind enough to let you do that with
symlinks [2], so that after

cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8

translation starts working for that particular value of
lc_messages.

This policy dictates making a rather large number of symlinks
in that directory, which we've never done TTBOMK.

How would one know all the country codes to create links for?

Does OpenIndiana really require this? Oh, I guess it does:

https://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/i18n/gettext_util.c?r=00ae5933&fi=mk_msgfile#mk_msgfile

That's a bummer.

Well, a list of country codes can probably be hardcoded into PG's build.
Or... the installation packaging could check at install time what
locales are installed and create these symlinks (but this is
unsatisfying because what if the locales in question get installed after
PG?).

Maybe PG should contribute a fix to Illumos :joy:

Nico
--

#12

Tom Lane

tgl@sss.pgh.pa.us

5 months ago

In reply to: Tom Lane (#10)

Re: Solaris versus our NLS files

I wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

How would one know all the country codes to create links for?

Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that. What's unclear is whether we should do
that on-the-fly to match the build machine, or do it once to
produce a curated list that could be subject to maintenance.

It turns out to be a fairly minor patch to do it on-the-fly.
With the attached, I've gotten my NLS-testing patch to pass
on OpenIndiana. We end up with this in $INSTALL/share/locale/:

drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 cs
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 cs_CZ.UTF-8 -> cs
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_AT.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_BE.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_CH.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_DE.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_LI.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_LU.UTF-8 -> de
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 el
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 el_CY.UTF-8 -> el
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 el_GR.UTF-8 -> el
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_AR.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_BO.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_CL.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_CO.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_CR.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_DO.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_EC.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_ES.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_GQ.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_GT.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_HN.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_MX.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_NI.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PA.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PE.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PR.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PY.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_SV.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_US.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_UY.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_VE.UTF-8 -> es
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_BE.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_CA.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_CF.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_CH.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_FR.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_GN.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_LU.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_MC.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_MG.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_ML.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_NE.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_SN.UTF-8 -> fr
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 he
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 he_IL.UTF-8 -> he
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 id
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 id_ID.UTF-8 -> id
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 it
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 it_CH.UTF-8 -> it
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 it_IT.UTF-8 -> it
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ja
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ja_JP.UTF-8 -> ja
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ka
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ka_GE.UTF-8 -> ka
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ko
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ko_KR.UTF-8 -> ko
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 nb
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 nb_NO.UTF-8 -> nb
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 pl
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 pl_PL.UTF-8 -> pl
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 pt_BR
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ro
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ro_MD.UTF-8 -> ro
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ro_RO.UTF-8 -> ro
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ru
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ru_MD.UTF-8 -> ru
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ru_RU.UTF-8 -> ru
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ru_UA.UTF-8 -> ru
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 sv
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 sv_FI.UTF-8 -> sv
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 sv_SE.UTF-8 -> sv
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ta
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 ta_IN.UTF-8 -> ta
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 ta_LK.UTF-8 -> ta
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 tr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 tr_TR.UTF-8 -> tr
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 uk
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 uk_UA.UTF-8 -> uk
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 vi
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 vi_VN.UTF-8 -> vi
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 zh_CN
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 zh_TW

That's probably more links than we'd bother with in a curated
list, but it's not a huge number.

The attached patch only addresses the autoconf/make case.
I might be missing something, but so far as I can find,
meson wraps up the entire build/installation process for
translation files into a black box without user-serviceable
parts. So the lack of translation on Solaris is their bug
to fix.

I'm inclined to commit this and call it good. Probably
we should back-patch, too, despite the lack of field
complaints.

regards, tom lane

#13

Tom Lane

tgl@sss.pgh.pa.us

5 months ago

In reply to: Tom Lane (#8)

Re: Solaris versus our NLS files

I wrote:

Encoding-specific behaviors might be worth testing later, but
I'm not excited about that personally.

Despite that disclaimer, I experimented with the questionable nb.po
file on my OpenIndiana installation, and it seems to Just Work
after creating the nb_NO.UTF-8 -> nb locale symlink:

tgl@openindiana:~$ LANG=nb_NO.UTF-8 pg_config bogus
pg_config: ugyldig argument: bogus
Prøv «pg_config --help» for mer informasjon.

"od -c" confirms that that output is UTF-8 encoded:

tgl@openindiana:~$ LANG=nb_NO.UTF-8 pg_config bogus 2>&1 | od -c
0000000   p   g   _   c   o   n   f   i   g   :       u   g   y   l   d
0000020   i   g       a   r   g   u   m   e   n   t   :       b   o   g
0000040   u   s  \n   P   r 303 270   v     302 253   p   g   _   c   o
0000060   n   f   i   g       -   -   h   e   l   p 302 273       f   o
0000100   r       m   e   r       i   n   f   o   r   m   a   s   j   o
0000120   n   .  \n
0000123

I checked further, and found that nb.mo has been transcoded to UTF8!
So this case didn't require any runtime transcoding anyway.

It doesn't stop there though. I soon realized that configure
had seized on GNU msgfmt not Sun's, and convert-to-UTF8 is
the default behavior of GNU msgfmt. I then forced the build
to use /usr/bin/msgfmt, and confirmed that now nb.mo remains in
LATIN1. But I still get exactly the above output!

So apparently, "Sun's gettext can't transcode" is obsolete
information. It works fine in this setup, despite using symlinks that
may not be telling the truth about the encoding of the .mo files.

I also verified that I could get LATIN1 output from UTF-8 .mo files,
so long as LANG is set to a name that "locale -a" admits the existence
of and I add a matching symlink. So we don't actually have to confine
our support for this to the UTF-8 locales. But I'm inclined to just
do that much for now and wait to see if anyone complains. (I agree
with Munro that that's likely to be never.)

This also means that this project doesn't require adoption of a
"UTF8-only" policy for our .po files. I'm inclined to think that
such a policy might be a good idea anyway, but it's not forced by
wanting to support Sun's gettext().

regards, tom lane

#14

Peter Eisentraut

peter_e@gmx.net

4 months ago

In reply to: Tom Lane (#10)

Re: Solaris versus our NLS files

On 10.12.25 17:14, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

On 09.12.25 22:22, Tom Lane wrote:

At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.

How would one know all the country codes to create links for?

Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that. What's unclear is whether we should do
that on-the-fly to match the build machine, or do it once to
produce a curated list that could be subject to maintenance.
The former is like what we do to populate pg_collation
(although we do that at initdb not build time). But the latter
seems like it might be wiser policy.

I wonder how other gettext-using projects handle this on Solaris. Most
of those will use a higher-level build system such as Automake or Meson,
and I don't see any facilities there to expand languages into full
locale names on installation. So either this is broken for everyone
else, too, or perhaps this is typically addressed on the packaging level
(or there is some other explanation we're not seeing yet). In either
case, I doubt that fixing this locally in PostgreSQL is the most
appropriate solution.

#15

Tom Lane

tgl@sss.pgh.pa.us

4 months ago

In reply to: Peter Eisentraut (#14)

Re: Solaris versus our NLS files

Peter Eisentraut <peter@eisentraut.org> writes:

On 10.12.25 17:14, Tom Lane wrote:

Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that.

I wonder how other gettext-using projects handle this on Solaris. Most
of those will use a higher-level build system such as Automake or Meson,
and I don't see any facilities there to expand languages into full
locale names on installation. So either this is broken for everyone
else, too, or perhaps this is typically addressed on the packaging level
(or there is some other explanation we're not seeing yet). In either
case, I doubt that fixing this locally in PostgreSQL is the most
appropriate solution.

I suspect that the answer for most non-Solaris-specific projects has
been "use GNU gettext". I don't want to rely on that answer, though,
because it will break every one of our Solaris/illumos buildfarm
animals, all of which are linking to libc gettext:

checking for library containing bind_textdomain_codeset... none required

Now it does appear that they all have (portions of?) GNU gettext
installed:

checking for msgfmt... /usr/gnu/bin/msgfmt

and so does my OpenIndiana image, which apparently means that GNU
gettext is pulled in by "sudo pkg install build-essential", because
that's all I did to install stuff. So maybe we should just say we
don't support the libc flavor of gettext on that platform, which
would require figuring out how to force linking to libintl instead.
I can look into that if it seems like a more acceptable solution.
I'm worried though that it amounts to adding a new dependency on
that platform.

regards, tom lane

#16

Tom Lane

tgl@sss.pgh.pa.us

4 months ago

In reply to: Tom Lane (#15)

Re: Solaris versus our NLS files

I wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

I wonder how other gettext-using projects handle this on Solaris.

I suspect that the answer for most non-Solaris-specific projects has
been "use GNU gettext".

I poked into that and it seems a lot messier than I hoped. At least
on OpenIndiana, what's actually installed by the "GNU gettext" package
is just the GNU flavors of the gettext command line tools, not a
replacement libintl. There is a /lib/libintl.so.1, but forcing our
code to link to that doesn't change the problematic behavior. So
I feel that asking users to install GNU gettext is not going to be
a practical solution.

I notice in Oracle's docs [1] that creating symlinks in the install
tree is just one way to implement a mapping from locale names to
message catalogs: you can also do it at runtime by setting an
environment variable. So what I'm now theorizing is that users have
just learned to set that variable, which solves the problem across
all packages despite the lack of symlinks.

Hence, what I now propose to get my NLS-testing patch to work on
Solaris is for the test to forcibly set that environment variable
before invoking bindtextdomain:

    /*
     * Solaris' built-in gettext is not bright about associating locales
     * with message catalogs that are named after just the language.
     * Apparently the customary workaround is for users to set the
     * LANGUAGE environment variable to provide a mapping. Do so here to
     * ensure that the nls.sql regression test will work.
     */
#if defined(__sun__)
    setenv("LANGUAGE", "es_ES.UTF-8:es", 1);
#endif
    pg_bindtextdomain(TEXTDOMAIN);

This is surely a hack, but it's nicely localized and can be readily
undone if anyone comes up with a better answer.

I've verified that the attached v7 passes on current OpenIndiana.

regards, tom lane

[1]: https://docs.oracle.com/cd/E36784_01/html/E39536/gnkbn.html
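
The value passed to setenv() above pairs the full locale name with the bare language code. A small shell sketch of building such a value for other locales follows. The helper name language_fallback is hypothetical, and whether Solaris' gettext honors the result for a given program is an assumption that would need testing on the platform.

```shell
# Build a LANGUAGE value of the form "<locale>:<language>", for example
# "es_ES.UTF-8:es", following the shape of the value hard-coded in the
# C snippet from the thread (hypothetical helper).
language_fallback() {
    # ${1%%_*} strips everything from the first underscore onward,
    # leaving just the language code.
    printf '%s:%s\n' "$1" "${1%%_*}"
}
```

It could then be tried as, for instance, LANGUAGE=$(language_fallback nb_NO.UTF-8) LANG=nb_NO.UTF-8 pg_config bogus.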
