Solaris versus our NLS files
I idly tried the NLS-testing patch at [1] on a Solaris image
(actually OpenIndiana), and was not really astonished to find that
it fails. Everything compiles cleanly, but the test shows that no
message translation happens, and manual checking confirms that.
After some quality time with Google, I learned why: with Solaris's
apparently-locally-hacked version of gettext, it's not good enough
to have $INSTALLATION/share/locale/ subdirectories named like
"es", "fr", etc. They have to be named after the
fully-spelled-out locale names like "es_ES.UTF-8".
At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.
This policy dictates making a rather large number of symlinks
in that directory, which we've never done TTBOMK. It's a
bit sad that nobody has complained about this --- one must
conclude that the non-anglophone population of Solaris PG
users is nearly empty.
Anybody feel like doing something about this? I'm not
super excited about it myself, but if we don't, it's
probably a blocker for adding the test proposed at [1].
We do have Solaris BF animals that would start failing.
regards, tom lane
[1]: /messages/by-id/247596.1765300108@sss.pgh.pa.us
[2]: https://docs.oracle.com/cd/E36784_01/html/E39536/gnkbn.html
On Wed, Dec 10, 2025 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
After some quality time with Google, I learned why: with Solaris's
apparently-locally-hacked version of gettext, it's not good enough
to have $INSTALLATION/share/locale/ subdirectories named like
"es", "fr", etc. They have to be named after the
fully-spelled-out locale names like "es_ES.UTF-8".
Is it really locally hacked, or is it just Sun's libc [1], which
invented gettext() in the first place, and then later added GNU's
extensions and .mo format after GNU's reimplementation became
widespread? From some (very) limited research on the topic, one thing
that GNU's reimplementation added that Sun's never had is the ability
to open a .mo with the wrong encoding and transcode it. Perhaps that
explains Sun's insistence on finding an exact match, and I guess that
might mean that you could get either mojibake or some kind of error if
you create codesetless symlinks (which I guess it would normally only
use when your locale's name doesn't have the codeset suffix, and then
I guess it would expect Latin-9 or whatever it thinks "es_ES" has)?
[1]: https://github.com/illumos/illumos-gate/tree/master/usr/src/lib/libc/port/i18n
On Wed, Dec 10, 2025 at 10:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:
if you create codesetless symlinks
Oops, wrote that too fast... you want to add the suffixes. Well then
it's the other way around, and you'd have to generate new files with
the right encoding and suffixes (which means knowing which
combinations the target system has), instead of making symlinks, and
make sure that the "en_US" one is in the appropriate encoding, maybe?
On Wed, Dec 10, 2025 at 11:03:00AM +1300, Thomas Munro wrote:
On Wed, Dec 10, 2025 at 10:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:
if you create codesetless symlinks
Oops, wrote that too fast... you want to add the suffixes. Well then
it's the other way around, and you'd have to generate new files with
the right encoding and suffixes (which means knowing which
combinations the target system has), instead of making symlinks, and
make sure that the "en_US" one is in the appropriate encoding, maybe?
How about supporting only UTF-8 locales?
Nico
--
On Wed, Dec 10, 2025 at 11:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
If you're right about Sun not doing transcoding, then I guess we would
only need to create symlinks matching the encodings used in our .po
files, which'd remove the symlink bloat problem and replace it with
how-do-we-extract-that-encoding-name ... although it looks like all
but one is in UTF-8, so maybe we should just decree they have to be
in UTF-8? The lone exception is src/bin/pg_config/po/nb.po, which
seems not to have been touched since 2013.
+1
On Wed, Dec 10, 2025 at 11:22 AM Nico Williams <nico@cryptonector.com> wrote:
On Wed, Dec 10, 2025 at 11:03:00AM +1300, Thomas Munro wrote:
On Wed, Dec 10, 2025 at 10:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:
if you create codesetless symlinks
Oops, wrote that too fast... you want to add the suffixes. Well then
it's the other way around, and you'd have to generate new files with
the right encoding and suffixes (which means knowing which
combinations the target system has), instead of making symlinks, and
make sure that the "en_US" one is in the appropriate encoding, maybe?
How about supporting only UTF-8 locales?
Yeah, if nobody noticed this wasn't working at all, then it makes
sense to defer the generation of .mo files for non-UTF-8 codesets
until someone eventually does notice that it still doesn't work in
legacy locales and feels inclined to do something about it, ie
forever. Tom's goal of having basic tests pass will be satisfied by
UTF-8-only.
Thomas Munro <thomas.munro@gmail.com> writes:
On Wed, Dec 10, 2025 at 11:22 AM Nico Williams <nico@cryptonector.com> wrote:
How about supporting only UTF-8 locales?
Yeah, if nobody noticed this wasn't working at all, then it makes
sense to defer the generation of .mo files for non-UTF-8 codesets
until someone eventually does notice that it still doesn't work in
legacy locales and feels inclined to do something about it, ie
forever. Tom's goal of having basic tests pass will be satisfied by
UTF-8-only.
Right. For the moment I only care about verifying that (a) some
translation happens and (b) the PRI* macros work as-expected.
Since we've already discovered platform-specific failures on both
points, this seems like a very worthwhile exercise.
Encoding-specific behaviors might be worth testing later, but
I'm not excited about that personally.
regards, tom lane
On 09.12.25 22:22, Tom Lane wrote:
At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.
This policy dictates making a rather large number of symlinks
in that directory, which we've never done TTBOMK.
How would one know all the country codes to create links for?
Peter Eisentraut <peter@eisentraut.org> writes:
On 09.12.25 22:22, Tom Lane wrote:
At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.
How would one know all the country codes to create links for?
Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that. What's unclear is whether we should do
that on-the-fly to match the build machine, or do it once to
produce a curated list that could be subject to maintenance.
The former is like what we do to populate pg_collation
(although we do that at initdb not build time). But the latter
seems like it might be wiser policy.
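As a rough illustration of the on-the-fly option, here is a minimal sketch. It assumes that "locale -a" prints one locale name per line and that only names ending in ".UTF-8" are of interest. The function name make_locale_links is hypothetical and is not taken from any patch in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the patch under discussion.
# Reads locale names on stdin, one per line (the format "locale -a" prints),
# and for every name of the form ll_CC.UTF-8 whose language code "ll" has a
# catalog directory under $1, creates the symlink $1/ll_CC.UTF-8 -> ll.
make_locale_links() {
    dir=$1
    while IFS= read -r loc; do
        case $loc in
            *_*.UTF-8) ;;       # keep fully-spelled-out UTF-8 names only
            *) continue ;;
        esac
        lang=${loc%%_*}         # "es_ES.UTF-8" -> "es"
        # Skip languages we ship no catalog for.
        [ -d "$dir/$lang" ] || continue
        # Leave existing entries (real directories or earlier links) alone.
        if [ -e "$dir/$loc" ] || [ -L "$dir/$loc" ]; then
            continue
        fi
        ln -s "$lang" "$dir/$loc"
    done
}
```

It would be run as "locale -a | make_locale_links $INSTALLATION/share/locale". Catalog directories whose names already carry a territory, such as pt_BR, get no link from this sketch, because it looks for a directory named after the bare language code.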
regards, tom lane
On Wed, Dec 10, 2025 at 05:02:14PM +0100, Peter Eisentraut wrote:
On 09.12.25 22:22, Tom Lane wrote:
At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.
This policy dictates making a rather large number of symlinks
in that directory, which we've never done TTBOMK.
How would one know all the country codes to create links for?
Does OpenIndiana really require this? Oh, I guess it does:
That's a bummer.
Well, a list of country codes can probably be hardcoded into PG's build.
Or... the installation packaging could check at install time what
locales are installed and create these symlinks (but this is
unsatisfying because what if the locales in question get installed after
PG?).
Maybe PG should contribute a fix to Illumos :joy:
Nico
--
I wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
How would one know all the country codes to create links for?
Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that. What's unclear is whether we should do
that on-the-fly to match the build machine, or do it once to
produce a curated list that could be subject to maintenance.
It turns out to be a fairly minor patch to do it on-the-fly.
With the attached, I've gotten my NLS-testing patch to pass
on OpenIndiana. We end up with this in $INSTALL/share/locale/:
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 cs
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 cs_CZ.UTF-8 -> cs
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_AT.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_BE.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_CH.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_DE.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_LI.UTF-8 -> de
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 de_LU.UTF-8 -> de
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 el
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 el_CY.UTF-8 -> el
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 el_GR.UTF-8 -> el
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_AR.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_BO.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_CL.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_CO.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_CR.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_DO.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_EC.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_ES.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_GQ.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_GT.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_HN.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_MX.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_NI.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PA.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PE.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PR.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_PY.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_SV.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_US.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_UY.UTF-8 -> es
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 es_VE.UTF-8 -> es
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_BE.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_CA.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_CF.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_CH.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_FR.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_GN.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_LU.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_MC.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_MG.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_ML.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_NE.UTF-8 -> fr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 fr_SN.UTF-8 -> fr
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 he
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 he_IL.UTF-8 -> he
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 id
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 id_ID.UTF-8 -> id
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 it
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 it_CH.UTF-8 -> it
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 it_IT.UTF-8 -> it
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ja
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ja_JP.UTF-8 -> ja
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ka
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ka_GE.UTF-8 -> ka
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ko
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ko_KR.UTF-8 -> ko
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 nb
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 nb_NO.UTF-8 -> nb
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 pl
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 pl_PL.UTF-8 -> pl
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 pt_BR
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ro
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ro_MD.UTF-8 -> ro
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ro_RO.UTF-8 -> ro
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ru
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ru_MD.UTF-8 -> ru
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ru_RU.UTF-8 -> ru
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 ru_UA.UTF-8 -> ru
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 sv
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 sv_FI.UTF-8 -> sv
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 sv_SE.UTF-8 -> sv
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 ta
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 ta_IN.UTF-8 -> ta
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:18 ta_LK.UTF-8 -> ta
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 tr
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 tr_TR.UTF-8 -> tr
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 uk
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 uk_UA.UTF-8 -> uk
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 vi
lrwxrwxrwx 1 tgl staff 2 Dec 10 18:19 vi_VN.UTF-8 -> vi
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 zh_CN
drwxr-xr-x 3 tgl staff 3 Dec 10 18:18 zh_TW
That's probably more links than we'd bother with in a curated
list, but it's not a huge number.
The attached patch only addresses the autoconf/make case.
I might be missing something, but so far as I can find,
meson wraps up the entire build/installation process for
translation files into a black box without user-serviceable
parts. So the lack of translation on Solaris is their bug
to fix.
I'm inclined to commit this and call it good. Probably
we should back-patch, too, despite the lack of field
complaints.
regards, tom lane
I wrote:
Encoding-specific behaviors might be worth testing later, but
I'm not excited about that personally.
Despite that disclaimer, I experimented with the questionable nb.po
file on my OpenIndiana installation, and it seems to Just Work
after creating the nb_NO.UTF-8 -> nb locale symlink:
tgl@openindiana:~$ LANG=nb_NO.UTF-8 pg_config bogus
pg_config: ugyldig argument: bogus
Prøv «pg_config --help» for mer informasjon.
"od -c" confirms that that output is UTF-8 encoded:
tgl@openindiana:~$ LANG=nb_NO.UTF-8 pg_config bogus 2>&1 | od -c
0000000 p g _ c o n f i g : u g y l d
0000020 i g a r g u m e n t : b o g
0000040 u s \n P r 303 270 v 302 253 p g _ c o
0000060 n f i g - - h e l p 302 273 f o
0000100 r m e r i n f o r m a s j o
0000120 n . \n
0000123
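The same check can be made without reading octal dumps by eye. This is a sketch only: it assumes an iconv utility that exits with non-zero status on input that is not valid in the source codeset, which is how common implementations behave. The function name is_utf8 is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if stdin is valid UTF-8, fails otherwise.
# Relies on iconv rejecting byte sequences that are illegal in UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

For example, "LANG=nb_NO.UTF-8 pg_config bogus 2>&1 | is_utf8 && echo valid" would report on the output shown above. Pure-ASCII output passes trivially, so the test is only meaningful for messages containing non-ASCII characters.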
I checked further, and found that nb.mo has been transcoded to UTF8!
So this case didn't require any runtime transcoding anyway.
It doesn't stop there though. I soon realized that configure
had seized on GNU msgfmt not Sun's, and convert-to-UTF8 is
the default behavior of GNU msgfmt. I then forced the build
to use /usr/bin/msgfmt, and confirmed that now nb.mo remains in
LATIN1. But I still get exactly the above output!
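One way to see which encoding a catalog declares is to read the charset from its header entry. The sketch below assumes the usual "Content-Type: text/plain; charset=..." header line of .po files. The function name po_charset is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first charset named in .po-format text
# read from stdin, e.g. from a header line such as
#   "Content-Type: text/plain; charset=ISO-8859-1\n"
po_charset() {
    sed -n 's/.*charset=\([A-Za-z0-9_.:-]*\).*/\1/p' | head -n 1
}
```

It can be fed a source file directly, as in "po_charset < src/bin/pg_config/po/nb.po". For an installed .mo file, GNU gettext's msgunfmt can turn the catalog back into .po text first, as in "msgunfmt nb.mo | po_charset".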
So apparently, "Sun's gettext can't transcode" is obsolete
information. It works fine in this setup, despite using symlinks that
may not be telling the truth about the encoding of the .mo files.
I also verified that I could get LATIN1 output from UTF-8 .mo files,
so long as LANG is set to a name that "locale -a" admits the existence
of and I add a matching symlink. So we don't actually have to confine
our support for this to the UTF-8 locales. But I'm inclined to just
do that much for now and wait to see if anyone complains. (I agree
with Munro that that's likely to be never.)
This also means that this project doesn't require adoption of a
"UTF8-only" policy for our .po files. I'm inclined to think that
such a policy might be a good idea anyway, but it's not forced by
wanting to support Sun's gettext().
regards, tom lane
On 10.12.25 17:14, Tom Lane wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
On 09.12.25 22:22, Tom Lane wrote:
At least Solaris is kind enough to let you do that with
symlinks [2], so that after
cd $INSTALLATION/share/locale
ln -s es es_ES.UTF-8
translation starts working for that particular value of
lc_messages.
How would one know all the country codes to create links for?
Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that. What's unclear is whether we should do
that on-the-fly to match the build machine, or do it once to
produce a curated list that could be subject to maintenance.
The former is like what we do to populate pg_collation
(although we do that at initdb not build time). But the latter
seems like it might be wiser policy.
I wonder how other gettext-using projects handle this on Solaris. Most
of those will use a higher-level build system such as Automake or Meson,
and I don't see any facilities there to expand languages into full
locale names on installation. So either this is broken for everyone
else, too, or perhaps this is typically addressed on the packaging level
(or there is some other explanation we're not seeing yet). In either
case, I doubt that fixing this locally in PostgreSQL is the most
appropriate solution.
Peter Eisentraut <peter@eisentraut.org> writes:
On 10.12.25 17:14, Tom Lane wrote:
Yeah, I've been wrestling with that question. The best idea
I have at the moment is to look at "locale -a" output to see
which country codes Solaris thinks there are for each language,
and duplicate that.
I wonder how other gettext-using projects handle this on Solaris. Most
of those will use a higher-level build system such as Automake or Meson,
and I don't see any facilities there to expand languages into full
locale names on installation. So either this is broken for everyone
else, too, or perhaps this is typically addressed on the packaging level
(or there is some other explanation we're not seeing yet). In either
case, I doubt that fixing this locally in PostgreSQL is the most
appropriate solution.
I suspect that the answer for most non-Solaris-specific projects has
been "use GNU gettext". I don't want to rely on that answer, though,
because it will break every one of our Solaris/illumos buildfarm
animals, all of which are linking to libc gettext:
checking for library containing bind_textdomain_codeset... none required
Now it does appear that they all have (portions of?) GNU gettext
installed:
checking for msgfmt... /usr/gnu/bin/msgfmt
and so does my OpenIndiana image, which apparently means that GNU
gettext is pulled in by "sudo pkg install build-essential", because
that's all I did to install stuff. So maybe we should just say we
don't support the libc flavor of gettext on that platform, which
would require figuring out how to force linking to libintl instead.
I can look into that if it seems like a more acceptable solution.
I'm worried though that it amounts to adding a new dependency on
that platform.
regards, tom lane