Let's add a test for NLS translation of PRI* macros
This was discussed a few weeks ago in [1], but I'm starting a
new thread so as not to confuse things with the latest patches in
that thread. The issue is that it seems like it'd be a good idea to
have specific regression testing of whether translation of the
<inttypes.h> PRI* macros works properly, since that is an aspect
of gettext() behavior that didn't use to work everywhere.
I don't know whether the attached will pass on Windows: we might
not be able to assume that "es_ES" is the right LC_MESSAGES
setting to use. But it works for me on Linux.
regards, tom lane
[1]: /messages/by-id/20250331.152829.1921392690375275165.horikyota.ntt@gmail.com
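For context on what such a test exercises: a PRI* macro expands to a platform-specific conversion spelling that the compiler splices into the format string, so the final literal differs between platforms. GNU xgettext therefore records the placeholder symbolically in the catalog (for example "%<PRId64>"), and the local expansion has to be substituted when the translation is loaded. The sketch below shows only the compile-time half. The helper name format_row_count and its message text are hypothetical and are not taken from the attached patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.  "%" PRId64 is concatenated
 * into the literal at compile time, so the resulting format string is
 * platform-dependent (for instance "%ld" on one system, "%lld" on another).
 */
int
format_row_count(char *buf, size_t len, int64_t nrows)
{
    return snprintf(buf, len, "processed %" PRId64 " rows", nrows);
}
```

A translated version of this message has to end up with the same platform-specific conversion, which is the part a regression test would need to check.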
On 12/8/2025 1:23 PM, Tom Lane wrote:
This was discussed a few weeks ago in [1], but I'm starting a
new thread so as not to confuse things with the latest patches in
that thread. The issue is that it seems like it'd be a good idea to
have specific regression testing of whether translation of the
<inttypes.h> PRI* macros works properly, since that is an aspect
of gettext() behavior that didn't use to work everywhere.
I don't know whether the attached will pass on Windows: we might
not be able to assume that "es_ES" is the right LC_MESSAGES
setting to use. But it works for me on Linux.
regards, tom lane
[1] /messages/by-id/20250331.152829.1921392690375275165.horikyota.ntt@gmail.com
gettext() will be fine with that if you are using < 0.20. After that
version it expects you to send it the Windows locale, not the
IsoLocaleName()-converted one; it will fail to find "es-ES" (by
enumerating through ~259 Windows locales) and use a fallback to translate
messages. Since gettext() will not cache the "not found" locale, the
expensive enumeration call will happen every time [1]. I am in the
middle of writing some patches to take care of that problem and a couple
of others involving that area of the code and gettext().
[1]: https://savannah.gnu.org/bugs/?67781
--
Bryan Green
EDB: https://www.enterprisedb.com
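The cost described above comes from repeating a lookup that has already failed. One common way to avoid that is a negative cache, which remembers names already known to be missing. The sketch below only illustrates that idea. The names expensive_locale_lookup, cached_locale_lookup and lookup_calls, and the one-entry cache, are hypothetical stand-ins. They are not gettext internals and not the patches mentioned above.

```c
#include <stdbool.h>
#include <string.h>

int lookup_calls = 0;           /* counts simulated enumerations */

/* Stand-in for a costly scan over every locale the OS knows about. */
bool
expensive_locale_lookup(const char *name)
{
    lookup_calls++;
    return strcmp(name, "C") == 0;  /* this toy "system" only knows "C" */
}

/* One-entry cache of the most recent lookup, failures included. */
static char cached_name[64];
static bool cached_valid = false;
static bool cached_result = false;

bool
cached_locale_lookup(const char *name)
{
    bool        result;

    if (cached_valid && strcmp(cached_name, name) == 0)
        return cached_result;   /* hit, which also covers "not found" */

    result = expensive_locale_lookup(name);

    /* Only remember names that fit, so a truncated key can never match. */
    if (strlen(name) < sizeof(cached_name))
    {
        strcpy(cached_name, name);
        cached_result = result;
        cached_valid = true;
    }
    return result;
}
```

With this in place, asking twice for the same unknown name triggers only one call to the expensive function.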
Bryan Green <dbryan.green@gmail.com> writes:
On 12/8/2025 1:23 PM, Tom Lane wrote:
I don't know whether the attached will pass on Windows: we might
not be able to assume that "es_ES" is the right LC_MESSAGES
setting to use. But it works for me on Linux.
gettext() will be fine with that if you are using < 0.20. After that
version it expects you to send it the Windows locale, not the
IsoLocaleName()-converted one; it will fail to find "es-ES" (by
enumerating through ~259 Windows locales) and use a fallback to translate
messages. Since gettext() will not cache the "not found" locale, the
expensive enumeration call will happen every time [1]. I am in the
middle of writing some patches to take care of that problem and a couple
of others involving that area of the code and gettext().
Cool; we'll worry about that later then. In the meantime, the
cfbot discovered two other problems with this patch:
1. FreeBSD and NetBSD don't like "es_ES" either: they want a codeset
specification appended, and it had better be spelled just so
(eg, "UTF-8" not "utf-8" or "utf8").
2. FreeBSD and macOS translate PRIdMAX as "jd" which causes our
snprintf.c to spit up. We might have noticed this earlier, but
the only use of PRI?MAX in our tree at the moment is in zic.c's
error reports, a code path we don't ordinarily exercise.
v4-0001 attached tries to deal with #1 by extracting a codeset
name from pg_database.datctype. That seems to work for me locally,
but I'll be interested to see what cfbot thinks.
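To make the idea concrete: a POSIX-style locale name has the shape language_TERRITORY.codeset@modifier, so the codeset is whatever sits between the first "." and an optional "@". The v4-0001 patch itself is not reproduced here. The C helper below, with the hypothetical name build_test_locale, is only a sketch of that string manipulation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: combine "es_ES" with the codeset found in an existing
 * locale name such as "en_US.UTF-8".  Returns false if the source name has
 * no codeset part or if the result does not fit in the output buffer.
 */
bool
build_test_locale(const char *ctype, char *out, size_t outlen)
{
    const char *dot = strchr(ctype, '.');
    size_t      cslen;
    int         n;

    if (dot == NULL)
        return false;           /* e.g. "C" or "es_ES": nothing to borrow */

    /* The codeset ends at an "@modifier" suffix, if there is one. */
    cslen = strcspn(dot + 1, "@");
    if (cslen == 0)
        return false;

    /* Copy the codeset verbatim, since its spelling matters on some platforms. */
    n = snprintf(out, outlen, "es_ES.%.*s", (int) cslen, dot + 1);
    return n > 0 && (size_t) n < outlen;
}
```

Given "en_US.UTF-8" this produces "es_ES.UTF-8", and it reports failure for names such as "C" that carry no codeset.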
v4-0002 attached deals with #2 by making snprintf.c support the "j"
width modifier, as required by POSIX for years now. We'd likely
have had to do that at some point anyway, so might as well be now.
(0002 probably ought to get committed first, but I wrote them
in this order so here they are.)
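For reference, "j" is the standard C99/POSIX length modifier for intmax_t and uintmax_t, which is why a platform may define PRIdMAX as "jd". The sketch below uses the hypothetical helper name formats_agree to show the two spellings producing the same output with the C library's own snprintf(). It says nothing about how snprintf.c implements the modifier.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if "%" PRIdMAX and "%jd" format value alike. */
bool
formats_agree(intmax_t value)
{
    char        via_macro[32];
    char        via_j[32];

    /* PRIdMAX expands to a platform-chosen spelling, possibly "jd". */
    snprintf(via_macro, sizeof(via_macro), "%" PRIdMAX, value);

    /* "%jd" names intmax_t directly. */
    snprintf(via_j, sizeof(via_j), "%jd", value);

    return strcmp(via_macro, via_j) == 0;
}
```

A printf implementation without "j" support has no way to handle the second form, and on platforms where PRIdMAX is "jd" it cannot handle the first one either.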
regards, tom lane
Sigh ... I should know better than to assume meson code will work
without testing it. One-line fix in v5-0002.
regards, tom lane
I wrote:
v4-0001 attached tries to deal with #1 by extracting a codeset
name from pg_database.datctype. That seems to work for me locally,
but I'll be interested to see what cfbot thinks.
cfbot didn't like that :-(. Upon further review, it seems that
the locale name needs to include a valid encoding spec, but that
encoding spec doesn't actually have to match the database encoding
(checked here on Linux, macOS, all three major BSDen). Perhaps
there would be some issues with character set conversion if not,
but the test case I'm proposing is intentionally chosen to not
involve any non-ASCII characters, so it shouldn't matter.
Hence, new version attached that just hard-codes es_ES.UTF-8.
We'll see if cfbot agrees with my local testing.
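A quick way to check locally whether a platform accepts a given spelling is to hand it to setlocale() and see whether the call returns NULL. The sketch below assumes a POSIX system where LC_MESSAGES is defined in <locale.h>. The name locale_is_accepted is hypothetical. Acceptance by setlocale() does not by itself prove that gettext() will find a message catalog, and some C libraries accept names more liberally than others.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe: does setlocale() accept this LC_MESSAGES name? */
bool
locale_is_accepted(const char *name)
{
    const char *cur = setlocale(LC_MESSAGES, NULL);
    char       *saved = NULL;
    bool        ok;

    /* Copy the current setting; later setlocale() calls may overwrite it. */
    if (cur != NULL)
    {
        saved = malloc(strlen(cur) + 1);
        if (saved != NULL)
            strcpy(saved, cur);
    }

    /* setlocale() returns NULL when it cannot honor the requested name. */
    ok = (setlocale(LC_MESSAGES, name) != NULL);

    /* Put things back so the probe has no lasting effect. */
    if (saved != NULL)
    {
        setlocale(LC_MESSAGES, saved);
        free(saved);
    }
    return ok;
}
```

For example, locale_is_accepted("es_ES.UTF-8") reports whether that exact spelling is usable on the machine at hand.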
(I went ahead and pushed 0002, since that seemed pretty
uncontroversial as well as fixing a demonstrated bug.)
regards, tom lane