Thread-safe nl_langinfo() and localeconv()

Started by Thomas Munroover 1 year ago21 messages
#1Thomas Munro
Thomas Munro
thomas.munro@gmail.com
3 attachment(s)

Hi,

Over on the discussion thread about remaining setlocale() work[1]/messages/by-id/4c5da86af36a0d5e430eee3f60ce5e06f1b5cd34.camel@j-davis.com, I
wrote out some long boring theories about $SUBJECT. Here are some
draft patches to try those theories out, and make a commitfest entry.
nl_langinfo_l() is a trivial drop-in replacement, and
pg_localeconv_r() has 4 different implementation strategies:

1. Windows, with ugly _configthreadlocale() and thread-local result.
2. Glibc, with nice nl_langinfo_l() extensions.
3. macOS/*BSD, with nice localeconv_l().
4. Baseline POSIX: uselocale() + localeconv() + honking great lock.

In reality it'd just be Solaris running #4 (and AIX if it comes back).
Whether they truly implement it as pessimally as the standard allows,
who knows... you could drop the lock if you somehow knew that they
returned a pointer to thread-local storage or a member of the locale_t
object.

[1]: /messages/by-id/4c5da86af36a0d5e430eee3f60ce5e06f1b5cd34.camel@j-davis.com

Attachments:

v1-0001-All-POSIX-systems-have-langinfo.h-and-CODESET.patchtext/x-patch; charset=US-ASCII; name=v1-0001-All-POSIX-systems-have-langinfo.h-and-CODESET.patch
v1-0002-Use-thread-safe-nl_langinfo_l-not-nl_langinfo.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Use-thread-safe-nl_langinfo_l-not-nl_langinfo.patch
v1-0003-Provide-thread-safe-pg_localeconv_r.patchtext/x-patch; charset=US-ASCII; name=v1-0003-Provide-thread-safe-pg_localeconv_r.patch
#2Heikki Linnakangas
Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Thomas Munro (#1)
Re: Thread-safe nl_langinfo() and localeconv()

On 13/08/2024 08:45, Thomas Munro wrote:

Hi,

Over on the discussion thread about remaining setlocale() work[1], I
wrote out some long boring theories about $SUBJECT. Here are some
draft patches to try those theories out, and make a commitfest entry.
nl_langinfo_l() is a trivial drop-in replacement, and
pg_localeconv_r() has 4 different implementation strategies:

1. Windows, with ugly _configthreadlocale() and thread-local result.
2. Glibc, with nice nl_langinfo_l() extensions.
3. macOS/*BSD, with nice localeconv_l().
4. Baseline POSIX: uselocale() + localeconv() + honking great lock.

In reality it'd just be Solaris running #4 (and AIX if it comes back).
Whether they truly implement it as pessimally as the standard allows,
who knows... you could drop the lock if you somehow knew that they
returned a pointer to thread-local storage or a member of the locale_t
object.

Patches 1 and 2 look good to me.

Patch 3 makes sense too, some comments on the details:

The #ifdefs and the LCONV_MEMBER stuff makes it a bit hard to follow
what happens in each implementation strategy. I wonder if it would be
more clear to duplicate more code.

There's a comment at the top of pg_locale.c ("!!! NOW HEAR THIS !!!")
that needs to be removed or adjusted now.

* The POSIX standard explicitly says that it is undefined what happens if
* LC_MONETARY or LC_NUMERIC imply an encoding (codeset) different from
* that implied by LC_CTYPE. In practice, all Unix-ish platforms seem to
* believe that localeconv() should return strings that are encoded in the
* codeset implied by the LC_MONETARY or LC_NUMERIC locale name. Hence,
* once we have successfully collected the localeconv() results, we will
* convert them from that codeset to the desired server encoding.

The patch loses this comment, leaving just a much shorter comment in the
WIN32 implementation. But it still seems like a relevant comment for the
!WIN32 implementation too.

This gets rid of some setlocale() calls and makes the returned value
unclobberable with a defined lifetime. The remaining call to
setlocale() is only a query of the name of the current local (in a

typo: local -> locale

multi-threaded future this would have to be changed, perhaps to use a
per-database or per-backend locale_t instead of LC_GLOBAL_LOCALE).

All known non-Windows targets have nl_langinfo_l(), from POSIX 2018.

I think that's supposed to be POSIX 2008

--
Heikki Linnakangas
Neon (https://neon.tech)

#3Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: Thread-safe nl_langinfo() and localeconv()

On Tue, Aug 13, 2024 at 6:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 13/08/2024 08:45, Thomas Munro wrote:
Patches 1 and 2 look good to me.

Thanks. I went ahead and pushed these ones. A couple of Macs in the
build farm are failing, as if they didn't include <xlocale.h> and
haven't seen the type locale_t. CI's macOS build is OK, and my own
local Mac is building master OK, and animal "indri" is OK... hmm,
those are all using MacPorts, but I don't yet see why that would be
it...

#4Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#3)
Re: Thread-safe nl_langinfo() and localeconv()

On Tue, Aug 13, 2024 at 11:25 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Aug 13, 2024 at 6:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 13/08/2024 08:45, Thomas Munro wrote:
Patches 1 and 2 look good to me.

Thanks. I went ahead and pushed these ones. A couple of Macs in the
build farm are failing, as if they didn't include <xlocale.h> and
haven't seen the type locale_t. CI's macOS build is OK, and my own
local Mac is building master OK, and animal "indri" is OK... hmm,
those are all using MacPorts, but I don't yet see why that would be
it...

Ah, got it. It was OK under meson but not autoconf for my Mac, so I
guess it must be transitive headers coming from somewhere making it
work for some systems. I just have a typo in an #ifdef macro. Will
fix.

#5Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#4)
1 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

Here's another mystery from Windows + MinGW. Although "fairywren" is
green, that is because it lacks ICU, which would activate extra tests.
CI is green too, but the optional CI task "Windows - Server 2019,
MinGW64 - Meson" has ICU and it is now failing if you trigger it[1]https://cirrus-ci.com/task/5928104793735168
after commit 35eeea62, in initdb/001_initdb:

[05:43:49.764] | 146/305 - options --locale-provider=icu --locale=und
--lc-*=C: no stderr FAIL

... because it logs a warning to stderr:

WARNING: no usable system locales were found

I can only assume there was some extra dependency on setlocale()
global state changes in the removed code. I don't quite get it, but
whatever the reason, it's less than helpful to have different
compilers taking different code paths on our weirdest OS that most of
us don't use, so I propose to push this change to take the regular
MSVC code path for MinGW too, when looking up code pages. Somehow,
this fixes that, though it'd probably take someone with a local MinGW
setup to dig into what exactly is happening there.

(There are plenty more places where we do something different for
MinGW. I suspect they are all obsolete problems. We should probably
just harmonise everything and see what breaks now that we have a CI
system, but that can be for another day.)

That warning is from pg_import_system_locales(), which is new-ish
(v16) on that OS. It was recently discovered to trigger a
pre-existing problem[2]/messages/by-id/CA+hUKG+FxeRLURZ=n8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg@mail.gmail.com: the simple setlocale() save/restore pattern
doesn't work in general on Windows, because some local names are
non-ASCII, and the restore can fail (abort in the system library due
to bad encoding, because the intermediate setlocale() changed the
expected encoding of the locale name itself). So it's good that we
aren't doing that anymore in this location; I'm just thinking out loud
about whether that phenomenon could also be somehow connected to this
failure, though I don't see it. Another realisation is that my
pg_localeconv_r() patch, which can't avoid a thread-safe setlocale()
save-and-restore on that OS (and might finish up being the only one
left in the tree by the time we're done?), had better use wsetlocale()
instead to defend itself against that particular madness.

[1]: https://cirrus-ci.com/task/5928104793735168
[2]: /messages/by-id/CA+hUKG+FxeRLURZ=n8NPyLwgjFds_SqU_cQvE40ks6RQKUGbGg@mail.gmail.com

Attachments:

0001-Harmonize-MinGW-CODESET-lookup-with-MSVC.patchtext/x-patch; charset=US-ASCII; name=0001-Harmonize-MinGW-CODESET-lookup-with-MSVC.patch
#6Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#2)
2 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

On Tue, Aug 13, 2024 at 6:23 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Patch 3 makes sense too, some comments on the details:
The #ifdefs and the LCONV_MEMBER stuff makes it a bit hard to follow
what happens in each implementation strategy. I wonder if it would be
more clear to duplicate more code.

I tried to make it easier to follow.

There's a comment at the top of pg_locale.c ("!!! NOW HEAR THIS !!!")
that needs to be removed or adjusted now.

Yeah. We can remove that PSA if we also fix up the equivalent code
for LC_TIME. First attempt at that attached.

* The POSIX standard explicitly says that it is undefined what happens if
* LC_MONETARY or LC_NUMERIC imply an encoding (codeset) different from
* that implied by LC_CTYPE. In practice, all Unix-ish platforms seem to
* believe that localeconv() should return strings that are encoded in the
* codeset implied by the LC_MONETARY or LC_NUMERIC locale name. Hence,
* once we have successfully collected the localeconv() results, we will
* convert them from that codeset to the desired server encoding.

The patch loses this comment, leaving just a much shorter comment in the
WIN32 implementation. But it still seems like a relevant comment for the
!WIN32 implementation too.

New version makes it much clearer, and also is much more careful about
what exactly happens if you have mismatched encodings.

(Over in CF #3772 I was exploring the idea of banning the use of
locales that are not compatible with the database encoding. As far as
I can guess, that idea must have come from the time when Windows
didn't have native UTF-8 support. Now it does. There I was mostly
interested in killing all the whcar_t conversion code, but maybe we
could also delete a few lines of transcoding around here too?)

Attachments:

v2-0001-Provide-thread-safe-pg_localeconv_r.patchapplication/octet-stream; name=v2-0001-Provide-thread-safe-pg_localeconv_r.patch
v2-0002-Use-thread-safe-strftime_l-instead-of-strftime.patchapplication/octet-stream; name=v2-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch
#7Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#6)
3 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

Here's a new patch to add to this pile, this time for check_locale().
I also improved the error handling and comments in the other patches.

Attachments:

v3-0001-Provide-thread-safe-pg_localeconv_r.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Provide-thread-safe-pg_localeconv_r.patch
v3-0002-Use-thread-safe-strftime_l-instead-of-strftime.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch
v3-0003-Remove-setlocale-calls-from-check_locale.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Remove-setlocale-calls-from-check_locale.patch
#8Heikki Linnakangas
Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Thomas Munro (#7)
Re: Thread-safe nl_langinfo() and localeconv()

On 15/08/2024 11:03, Thomas Munro wrote:

Here's a new patch to add to this pile, this time for check_locale().
I also improved the error handling and comments in the other patches.

There's a similar function in initdb, check_locale_name. I wonder if
that could reuse the same code.

I wonder if it would be more clear to split this into three functions:

/*
* Get the name of the locale in "native environment",
* like setlocale(category, NULL) does
*/
char *get_native_locale(int category);

/*
* Return true if 'locale' is valid locale name for 'category
*/
bool check_locale(int category, const char *locale);

/*
* Return a canonical name of given locale
*/
char *canonicalize_locale(int category, const char *locale);

result = malloc(strlen(canonical) + 1);
if (!result)
goto exit;
strcpy(result, canonical);

Could use "result = strdup(canonical)" here. Or even better, could we
use palloc()/pstrdup() here, and save the caller the trouble to copy it?

--
Heikki Linnakangas
Neon (https://neon.tech)

#9Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#8)
3 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

On Thu, Aug 15, 2024 at 11:11 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

There's a similar function in initdb, check_locale_name. I wonder if
that could reuse the same code.

Thanks, between this comment and some observations from Peter E and
Tom, I think I have a better plan now. I think they should *not*
match, and a comment saying so should be deleted. In the backend, we
should do neither ""-expansion (ie getenv("LC_...") whether direct or
indirect) nor canonicalisation (of Windows' deprecated pre-BCP 47
locale names), making this v4 patch extremely simple.

1. CREATE DATABASE doesn't really need to accept LOCALE = ''. What
is the point? It's not documented or desirable behavior AFAIK. If
you like defaults you can just not provide a locale at all and get the
template database's (which often comes from initdb, which often uses
the server environment). That behavior was already inconsistent with
CREATE COLLATION. So I think we should just reject "" in the backend
check_locale().

2. A similar argument applies to Windows canonicalisation. CREATE
COLLATION isn't doing it. CREATE DATABASE is, but again, what is the
point? See previous.

(I also think that initdb should use a different mechanism for finding
the native locale on Windows, but I already have a CF #3772 for that,
ie migration plan for BCP 47 and native UTF-8 on Windows, but I don't
want *this* thread to get blocked by our absence of Windows
reviewers/testers, so let's not tangle that up with this thread-safety
expedition.)

To show a concrete example of commands no longer accepted with this
version, because they call check_locale():

postgres=# set lc_monetary = '';
ERROR: invalid value for parameter "lc_monetary": ""

postgres=# create database db2 locale = '';
ERROR: invalid LC_COLLATE locale name: ""
HINT: If the locale name is specific to ICU, use ICU_LOCALE.

Does anyone see a problem with that?

I do see a complication for CREATE COLLATION, though. It doesn't call
check_locale(), is not changed in this patch, and still accepts ''.
Reasoning: There may be systems with '' in their pg_collation catalog
in the wild, since we never canonicalised with setlocale(), so it
might create some kind of unforeseen dump/restore/upgrade hazard if we
just ban '' outright, I just don't know what yet.

There is no immediate problem, ie there is no setlocale() to excise,
for *this* project. Longer term, we can't actually continue to allow
'' in COLLATION objects, though: that tells newlocale() to call
getenv(), which may be technically OK in a multi-threaded program
(that never calls setenv()), but is hardly desirable. But it will
also give the wrong result, if we pursue the plan that Jeff and I
discussed: we'll stop doing setenv("LC_COLLATE", datacollate) and
setenv("LC_CTYPE", datctype) in postinit.c (see pg_perm_setlocale()
calls). So I think any pg_collation catalog entries holding '' need
to be translated to datcollate/datctype... somewhere. I just don't
know where yet and don't want to tackle that in the same patch.

Attachments:

v4-0001-Provide-thread-safe-pg_localeconv_r.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Provide-thread-safe-pg_localeconv_r.patch
v4-0002-Use-thread-safe-strftime_l-instead-of-strftime.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch
v4-0003-Remove-setlocale-from-check_locale.patchtext/x-patch; charset=US-ASCII; name=v4-0003-Remove-setlocale-from-check_locale.patch
#10Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#9)
3 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

Here is a slightly better version of patch 0003. I removed some
unnecessary refactoring, making the patch smaller.

FTR I wrote a small program[1]https://github.com/macdice/hello-windows/blob/793eb2fe3e6738c200781f681a22a7e6358f39e5/test.c for CI to test the assumptions about
Windows in 0001. I printed out the addresses of the objects, to
confirm that different threads were looking at different objects once
the thread local mode was activated, and also assert that the struct
contents were as expected while 8 threads switched locales in a tight
loop, and the output[2]https://cirrus-ci.com/task/4650412253380608 looked OK to me.

[1]: https://github.com/macdice/hello-windows/blob/793eb2fe3e6738c200781f681a22a7e6358f39e5/test.c
[2]: https://cirrus-ci.com/task/4650412253380608

Attachments:

v5-0001-Provide-thread-safe-pg_localeconv_r.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Provide-thread-safe-pg_localeconv_r.patch
v5-0002-Use-thread-safe-strftime_l-instead-of-strftime.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch
v5-0003-Remove-setlocale-from-check_locale.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Remove-setlocale-from-check_locale.patch
#11Thomas Munro
Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
Re: Thread-safe nl_langinfo() and localeconv()

On Tue, Aug 13, 2024 at 5:45 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Over on the discussion thread about remaining setlocale() work[1], I
wrote out some long boring theories about $SUBJECT.

Just as an FYI/curiosity, I converted my frustrations with
localeconv() into a request[1]https://www.mail-archive.com/austin-group-l@opengroup.org/msg12850.html that POSIX consider standardising one
of the interfaces that doesn't suck, and the reaction seems so far
positive. Of course that doesn't really have any actionable
consequences on any relevant time frame, even if eventually
successful, and we can already use the saner interfaces on the systems
most of us really care about, but still... it's nice to think that
the pessimistic fallback code (really only used by Solaris and maybe
AIX) could eventually be redundant if it goes somewhere...

[1]: https://www.mail-archive.com/austin-group-l@opengroup.org/msg12850.html

#12Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Thomas Munro (#9)
Re: Thread-safe nl_langinfo() and localeconv()

On 16.08.24 02:48, Thomas Munro wrote:

2. A similar argument applies to Windows canonicalisation. CREATE
COLLATION isn't doing it. CREATE DATABASE is, but again, what is the
point? See previous.

I don't really know about Windows locales. But we are doing
canonicalization of ICU locale names now. So there seems to be a desire
to do canonicalization in general. (Obviously, if we're doing it
poorly, then we don't have to keep it that way indefinitely.)

#13Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Thomas Munro (#10)
Re: Thread-safe nl_langinfo() and localeconv()

On 19.08.24 08:29, Thomas Munro wrote:

Here is a slightly better version of patch 0003. I removed some
unnecessary refactoring, making the patch smaller.

FTR I wrote a small program[1] for CI to test the assumptions about
Windows in 0001. I printed out the addresses of the objects, to
confirm that different threads were looking at different objects once
the thread local mode was activated, and also assert that the struct
contents were as expected while 8 threads switched locales in a tight
loop, and the output[2] looked OK to me.

[1] https://github.com/macdice/hello-windows/blob/793eb2fe3e6738c200781f681a22a7e6358f39e5/test.c
[2] https://cirrus-ci.com/task/4650412253380608

Review of the patch
v5-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch:

This all looks very sensible. My only comment on the code is that for
handling error returns from newlocale() and _create_locale() we already
have report_newlocale_failure(), which handles various special cases.
(But it doesn't do the _dosmaperr() that your patch does.) It would be
best if you used that as well (and maybe improve as needed).

A couple of comments on the commit message:

Use thread-safe strftime_l() instead of strftime().

I don't think this is about thread-safety of either function? It's more
that the latter requires a non-thread-safe code structure around it. I
would frame this more around the use-locale_t-everywhere theme than the
thread-safety theme.

While here, adjust error message for strftime_l() failure: it does not
set errno, so no %m.

Although POSIX says that strftime() and strftime_l() should change
errno, experimentation shows that they do not. So this is fine. But I
thought also that the previous code was problematic because errno could
be overwritten since the failing call, so you wouldn't get a very
accurate error message anyway.

#14Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Peter Eisentraut (#13)
3 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

This patch set is still desirable. Here is a rebased version of the v5
patches. I haven't applied any corrections or review comments.

Attachments:

v6-0001-Provide-thread-safe-pg_localeconv_r.patchtext/plain; charset=UTF-8; name=v6-0001-Provide-thread-safe-pg_localeconv_r.patch
v6-0002-Use-thread-safe-strftime_l-instead-of-strftime.patchtext/plain; charset=UTF-8; name=v6-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch
v6-0003-Remove-setlocale-from-check_locale.patchtext/plain; charset=UTF-8; name=v6-0003-Remove-setlocale-from-check_locale.patch
#15Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Peter Eisentraut (#14)
5 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

On 09.02.25 15:52, Peter Eisentraut wrote:

This patch set is still desirable.  Here is a rebased version of the v5
patches.  I haven't applied any corrections or review comments.

Here is the same patch set with some review comments.

Patch 0001 looks okay to me. I'm just offering some cosmetic
improvements in patch 0004.

Patch 0002 also looks okay, except that the error handling could be
unified with existing code, as I had previously pointed out. Patch 0005
fixes that.

About patch 0003:

I had previously pointed out that the canonicalization might have been
intentional, and that we have canonicalization of ICU locale names. But
we don't have to keep the setlocale()-based locale checking
implementation just for that, I think. (If this was meant to be a real
feature offered by libc, there should be a get_locale_name(locale_t)
function.)

I'm unsure about the correct error handling of _create_locale() on
Windows. Does _dosmaperr(GetLastError()) do anything useful in this
context, or is this just copied from elsewhere? If it's useful, maybe
it should be added to report_newlocale_failure().

Also, maybe we don't need per-category locale checking? Would it not be
enough to check the locale using LC_ALL_MASK? Is there a scenario where
a locale name would work for one category but not another? I think the
old code was just conveniently coded that way so that you only have to
save and restore one locale category. But we wouldn't have to do it
that way anymore if we use newlocale().

Attachments:

v6.1-0001-Provide-thread-safe-pg_localeconv_r.patchtext/plain; charset=UTF-8; name=v6.1-0001-Provide-thread-safe-pg_localeconv_r.patch
v6.1-0002-Use-thread-safe-strftime_l-instead-of-strftime.patchtext/plain; charset=UTF-8; name=v6.1-0002-Use-thread-safe-strftime_l-instead-of-strftime.patch
v6.1-0003-Remove-setlocale-from-check_locale.patchtext/plain; charset=UTF-8; name=v6.1-0003-Remove-setlocale-from-check_locale.patch
v6.1-0004-fixup-Provide-thread-safe-pg_localeconv_r.patchtext/plain; charset=UTF-8; name=v6.1-0004-fixup-Provide-thread-safe-pg_localeconv_r.patch
v6.1-0005-fixup-Use-thread-safe-strftime_l-instead-of-str.patchtext/plain; charset=UTF-8; name=v6.1-0005-fixup-Use-thread-safe-strftime_l-instead-of-str.patch
#16Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Peter Eisentraut (#15)
Re: Thread-safe nl_langinfo() and localeconv()

On 14.02.25 15:13, Peter Eisentraut wrote:

On 09.02.25 15:52, Peter Eisentraut wrote:

This patch set is still desirable.  Here is a rebased version of the
v5 patches.  I haven't applied any corrections or review comments.

Here is the same patch set with some review comments.

Patch 0001 looks okay to me.  I'm just offering some cosmetic
improvements in patch 0004.

I have committed this patch.

The original patch had a typo that prevented the BSD-ish branch (using
localeconv_l()) from being taken, so it actually used the fallback
uselocale() branch, which then failed on CI on NetBSD. (But conversely,
this gave some testing on CI for the uselocale() branch.)

Patch 0002 also looks okay, except that the error handling could be
unified with existing code, as I had previously pointed out.  Patch 0005
fixes that.

I plan to commit this one next, after the above had a bit of time to stew.

About patch 0003:

I had previously pointed out that the canonicalization might have been
intentional, and that we have canonicalization of ICU locale names.  But
we don't have to keep the setlocale()-based locale checking
implementation just for that, I think.  (If this was meant to be a real
feature offered by libc, there should be a get_locale_name(locale_t)
function.)

POSIX 2024 actually has getlocalename_l(), but it doesn't appear to be
widely implemented.

Anyway, this patch fails tests on CI on NetBSD, so it will need some
further investigation.

#17Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Peter Eisentraut (#16)
2 attachment(s)
Re: Thread-safe nl_langinfo() and localeconv()

On 27.03.25 11:16, Peter Eisentraut wrote:

Patch 0002 also looks okay, except that the error handling could be
unified with existing code, as I had previously pointed out.  Patch
0005 fixes that.

I plan to commit this one next, after the above had a bit of time to stew.

also done

About patch 0003:

I had previously pointed out that the canonicalization might have been
intentional, and that we have canonicalization of ICU locale names.
But we don't have to keep the setlocale()-based locale checking
implementation just for that, I think.  (If this was meant to be a
real feature offered by libc, there should be a
get_locale_name(locale_t) function.)

Anyway, this patch fails tests on CI on NetBSD, so it will need some
further investigation.

It turns out the problem was that nl_langinfo_l() returns a codeset name
of "646" for the C locale, which we didn't have in our mapping table.
If we add that, then everything passes there as well. (But the use of
nl_langinfo_l() wasn't added by this patch, it just exposes it to the
tests. So this is apparently a pre-existing problem.)

Attached are the remaining patches in this series.

Attachments:

v6.2-0001-Remove-setlocale-from-check_locale.patchtext/plain; charset=UTF-8; name=v6.2-0001-Remove-setlocale-from-check_locale.patch
v6.2-0002-Add-646-as-a-codeset-name-for-ASCII.patchtext/plain; charset=UTF-8; name=v6.2-0002-Add-646-as-a-codeset-name-for-ASCII.patch
#18Andres Freund
Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#17)
Re: Thread-safe nl_langinfo() and localeconv()

Hi,

On 2025-03-28 09:13:54 +0100, Peter Eisentraut wrote:

On 27.03.25 11:16, Peter Eisentraut wrote:

Patch 0002 also looks okay, except that the error handling could be
unified with existing code, as I had previously pointed out.  Patch
0005 fixes that.

I plan to commit this one next, after the above had a bit of time to stew.

also done

My compiler complains with:

[20/1982 42 1%] Compiling C object src/port/libpgport_shlib.a.p/pg_localeconv_r.c.o
../../../../../home/andres/src/postgresql/src/port/pg_localeconv_r.c:63:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
63 | const static struct lconv_member_info table[] = {
| ^~~~~

This is the only such warning in the postgres tree. I'll go and fix that.

Greetings,

Andres Freund

#19Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Peter Eisentraut (#17)
Re: Thread-safe nl_langinfo() and localeconv()

On 28.03.25 09:13, Peter Eisentraut wrote:

About patch 0003:

I had previously pointed out that the canonicalization might have
been intentional, and that we have canonicalization of ICU locale
names. But we don't have to keep the setlocale()-based locale
checking implementation just for that, I think.  (If this was meant
to be a real feature offered by libc, there should be a
get_locale_name(locale_t) function.)

Anyway, this patch fails tests on CI on NetBSD, so it will need some
further investigation.

It turns out the problem was that nl_langinfo_l() returns a codeset name
of "646" for the C locale, which we didn't have in our mapping table. If
we add that, then everything passes there as well.  (But the use of
nl_langinfo_l() wasn't added by this patch, it just exposes it to the
tests.  So this is apparently a pre-existing problem.)

Further analysis:

(But I have not tested any of this.)

It appears that the new implementation of check_locale() provided by
this patch, using newlocale() instead of setlocale(), works differently
on NetBSD. Specifically, it apparently does not catch garbage locale
names. Instead, it just assumes they are C locale. Then, in
pg_get_encoding_from_locale(), we have special cases mapping "C" and
"POSIX" to SQL_ASCII. But as discussed, with this patch, we no longer
do canonicalization of the passed-in locale name, so if you pass a
garbage locale name, it will not match "C" or "POSIX". Then, you fall
through to the mapping table, and that's where we get the error about
the missing "646" entry. That's why this was not a problem before, even
though we've used nl_langinfo_l() for a few months and nl_langinfo()
before that.

I'm not sure what to do with this. If setlocale() and newlocale()
indeed behave differently in what set of locale names they accept, then
technically we ought to test both of them, since we do use both of them
later on. Or maybe we push on with the effort to get rid of setlocale()
calls and then just worry about testing newlocale() (as this patch
does). But right now, if newlocale() is more permissive, then we could
accept locale names that will later fail setlocale() calls, which might
be a problem.

#20Tom Lane
Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#19)
Re: Thread-safe nl_langinfo() and localeconv()

Peter Eisentraut <peter@eisentraut.org> writes:

I'm not sure what to do with this. If setlocale() and newlocale()
indeed behave differently in what set of locale names they accept, then
technically we ought to test both of them, since we do use both of them
later on. Or maybe we push on with the effort to get rid of setlocale()
calls and then just worry about testing newlocale() (as this patch
does). But right now, if newlocale() is more permissive, then we could
accept locale names that will later fail setlocale() calls, which might
be a problem.

I think the clear answer is "let's stop using setlocale(), and then
not have to worry about any behavioral differences".

regards, tom lane

#21Peter Eisentraut
Peter Eisentraut
peter@eisentraut.org
In reply to: Tom Lane (#20)
Re: Thread-safe nl_langinfo() and localeconv()

On 31.03.25 15:52, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

I'm not sure what to do with this. If setlocale() and newlocale()
indeed behave differently in what set of locale names they accept, then
technically we ought to test both of them, since we do use both of them
later on. Or maybe we push on with the effort to get rid of setlocale()
calls and then just worry about testing newlocale() (as this patch
does). But right now, if newlocale() is more permissive, then we could
accept locale names that will later fail setlocale() calls, which might
be a problem.

I think the clear answer is "let's stop using setlocale(), and then
not have to worry about any behavioral differences".

Right. That effort is woven into various other ongoing work related to
locales, collations, etc. So for now, I'm going to close this
commitfest entry as done, since $subject was achieved. The rest can be
picked up later, when the required progress in the other work has been made.