Remaining dependency on setlocale()

Started by Jeff Davis over 1 year ago, 90 messages

Jeff Davis

pgsql@j-davis.com

over 1 year ago

After some previous work here:

/messages/by-id/89475ee5487d795124f4e25118ea8f1853edb8cb.camel@j-davis.com

we are less dependent on setlocale(), but it's still not completely
gone.

setlocale() counts as thread-unsafe, so it would be nice to eliminate
it completely.

The obvious answer is uselocale(), which sets the locale only for the
calling thread, and takes precedence over whatever is set with
setlocale().

But there are a couple problems:

1. I don't think it's supported on Windows.

2. I don't see a good way to canonicalize a locale name, like in
check_locale(), which uses the result of setlocale().

Thoughts?

Regards,
Jeff Davis

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Jeff Davis (#1)

Re: Remaining dependency on setlocale()

Jeff Davis <pgsql@j-davis.com> writes:

But there are a couple problems:

1. I don't think it's supported on Windows.

Can't help with that, but surely Windows has some thread-safe way.

2. I don't see a good way to canonicalize a locale name, like in
check_locale(), which uses the result of setlocale().

What I can tell you about that is that check_locale's expectation
that setlocale does any useful canonicalization is mostly wishful
thinking [1]. On a lot of platforms you just get the input string
back again. If that's the only thing keeping us on setlocale,
I think we could drop it. (Perhaps we should do some canonicalization
of our own instead?)
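
If canonicalization is dropped, what remains of the check is validation. A minimal sketch of validating a name without calling setlocale(), assuming a POSIX 2008 system with newlocale(); the helper name is hypothetical and this is not the actual check_locale() code:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether the C library accepts a locale name,
 * without touching the process-wide locale the way setlocale() does.
 * It performs no canonicalization; the caller keeps the name verbatim.
 */
static bool
sketch_locale_name_is_valid(const char *name)
{
    locale_t    loc;

    /* Reject NULL and "", which would mean "take it from the environment". */
    if (name == NULL || name[0] == '\0')
        return false;

    loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return false;           /* unknown name, or out of memory */

    freelocale(loc);
    return true;
}
```

Which names are accepted is up to the C library, so the result can differ between platforms.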

regards, tom lane

[1]: /messages/by-id/14856.1348497531@sss.pgh.pa.us

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Tom Lane (#2)

Re: Remaining dependency on setlocale()

On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

But there are a couple problems:

1. I don't think it's supported on Windows.

Can't help with that, but surely Windows has some thread-safe way.

It does. It's not exactly the same, instead there is a thing you can
call that puts setlocale() itself into a thread-local mode, but last
time I checked that mode was missing on MinGW so that's a bit of an
obstacle.

How far can we get by using more _l() functions? For example, [1]
shows a use of strftime() that I think can be converted to
strftime_l() so that it doesn't depend on setlocale(). Since POSIX
doesn't specify every obvious _l function, we might need to provide
any missing wrappers that save/restore thread-locally with
uselocale(). Windows doesn't have uselocale(), but it generally
doesn't need such wrappers because it does have most of the obvious
_l() functions.
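
A minimal sketch of such a save/restore wrapper, assuming a system that has uselocale() but lacks a native strtod_l(). The name sketch_strtod_l is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a missing strtod_l(): switch the calling
 * thread's locale, call the plain function, then switch back.
 */
static double
sketch_strtod_l(const char *str, char **endptr, locale_t loc)
{
    locale_t    save = uselocale(loc);  /* previous thread locale */
    double      result;
    int         save_errno;

    result = strtod(str, endptr);
    save_errno = errno;         /* keep ERANGE etc. from strtod() */

    /*
     * If the switch above failed, save is (locale_t) 0 and this call is
     * only a query, so nothing is changed.
     */
    uselocale(save);
    errno = save_errno;
    return result;
}
```

The change is visible only to the calling thread and only for the duration of the call.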

2. I don't see a good way to canonicalize a locale name, like in
check_locale(), which uses the result of setlocale().

What I can tell you about that is that check_locale's expectation
that setlocale does any useful canonicalization is mostly wishful
thinking [1]. On a lot of platforms you just get the input string
back again. If that's the only thing keeping us on setlocale,
I think we could drop it. (Perhaps we should do some canonicalization
of our own instead?)

I know it does something on Windows (we know the EDB installer gives
it strings like "Language,Country" and it converts them to
"Language_Country.Encoding", see various threads about it all going
wrong), but I'm not sure it does anything we actually want to
encourage. I'm hoping we can gradually screw it down so that we only
have sane BCP 47 in the system on that OS, and I don't see why we
wouldn't just use them verbatim.

[1]: /messages/by-id/CA+hUKGJ=ca39Cg=y=S89EaCYvvCF8NrZRO=uog-cnz0VzC6Kfg@mail.gmail.com

Joe Conway

mail@joeconway.com

over 1 year ago

In reply to: Thomas Munro (#3)

Re: Remaining dependency on setlocale()

On 8/7/24 03:07, Thomas Munro wrote:

How far can we get by using more _l() functions? For example, [1]
shows a use of strftime() that I think can be converted to
strftime_l() so that it doesn't depend on setlocale(). Since POSIX
doesn't specify every obvious _l function, we might need to provide
any missing wrappers that save/restore thread-locally with
uselocale(). Windows doesn't have uselocale(), but it generally
doesn't need such wrappers because it does have most of the obvious
_l() functions.

Most of the strtoX functions have an _l variant, but one to watch is
atoi, which is defined with a hardcoded call to strtol, at least with glibc:

8<----------
/* Convert a string to an int. */
int
atoi (const char *nptr)
{
  return (int) strtol (nptr, (char **) NULL, 10);
}
8<----------

I guess in many/most places we use atoi we don't care, but maybe it
matters for some?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Robert Haas

robertmhaas@gmail.com

over 1 year ago

In reply to: Joe Conway (#4)

Re: Remaining dependency on setlocale()

On Wed, Aug 7, 2024 at 9:42 AM Joe Conway <mail@joeconway.com> wrote:

I guess in many/most places we use atoi we don't care, but maybe it
matters for some?

I think we should move in the direction of replacing atoi() calls with
strtol() and actually checking for errors. In many places where we use
atoi(), it's unlikely that the string would be anything but an
integer, so error checks are arguably unnecessary. A backup label file
isn't likely to say "START TIMELINE: potaytoes". On the other hand, if
it did say that, I'd prefer to get an error about potaytoes than have
it be treated as if it said "START TIMELINE: 0". And I've definitely
found missing error-checks over the years. For example, on pg14,
"pg_basebackup -Ft -Zmaximum -Dx" works as if you specified "-Z0"
because atoi("maximum") == 0. If we make a practice of checking
integer conversions for errors everywhere, we might avoid some such
silliness.
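
A minimal sketch of what a checked conversion could look like. The helper name is hypothetical and this is not an existing PostgreSQL function:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical strict replacement for atoi(): succeeds only if the whole
 * string is a decimal integer that fits in an int.
 */
static bool
sketch_parse_int(const char *s, int *out)
{
    char       *end;
    long        val;

    if (s == NULL || *s == '\0')
        return false;

    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        return false;           /* no digits, or trailing junk */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return false;           /* out of range for int */

    *out = (int) val;
    return true;
}
```

With this, "maximum" is rejected instead of being silently read as 0. Note that strtol() still skips leading whitespace.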

--
Robert Haas
EDB: http://www.enterprisedb.com

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Thomas Munro (#3)

Re: Remaining dependency on setlocale()

On Wed, 2024-08-07 at 19:07 +1200, Thomas Munro wrote:

How far can we get by using more _l() functions?

There are a ton of calls to, for example, isspace(), used mostly for
parsing.

I wouldn't expect a lot of differences in behavior from locale to
locale, like might be the case with iswspace(), but behavior can be
different at least in theory.

So I guess we're stuck with setlocale()/uselocale() for a while, unless
we're able to move most of those call sites over to an ascii-only
variant.

Regards,
Jeff Davis

Joe Conway

mail@joeconway.com

over 1 year ago

In reply to: Jeff Davis (#6)

Re: Remaining dependency on setlocale()

On 8/7/24 13:16, Jeff Davis wrote:

On Wed, 2024-08-07 at 19:07 +1200, Thomas Munro wrote:

How far can we get by using more _l() functions?

There are a ton of calls to, for example, isspace(), used mostly for
parsing.

I wouldn't expect a lot of differences in behavior from locale to
locale, like might be the case with iswspace(), but behavior can be
different at least in theory.

So I guess we're stuck with setlocale()/uselocale() for a while, unless
we're able to move most of those call sites over to an ascii-only
variant.

FWIW I see all of these in glibc:

isalnum_l, isalpha_l, isascii_l, isblank_l, iscntrl_l, isdigit_l,
isgraph_l, islower_l, isprint_l, ispunct_l, isspace_l, isupper_l,
isxdigit_l

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Robert Haas

robertmhaas@gmail.com

over 1 year ago

In reply to: Joe Conway (#7)

Re: Remaining dependency on setlocale()

On Wed, Aug 7, 2024 at 1:29 PM Joe Conway <mail@joeconway.com> wrote:

FWIW I see all of these in glibc:

isalnum_l, isalpha_l, isascii_l, isblank_l, iscntrl_l, isdigit_l,
isgraph_l, islower_l, isprint_l, ispunct_l, isspace_l, isupper_l,
isxdigit_l

On my MacBook (Ventura, 13.6.7), I see all of these except for isascii_l.

--
Robert Haas
EDB: http://www.enterprisedb.com

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Jeff Davis (#6)

Re: Remaining dependency on setlocale()

On Thu, Aug 8, 2024 at 5:16 AM Jeff Davis <pgsql@j-davis.com> wrote:

There are a ton of calls to, for example, isspace(), used mostly for
parsing.

I wouldn't expect a lot of differences in behavior from locale to
locale, like might be the case with iswspace(), but behavior can be
different at least in theory.

So I guess we're stuck with setlocale()/uselocale() for a while, unless
we're able to move most of those call sites over to an ascii-only
variant.

We do know of a few isspace() calls that are already questionable[1]
(should be scanner_isspace(), or something like that). It's not only
weird that SELECT ROW('libertà!') is displayed with or without double
quotes depending (in theory) on your locale, it's also undefined
behaviour because we feed individual bytes of a multi-byte sequence to
isspace(), so OSes disagree, and in practice we know that macOS and
Windows think that the byte 0xa0 inside 'à' is a space while glibc and
FreeBSD don't. Looking at the languages with many sequences
containing 0xa0, I guess you'd probably need to be processing CJK text
and cross-platform for the difference to become obvious (that was the
case for the problem report I analysed):

for i in range(1, 0xffff):
    if (i < 0xd800 or i > 0xdfff) and 0xa0 in chr(i).encode('UTF-8'):
        print("%04x: %s" % (i, chr(i)))
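
The same test can be sketched in C for a single code point. This assumes BMP code points only (no surrogates), and the function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Encode a BMP code point (not a surrogate) as UTF-8; returns byte count. */
static size_t
sketch_utf8_encode(unsigned int cp, unsigned char out[3])
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}

/* True if the UTF-8 encoding of cp contains the byte 0xa0. */
static bool
sketch_utf8_has_byte_a0(unsigned int cp)
{
    unsigned char buf[3];
    size_t      n = sketch_utf8_encode(cp, buf);

    for (size_t i = 0; i < n; i++)
    {
        if (buf[i] == 0xA0)
            return true;
    }
    return false;
}
```

For example, 'à' (U+00E0) encodes as 0xc3 0xa0, so a byte-at-a-time isspace() can misfire on its second byte.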

[1]: /messages/by-id/CA+HWA9awUW0+RV_gO9r1ABZwGoZxPztcJxPy8vMFSTbTfi4jig@mail.gmail.com

#10

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Robert Haas (#8)

Re: Remaining dependency on setlocale()

On Thu, Aug 8, 2024 at 6:18 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 7, 2024 at 1:29 PM Joe Conway <mail@joeconway.com> wrote:

FWIW I see all of these in glibc:

isalnum_l, isalpha_l, isascii_l, isblank_l, iscntrl_l, isdigit_l,
isgraph_l, islower_l, isprint_l, ispunct_l, isspace_l, isupper_l,
isxdigit_l

On my MacBook (Ventura, 13.6.7), I see all of these except for isascii_l.

Those (except isascii_l) are from POSIX 2008[1]. They were absorbed
from "Extended API Set Part 4"[2], along with locale_t (that's why
there is a header <xlocale.h> on a couple of systems even though after
absorption they are supposed to be in <locale.h>). We already
decided that all computers have that stuff (commit 8d9a9f03), but the
reality is a little messier than that... NetBSD hasn't implemented
uselocale() yet[3], though it has a good set of _l functions. As
discussed in [3], ECPG code is therefore currently broken in
multithreaded clients because it's falling back to a setlocale() path,
and I think Windows+MinGW must be too (it lacks
HAVE__CONFIGTHREADLOCALE), but those both have a good set of _l
functions. In that thread I tried to figure out how to use _l
functions to fix that problem, but ...

The issue there is that we have our own snprintf.c, that implicitly
requires LC_NUMERIC to be "C" (it is documented as always printing
floats a certain way ignoring locale and that's what the callers there
want in frontend and backend code, but in reality it punts to system
snprintf for floats, assuming that LC_NUMERIC is "C", which we
configure early in backend startup, but frontend code has to do it for
itself!). So we could use snprintf_l or strtod_l instead, but POSIX
hasn't got those yet. Or we could use our own Ryu code (fairly
specific), but integrating Ryu into our snprintf.c (and correctly
implementing all the %... stuff?) sounds like quite a hard,
devil-in-the-details kind of an undertaking to me. Or maybe it's
easy, I dunno. As for the _l functions, you could probably get away
with "every computer has either uselocale() or snprintf_l() (or
strtod_l()?)" and have two code paths in our snprintf.c. But then we'd
also need a place to track a locale_t for a long-lived newlocale("C"),
which was too messy in my latest attempt...

[1]: https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/functions/isspace.html
[2]: https://pubs.opengroup.org/onlinepubs/9699939499/toc.pdf
[3]: /messages/by-id/CWZBBRR6YA8D.8EHMDRGLCKCD@neon.tech

#11

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Joe Conway (#7)

Re: Remaining dependency on setlocale()

On Wed, 2024-08-07 at 13:28 -0400, Joe Conway wrote:

FWIW I see all of these in glibc:

isalnum_l, isalpha_l, isascii_l, isblank_l, iscntrl_l, isdigit_l,
isgraph_l, islower_l, isprint_l, ispunct_l, isspace_l, isupper_l,
isxdigit_l

My point was just that there are a lot of those call sites (especially
for isspace()) in various parsers. It feels like a lot of code churn to
change all of them, when a lot of them seem to be intended for ascii
anyway.

And where do we get the locale_t structure from? We can create our own
at database connection time, and supply it to each of those call sites,
but I'm not sure that's a huge advantage over just using uselocale().

Regards,
Jeff Davis

#12

Andreas Karlsson

andreas@proxel.se

over 1 year ago

In reply to: Jeff Davis (#11)

Re: Remaining dependency on setlocale()

On 8/8/24 12:45 AM, Jeff Davis wrote:

My point was just that there are a lot of those call sites (especially
for isspace()) in various parsers. It feels like a lot of code churn to
change all of them, when a lot of them seem to be intended for ascii
anyway.

And where do we get the locale_t structure from? We can create our own
at database connection time, and supply it to each of those call sites,
but I'm not sure that's a huge advantage over just using uselocale().

I am leaning towards writing our own pure ascii functions
for this. Since we do not support any non-ascii compatible encodings
anyway I do not see the point in having locale support in most of these
call-sites.

Andreas

#13

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Andreas Karlsson (#12)

Re: Remaining dependency on setlocale()

On Fri, 2024-08-09 at 13:41 +0200, Andreas Karlsson wrote:

I am leaning towards that we should write our own pure ascii
functions
for this.

That makes sense for a lot of call sites, but it could cause breakage
if we aren't careful.

Since we do not support any non-ascii compatible encodings
anyway I do not see the point in having locale support in most of
these
call-sites.

An ascii-compatible encoding just means that the code points in the
ascii range are represented as ascii. I'm not clear on whether code
points in the ascii range can return different results for things like
isspace(), but it sounds plausible -- toupper() can return different
results for 'i' in tr_TR.

Also, what about the values outside the ascii range, 128-255, which are
still valid input to isspace()?
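
For comparison, an ascii-only variant sidesteps both questions by construction. A minimal sketch, assuming an ASCII-compatible execution character set; the name is hypothetical:

```c
/*
 * Hypothetical ascii-only toupper(): maps 'a'..'z' to 'A'..'Z' and leaves
 * every other byte, including 128-255, unchanged in every locale.
 */
static unsigned char
sketch_ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}
```

Here 'i' always maps to 'I', whatever tr_TR would do. Whether that is the desired behavior is a per-call-site decision.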

Regards,
Jeff Davis

#14

Tristan Partin

tristan@partin.io

over 1 year ago

In reply to: Jeff Davis (#1)

Re: Remaining dependency on setlocale()

On Tue Aug 6, 2024 at 5:00 PM CDT, Jeff Davis wrote:

After some previous work here:

/messages/by-id/89475ee5487d795124f4e25118ea8f1853edb8cb.camel@j-davis.com

we are less dependent on setlocale(), but it's still not completely
gone.

setlocale() counts as thread-unsafe, so it would be nice to eliminate
it completely.

The obvious answer is uselocale(), which sets the locale only for the
calling thread, and takes precedence over whatever is set with
setlocale().

But there are a couple problems:

1. I don't think it's supported on Windows.

2. I don't see a good way to canonicalize a locale name, like in
check_locale(), which uses the result of setlocale().

Thoughts?

Hey Jeff,

See this thread[0] for some work that I had previously done. Feel free
to take it over, or we could collaborate.

[0]: /messages/by-id/CWMW5OZBWJ10.1YFLQWSUE5RE9@neon.tech

--
Tristan Partin
Neon (https://neon.tech)

#15

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Tristan Partin (#14)

Re: Remaining dependency on setlocale()

Hi,

On Fri, 2024-08-09 at 15:16 -0500, Tristan Partin wrote:

Hey Jeff,

See this thread[0] for some work that I had previously done. Feel
free
to take it over, or we could collaborate.

[0]:
/messages/by-id/CWMW5OZBWJ10.1YFLQWSUE5RE9@neon.tech

Sounds good, sorry I missed that.

Can you please rebase and we can discuss in that thread?

Regards,
Jeff Davis

#16

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#3)

Re: Remaining dependency on setlocale()

On Wed, Aug 7, 2024 at 7:07 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

But there are a couple problems:

1. I don't think it's supported on Windows.

Can't help with that, but surely Windows has some thread-safe way.

It does. It's not exactly the same, instead there is a thing you can
call that puts setlocale() itself into a thread-local mode, but last
time I checked that mode was missing on MinGW so that's a bit of an
obstacle.

Actually the MinGW situation might be better than that these days. I
know of three environments where we currently have to keep code
working on MinGW: build farm animal fairywren (msys2 compiler
toolchain), CI's optional "Windows - Server 2019, MinGW64 - Meson"
task, and CI's "CompilerWarnings" task, in the "mingw_cross_warning"
step (which actually runs on Linux, and uses configure rather than
meson). All three environments show that they have
_configthreadlocale. So could we simply require it on Windows?
Then it might be possible to write a replacement implementation of
uselocale() that does a two-step dance with _configthreadlocale() and
setlocale(), restoring both afterwards if they changed. That's what
ECPG open-codes already.
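
A rough sketch of that two-step dance next to the uselocale() equivalent, using float formatting as the example. The function name is hypothetical, the Windows branch is untested here, and this is not the actual ECPG code:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: format a double with '.' as the radix character,
 * whatever locale the rest of the program uses.  Returns snprintf()'s
 * result, or -1 if the locale could not be switched.
 */
static int
sketch_format_double_c(char *buf, size_t len, double value)
{
    int         rc;

#ifdef _WIN32
    /* Step 1: make setlocale() affect only this thread. */
    int         old_mode = _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
    /* Step 2: remember the current name, then switch LC_NUMERIC to "C". */
    const char *cur = setlocale(LC_NUMERIC, NULL);
    char       *old_name = cur ? _strdup(cur) : NULL;

    if (old_mode == -1 || old_name == NULL ||
        setlocale(LC_NUMERIC, "C") == NULL)
    {
        free(old_name);
        if (old_mode != -1)
            _configthreadlocale(old_mode);
        return -1;
    }
    rc = snprintf(buf, len, "%g", value);
    /* Restore both pieces of state, in reverse order. */
    setlocale(LC_NUMERIC, old_name);
    free(old_name);
    _configthreadlocale(old_mode);
#else
    /* All categories of this locale are "C"; that is harmless for "%g". */
    locale_t    c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    locale_t    save;

    if (c_loc == (locale_t) 0)
        return -1;
    save = uselocale(c_loc);
    rc = snprintf(buf, len, "%g", value);
    uselocale(save);            /* a no-op query if the switch failed */
    freelocale(c_loc);
#endif
    return rc;
}
```

Creating a locale on every call is wasteful. A real implementation would presumably keep the "C" locale_t around.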

The NetBSD situation is more vexing. I was trying to find out if
someone is working on it and unfortunately it looks like there is a
principled stand against adding it:

https://mail-index.netbsd.org/tech-userlevel/2015/12/28/msg009546.html
https://mail-index.netbsd.org/netbsd-users/2017/02/14/msg019352.html

They're right that we really just want to use "C" in some places, and
their LC_C_LOCALE is a very useful system-provided value to be able to
pass into _l functions. It's a shame it's non-standard, because
without it you have to allocate a locale_t for "C" and keep it
somewhere to feed to _l functions...
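
One way to "keep it somewhere" is a lazily created singleton. A minimal sketch, assuming C11 atomics and a pointer-like locale_t (true on common libcs, but not guaranteed by POSIX); the names are hypothetical and this is not the mechanism from any posted patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdatomic.h>

/* Process-wide cache for the "C" locale object; starts out as 0. */
static _Atomic(locale_t) sketch_c_locale_cache;

/*
 * Hypothetical accessor: returns a long-lived locale_t for "C", creating
 * it on first use.  Returns (locale_t) 0 if newlocale() fails.
 */
static locale_t
sketch_c_locale(void)
{
    locale_t    loc = atomic_load(&sketch_c_locale_cache);

    if (loc == (locale_t) 0)
    {
        locale_t    fresh = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
        locale_t    expected = (locale_t) 0;

        if (fresh == (locale_t) 0)
            return (locale_t) 0;

        if (atomic_compare_exchange_strong(&sketch_c_locale_cache,
                                           &expected, fresh))
            loc = fresh;        /* we installed our object */
        else
        {
            freelocale(fresh);  /* another thread won the race */
            loc = expected;
        }
    }
    return loc;
}
```

The object is never freed, which is the point: callers can pass it to any _l function at any time.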

#17

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#16)

Re: Remaining dependency on setlocale()

I've posted a new attempt at ripping those ECPG setlocale() calls out on
the other thread that had the earlier version and discussion:

/messages/by-id/CA+hUKG+Yv+ps=nS2T8SS1UDU=iySHSr4sGHYiYGkPTpZx6Ooww@mail.gmail.com

#18

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Jeff Davis (#6)

Re: Remaining dependency on setlocale()

On Thu, Aug 8, 2024 at 5:16 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Wed, 2024-08-07 at 19:07 +1200, Thomas Munro wrote:

How far can we get by using more _l() functions?

There are a ton of calls to, for example, isspace(), used mostly for
parsing.

I wouldn't expect a lot of differences in behavior from locale to
locale, like might be the case with iswspace(), but behavior can be
different at least in theory.

So I guess we're stuck with setlocale()/uselocale() for a while, unless
we're able to move most of those call sites over to an ascii-only
variant.

Here are two more cases that I don't think I've seen discussed.

1. The nl_langinfo() call in pg_get_encoding_from_locale(), can
probably be changed to nl_langinfo_l() (it is everywhere we currently
care about except Windows, which has a different already-thread-safe
alternative; AIX seems to lack the _l version, but someone writing a
patch to re-add support for that OS could supply the configure goo for
a uselocale() save/restore implementation). One problem is that it
has callers that pass it NULL meaning the backend default, but we'd
perhaps use LC_GLOBAL_LOCALE for now and have to think about where we get
the database default locale_t in the future.

2. localeconv() is *doubly* non-thread-safe: it depends on the
current locale, and it also returns an object whose storage might be
clobbered by any other call to localeconv(), setlocale, or even,
according to POSIX, uselocale() (!!!). I think that effectively
closes off that escape hatch. On some OSes (macOS, BSDs) you find
localeconv_l() and then I think they give you a more workable
lifetime: as long as the locale_t lives, which makes perfect sense. I
am surprised that no one has invented localeconv_r() where you supply
the output storage, and you could wrap that in uselocale()
save/restore to deal with the other problem, or localeconv_r_l() or
something. I can't understand why this is so bad. The glibc
documentation calls it "a masterpiece of poor design". Ahh, so it
seems like we need to delete our use of localeconv() completely,
because we should be able to get all the information we need from
nl_langinfo_l() instead:

https://www.gnu.org/software/libc/manual/html_node/Locale-Information.html

#19

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#18)

Re: Remaining dependency on setlocale()

On Mon, Aug 12, 2024 at 3:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:

1. The nl_langinfo() call in pg_get_encoding_from_locale(), can
probably be changed to nl_langinfo_l() (it is everywhere we currently
care about except Windows, which has a different already-thread-safe
alternative ...

... though if we wanted to replace all use of localeconv and struct
lconv with nl_langinfo_l() calls, it's not totally obvious how to do
that on Windows. Its closest thing is GetLocaleInfoEx(), but that has
complications: it takes wchar_t locale names, which we don't even have
and can't access when we only have a locale_t, and it must look them
up in some data structure every time, and it copies data out to the
caller as wchar_t so now you have two conversion problems and a
storage problem. If I understand correctly, the whole point of
nl_langinfo_l(item, loc) is that it is supposed to be fast, it's
really just an array lookup, and item is just an index, and the result
is supposed to be stable as long as loc hasn't been freed (and the
thread hasn't exited). So you can use it without putting your own
caching in front of it. One idea I came up with which I haven't tried
and it might turn out to be terrible, is that we could change our
definition of locale_t on Windows. Currently it's a typedef to
Windows' _locale_t, and we use it with a bunch of _XXX functions that
we access by macro to remove the underscore. Instead, we could make
locale_t a pointer to a struct of our own design in WIN32 builds,
holding the native _locale_t and also an array full of all the values
that nl_langinfo_l() can return. We'd provide the standard enums,
indexes into that array, in a fake POSIX-oid header <langinfo.h>.
Then nl_langinfo_l(item, loc) could be implemented as
loc->private_langinfo[item], and strcoll_l(.., loc) could be a static
inline function that does _strcoll_l(...,
loc->private_windows_locale_t). These structs would be allocated and
freed with standard-looking newlocale() and freelocale(), so we could
finally stop using #ifdef WIN32-wrapped _create_locale() directly.
Then everything would look more POSIX-y, nl_langinfo_l() could be used
directly wherever we need fast access to that info, and we could, I
think, banish the awkward localeconv, right? I don't know if this all
makes total sense and haven't tried it, just spitballing here...
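
To make the shape of that idea concrete, here is a portable mock-up. Every name is hypothetical, the native handle is a placeholder for _locale_t, and the table holds only hard-coded "C" values rather than anything looked up from the OS:

```c
#include <stdlib.h>

/* Hypothetical item indexes, standing in for <langinfo.h> constants. */
typedef enum
{
    SKETCH_RADIXCHAR,
    SKETCH_THOUSEP,
    SKETCH_NITEMS
} sketch_nl_item;

/* Our own locale object: native handle plus a precomputed lookup table. */
typedef struct sketch_locale
{
    void       *native;         /* would hold the native _locale_t */
    const char *langinfo[SKETCH_NITEMS];
} sketch_locale;

/* newlocale()-like constructor; a real one would fill the table from the OS. */
static sketch_locale *
sketch_newlocale_c(void)
{
    sketch_locale *loc = malloc(sizeof(*loc));

    if (loc == NULL)
        return NULL;
    loc->native = NULL;         /* a real version would create it here */
    loc->langinfo[SKETCH_RADIXCHAR] = ".";
    loc->langinfo[SKETCH_THOUSEP] = "";
    return loc;
}

/* nl_langinfo_l() becomes a plain array lookup, stable until freed. */
static const char *
sketch_nl_langinfo_l(sketch_nl_item item, const sketch_locale *loc)
{
    return loc->langinfo[item];
}

static void
sketch_freelocale(sketch_locale *loc)
{
    free(loc);
}
```

The lookup cost and result lifetime then match what nl_langinfo_l() is described as offering above.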

#20

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#19)

Re: Remaining dependency on setlocale()

On Mon, Aug 12, 2024 at 4:53 PM Thomas Munro <thomas.munro@gmail.com> wrote:

... though if we wanted to replace all use of localeconv and struct
lconv with nl_langinfo_l() calls,

Gah. I realised while trying the above that you can't really replace
localeconv() with nl_langinfo_l() as the GNU documentation recommends,
because some of the lconv fields we're using are missing from
langinfo.h in POSIX (only GNU added them all, that was a good idea).
:-(

Next idea:

* Windows: its localeconv() returns pointer to thread-local storage,
  good, but it still needs setlocale(). So the options are: make our
  own lconv-populator function for Windows, using GetLocaleInfoEx(), or
  do that _configthreadlocale() dance (possibly excluding some MinGW
  configurations from working)
* Systems that have localeconv_l(): use that
* POSIX: use uselocale() and also put a big global lock around
  localeconv() call + accessing result (optionally skipping that on an
  OS-by-OS basis after confirming that its implementation doesn't really
  need it)

The reason the uselocale() + localeconv() approach seems to require a Big
Lock (by default at least) is that uselocale() deals only with the
"current locale" aspect, not the output buffer aspect. Clearly the
standard allows for it to be thread-local storage (that's why since
2008 it says that after thread-exit you can't access the result, and I
guess that's how it works on real systems (?)), but it also seems to
allow for a single static buffer (that's why it says that it's not
re-entrant, and any call to localeconv() might clobber it). That
might be OK in practice because we tend to cache that stuff, eg when
assigning GUC lc_monetary (that cache would presumably become
thread-local in the first phase of the multithreading plan), so the
locking shouldn't really hurt.
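
A minimal sketch of the POSIX path described above: uselocale() for the "current locale" aspect, a global mutex for the output buffer aspect, and a copy taken before the lock is released. The names and the two-field struct are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Caller-owned copy of the lconv fields this sketch cares about. */
typedef struct sketch_numeric_info
{
    char        decimal_point[8];
    char        thousands_sep[8];
} sketch_numeric_info;

/* The "Big Lock": serializes every localeconv() call made through here. */
static pthread_mutex_t sketch_localeconv_lock = PTHREAD_MUTEX_INITIALIZER;

static bool
sketch_get_numeric_info(locale_t loc, sketch_numeric_info *out)
{
    locale_t    save;
    struct lconv *lc;

    pthread_mutex_lock(&sketch_localeconv_lock);

    save = uselocale(loc);      /* thread-local switch */
    if (save == (locale_t) 0)
    {
        pthread_mutex_unlock(&sketch_localeconv_lock);
        return false;
    }

    /* The returned storage may be shared, so copy it out under the lock. */
    lc = localeconv();
    snprintf(out->decimal_point, sizeof(out->decimal_point),
             "%s", lc->decimal_point);
    snprintf(out->thousands_sep, sizeof(out->thousands_sep),
             "%s", lc->thousands_sep);

    uselocale(save);
    pthread_mutex_unlock(&sketch_localeconv_lock);
    return true;
}
```

The lock only helps if every localeconv() caller in the process goes through it, which is part of why caching the result matters.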

The reason we'd have to have three ways, and not just two, is again
that NetBSD declined to implement uselocale().

I'll try this in a bit unless someone else has better ideas or plans
for this part... sorry for the drip-feeding.

#21

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#20)

Re: Remaining dependency on setlocale()

On Tue, Aug 13, 2024 at 10:43 AM Thomas Munro <thomas.munro@gmail.com> wrote:

I'll try this in a bit unless someone else has better ideas or plans
for this part... sorry for the drip-feeding.

And done, see commitfest entry #5170.

#22

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Thomas Munro (#16)

Re: Remaining dependency on setlocale()

On Sat, 2024-08-10 at 09:42 +1200, Thomas Munro wrote:

The NetBSD situation is more vexing. I was trying to find out if
someone is working on it and unfortunately it looks like there is a
principled stand against adding it:

https://mail-index.netbsd.org/tech-userlevel/2015/12/28/msg009546.html
https://mail-index.netbsd.org/netbsd-users/2017/02/14/msg019352.html

The objection seems to be very general: that uselocale() modifies the
thread state and affects calls a long distance from uselocale(). I
don't disagree with the general sentiment. But in effect, that just
prevents people migrating away from setlocale(), to which the same
argument applies, and is additionally thread-unsafe.

The only alternative is to essentially ban the use of non-_l variants,
which is fine I suppose, but causes a fair amount of code churn.

They're right that we really just want to use "C" in some places, and
their LC_C_LOCALE is a very useful system-provided value to be able
to
pass into _l functions. It's a shame it's non-standard, because
without it you have to allocate a locale_t for "C" and keep it
somewhere to feed to _l functions...

If we're going to do that, why not just have ascii-only variants of our
own? pg_ascii_isspace(...) is at least as readable as isspace_l(...,
LC_C_LOCALE).
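
A minimal sketch of such a function, assuming an ASCII-compatible execution character set. The name is hypothetical, chosen so as not to suggest an existing PostgreSQL function:

```c
#include <stdbool.h>

/*
 * Hypothetical ascii-only isspace(): true for exactly the six characters
 * that isspace() accepts in the "C" locale, false for everything else,
 * including all bytes in the range 128-255.
 */
static bool
sketch_ascii_isspace(unsigned char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\v' || ch == '\f' || ch == '\r';
}
```

It needs no locale_t and no thread state, and the byte 0xa0 is never treated as a space.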

Regards,
Jeff Davis

#23

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Jeff Davis (#22)

Re: Remaining dependency on setlocale()

On Wed, Aug 14, 2024 at 1:05 PM Jeff Davis <pgsql@j-davis.com> wrote:

The only alternative is to essentially ban the use of non-_l variants,
which is fine I suppose, but causes a fair amount of code churn.

Let's zoom out a bit and consider some ways we could set up the
process, threads and individual calls:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-_l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

XXX Note that nailing LC_ALL to "C" in the backend would extend our
existing policy for LC_NUMERIC to all categories. That's why we can
use strtod() in the backend and expect the radix character to be '.'.
It's interesting to contemplate the strtod() calls in CF#5166: they
are code copied-and-pasted from backend and frontend; in the backend
we can use strtod() currently but in the frontend code I showed a
change to strtod_l(..., PG_C_LOCALE), in order to be able to delete
some ugly deeply nested uselocale()/setlocale() stuff of the exact
sort that NetBSD hackers (and I) hate. It's obviously a bit of a code
smell that it's copied-and-pasted in the first place, and should
really share code. Supposing some of that stuff finishes up in
src/common, then I think you'd want a strtod_l(..., PG_C_LOCALE) that
could be allowed to take advantage of the knowledge that the global
locale is "C" in the backend. Just thoughts...

2. The process global locale is always "C". Each backend process (in
future: thread) calls uselocale() to set the thread-local locale to
the database default, so it can keep using the non-_l() functions as a
way to access the database default, and otherwise uses _l() functions
if it wants something else (as we do already). The "C" locale can be
accessed with foo_l(..., LC_GLOBAL_LOCALE) or PG_C_LOCALE etc.

XXX This option is blocked by NetBSD's rejection of uselocale(). I
guess if you really wanted to work around NetBSD's policy you could
make your own wrapper for all affected functions, translating foo() to
foo_l(..., pg_thread_current_locale), so you could write uselocale(),
which is pretty much what every other libc does... But eughhh

3. The process global locale is inherited from the system or can be
set by the user however they want for the benefit of extensions, but
we never change it after startup or refer to it. Then we do the same
as 1 or 2, except if we ever want "C" we'll need a locale_t for that,
again perhaps using the PG_C_LOCALE mechanism. Non-_l() functions are
effectively useless except in cases where you really want to use the
system's settings inherited from startup, eg for messages, so they'd
mostly be banned.

What else?

They're right that we really just want to use "C" in some places, and
their LC_C_LOCALE is a very useful system-provided value to be able
to
pass into _l functions. It's a shame it's non-standard, because
without it you have to allocate a locale_t for "C" and keep it
somewhere to feed to _l functions...

If we're going to do that, why not just have ascii-only variants of our
own? pg_ascii_isspace(...) is at least as readable as isspace_l(...,
LC_C_LOCALE).

Yeah, I agree there are some easy things we should do that way. In
fact we've already established that scanner_isspace() needs to be used
in lots more places for that, even aside from thread-safety.

#24

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Thomas Munro (#23)

Re: Remaining dependency on setlocale()

On Wed, 2024-08-14 at 14:31 +1200, Thomas Munro wrote:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-_l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

+1 to this approach. It makes things more consistent across platforms
and avoids surprising dependencies on the global setting.

We'll have to be careful that each call site is either OK with C, or
that it gets changed to an _l() variant. We also have to be careful
about extensions.
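
As a rough illustration of the approach quoted above, here is a minimal sketch in which the database default lives in a locale_t variable and callers pass it to _l() functions explicitly. The names sketch_database_locale, sketch_init_database_locale and sketch_db_toupper are hypothetical and are not PostgreSQL APIs; the sketch assumes a POSIX 2008 libc that declares newlocale() and toupper_l() in <locale.h> and <ctype.h> (the <xlocale.h> systems mentioned elsewhere in the thread would need a different include).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Hypothetical stand-in for the "current database default" variable. */
static locale_t sketch_database_locale = (locale_t) 0;

/*
 * Build a locale_t covering LC_CTYPE only and remember it.
 * Returns 0 on success, -1 if the libc does not recognize the name.
 */
int
sketch_init_database_locale(const char *ctype_name)
{
	locale_t	loc = newlocale(LC_CTYPE_MASK, ctype_name, (locale_t) 0);

	if (loc == (locale_t) 0)
		return -1;
	if (sketch_database_locale != (locale_t) 0)
		freelocale(sketch_database_locale);
	sketch_database_locale = loc;
	return 0;
}

/*
 * Upper-case one byte under the database locale.  The process global
 * locale is neither read nor changed here.
 */
int
sketch_db_toupper(unsigned char ch)
{
	assert(sketch_database_locale != (locale_t) 0);
	return toupper_l(ch, sketch_database_locale);
}
```

A caller would run sketch_init_database_locale() once at connection start and then use sketch_db_toupper() wherever it previously relied on toupper() picking up the database's LC_CTYPE.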

Regards,
Jeff Davis

#25

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#3)

Re: Remaining dependency on setlocale()

On Wed, Aug 7, 2024 at 7:07 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Aug 7, 2024 at 10:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Davis <pgsql@j-davis.com> writes:

2. I don't see a good way to canonicalize a locale name, like in
check_locale(), which uses the result of setlocale().

What I can tell you about that is that check_locale's expectation
that setlocale does any useful canonicalization is mostly wishful
thinking [1]. On a lot of platforms you just get the input string
back again. If that's the only thing keeping us on setlocale,
I think we could drop it. (Perhaps we should do some canonicalization
of our own instead?)

+1

I know it does something on Windows (we know the EDB installer gives
it strings like "Language,Country" and it converts them to
"Language_Country.Encoding", see various threads about it all going
wrong), but I'm not sure it does anything we actually want to
encourage. I'm hoping we can gradually screw it down so that we only
have sane BCP 47 in the system on that OS, and I don't see why we
wouldn't just use them verbatim.

Some more thoughts on check_locale() and canonicalisation:

I doubt the canonicalisation does anything useful on any Unix system,
as they're basically just file names. In the case of glibc, the
encoding part is munged before opening the file so it tolerates .utf8
or .UTF-8 or .u---T----f------8 on input, but it still returns
whatever you gave it so the return value isn't cleaning the input or
anything.
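
If we did want some canonicalisation of our own for the encoding part, a minimal sketch could look like the following. It is only loosely modelled on the tolerance described above and is not glibc's actual algorithm; sketch_normalize_codeset is a hypothetical name. It keeps ASCII letters and digits, lower-cases the letters, and drops everything else, without consulting any locale.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy "in" to "out", keeping only ASCII letters and digits and
 * lower-casing the letters.  Returns the output length, or (size_t) -1
 * if "out" is too small.  No locale-dependent functions are used.
 */
size_t
sketch_normalize_codeset(const char *in, char *out, size_t outsize)
{
	size_t		len = 0;

	assert(in != NULL);
	assert(out != NULL || outsize == 0);

	for (; *in != '\0'; in++)
	{
		unsigned char ch = (unsigned char) *in;

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';
		else if (!((ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9')))
			continue;			/* drop punctuation such as '-' or '.' */

		if (len + 1 >= outsize)
			return (size_t) -1; /* no room for this byte plus terminator */
		out[len++] = (char) ch;
	}

	if (outsize == 0)
		return (size_t) -1;
	out[len] = '\0';
	return len;
}
```

With this, "UTF-8", "utf8" and "u---T----f------8" would all map to the same string, which is the kind of cleaning the setlocale() return value does not give us.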

"" is a problem however... the special value for "native environment"
is returned as a real locale name, which we probably still need in
places. We could change that to newlocale("") + query instead, but
there is a portability pipeline problem getting the name out of it:

1. POSIX only just added getlocalename_l() in 2024 [1][2].
2. Glibc has non-standard nl_langinfo_l(NL_LOCALE_NAME(category), loc).
3. The <xlocale.h> systems (macOS/*BSD) have non-standard
querylocale(mask, loc).
4. AFAIK there is no way to do it on pure POSIX 2008 systems.
5. For Windows, there is a completely different thing to get the
user's default locale, see CF#3772.

The systems in category 4 would in practice be Solaris and (if it
comes back) AIX. Given that, we probably just can't go that way soon.

So I think the solution could perhaps be something like: in some early
startup phase before there are any threads, we nail down all the
locale categories to "C" (or whatever we decide on for the permanent
global locale), and also query the "" categories and make a copy of
them in case anyone wants them later, and then never call setlocale()
again.
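
A minimal sketch of that startup sequence, under the assumption that it runs once before any threads exist. The struct and function names (sketch_env_locale, sketch_nail_down_locale) are hypothetical; only setlocale() and strdup() are real library calls. Which categories to capture, and what to fall back to when the environment names an unknown locale, are choices made here purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the names the environment asked for. */
struct sketch_env_locale
{
	char	   *collate;
	char	   *ctype;
	char	   *messages;
};

/*
 * Adopt the environment's setting for one category and copy its name.
 * The pointer returned by setlocale() may be overwritten by later
 * calls, so it is duplicated immediately.  Falls back to "C" if the
 * environment names a locale the system does not know.
 */
static char *
sketch_copy_env_category(int category)
{
	const char *name = setlocale(category, "");

	return strdup(name != NULL ? name : "C");
}

/*
 * Capture the "" categories, then pin the global locale to "C".
 * Returns 0 on success, -1 on failure.  Must run single-threaded.
 */
int
sketch_nail_down_locale(struct sketch_env_locale *saved)
{
	assert(saved != NULL);

	saved->collate = sketch_copy_env_category(LC_COLLATE);
	saved->ctype = sketch_copy_env_category(LC_CTYPE);
	saved->messages = sketch_copy_env_category(LC_MESSAGES);

	if (saved->collate == NULL || saved->ctype == NULL ||
		saved->messages == NULL)
		return -1;

	/* From here on nothing should call setlocale() again. */
	if (setlocale(LC_ALL, "C") == NULL)
		return -1;
	return 0;
}
```

The saved names can later be fed to newlocale() by anyone who still wants the environment's settings.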

[1]: https://pubs.opengroup.org/onlinepubs/9799919799/functions/getlocalename_l.html
[2]: https://www.austingroupbugs.net/view.php?id=1220

#26

Jeff Davis

pgsql@j-davis.com

over 1 year ago

In reply to: Thomas Munro (#25)

Re: Remaining dependency on setlocale()

On Thu, 2024-08-15 at 10:43 +1200, Thomas Munro wrote:

So I think the solution could perhaps be something like: in some early
startup phase before there are any threads, we nail down all the
locale categories to "C" (or whatever we decide on for the permanent
global locale), and also query the "" categories and make a copy of
them in case anyone wants them later, and then never call setlocale()
again.

+1.

Regards,
Jeff Davis

#27

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Jeff Davis (#26)

Re: Remaining dependency on setlocale()

On Thu, Aug 15, 2024 at 11:00 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2024-08-15 at 10:43 +1200, Thomas Munro wrote:

So I think the solution could perhaps be something like: in some early
startup phase before there are any threads, we nail down all the
locale categories to "C" (or whatever we decide on for the permanent
global locale), and also query the "" categories and make a copy of
them in case anyone wants them later, and then never call setlocale()
again.

+1.

We currently nail down these categories:

/* We keep these set to "C" always. See pg_locale.c for explanation. */
init_locale("LC_MONETARY", LC_MONETARY, "C");
init_locale("LC_NUMERIC", LC_NUMERIC, "C");
init_locale("LC_TIME", LC_TIME, "C");

CF #5170 has patches to make it so that we stop changing them even
transiently, using locale_t interfaces to feed our caches of stuff
needed to work with those categories, so they really stay truly nailed
down.

It sounds like someone needs to investigate doing the same thing for
these two, from CheckMyDatabase():

if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
	ereport(FATAL,
			(errmsg("database locale is incompatible with operating system"),
			 errdetail("The database was initialized with LC_COLLATE \"%s\", "
					   " which is not recognized by setlocale().", collate),
			 errhint("Recreate the database with another locale or install the missing locale.")));

if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
	ereport(FATAL,
			(errmsg("database locale is incompatible with operating system"),
			 errdetail("The database was initialized with LC_CTYPE \"%s\", "
					   " which is not recognized by setlocale().", ctype),
			 errhint("Recreate the database with another locale or install the missing locale.")));

How should that work? Maybe we could imagine something like
MyDatabaseLocale, a locale_t with LC_COLLATE and LC_CTYPE categories
set appropriately. Or should it be a pg_locale_t instead (if your
database default provider is ICU, then you don't even need a locale_t,
right?).

Then I think there is one quite gnarly category, from
assign_locale_messages() (a GUC assignment function):

(void) pg_perm_setlocale(LC_MESSAGES, newval);

I have never really studied gettext(), but I know it was just
standardised in POSIX 2024, and the standardised interface has _l()
variants of all functions. Current implementations don't have them
yet. Clearly we absolutely couldn't call pg_perm_setlocale() after
early startup -- but if gettext() is relying on the current locale to
affect far away code, then maybe this is one place where we'd just
have to use uselocale(). Perhaps we could plan some transitional
strategy where NetBSD users lose the ability to change the GUC without
restarting the server and it has to be the same for all sessions, or
something like that, until they produce either gettext_l() or
uselocale(), but I haven't thought hard about this part at all yet...

#28

Peter Eisentraut

peter@eisentraut.org

over 1 year ago

In reply to: Thomas Munro (#25)

Re: Remaining dependency on setlocale()

On 15.08.24 00:43, Thomas Munro wrote:

"" is a problem however... the special value for "native environment"
is returned as a real locale name, which we probably still need in
places. We could change that to newlocale("") + query instead, but

Where do we need that in the server?

It should just be initdb doing that and then initializing the server
with concrete values based on that.

I guess technically some of these GUC settings default to the
environment? But I think we could consider getting rid of that.

#29

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Peter Eisentraut (#28)

Re: Remaining dependency on setlocale()

On Fri, Aug 16, 2024 at 1:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:

On 15.08.24 00:43, Thomas Munro wrote:

"" is a problem however... the special value for "native environment"
is returned as a real locale name, which we probably still need in
places. We could change that to newlocale("") + query instead, but

Where do we need that in the server?

Hmm. Yeah, right, the only way I've found so far to even reach that
code and that captures that result is:

create database db2 locale = '';

That puts 'en_NZ.UTF-8' or whatever in pg_database. In contrast,
create collation will accept '' but just store it verbatim, and the
GUCs for changing time, monetary, numeric accept it too and keep it
verbatim. We could simply ban '' in all user commands. I doubt
they're documented as acceptable values, once you get past initdb and
have a running system. Looking into that...

It should just be initdb doing that and then initializing the server
with concrete values based on that.

Right.

I guess technically some of these GUC settings default to the
environment? But I think we could consider getting rid of that.

Yeah.

#30

Thomas Munro

thomas.munro@gmail.com

over 1 year ago

In reply to: Thomas Munro (#29)

Re: Remaining dependency on setlocale()

On Fri, Aug 16, 2024 at 9:09 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Fri, Aug 16, 2024 at 1:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:

It should just be initdb doing that and then initializing the server
with concrete values based on that.

Right.

I guess technically some of these GUC settings default to the
environment? But I think we could consider getting rid of that.

Yeah.

Seems to make a lot of sense. I tried that out over in CF #5170.

(In case it's not clear why I'm splitting discussion between threads:
I was thinking of this thread as the one where we're discussing what
needs to be done, with other threads being spun off to become CF entry
with concrete patches. I realised re-reading some discussion that
that might not be obvious...)

#31

Andreas Karlsson

andreas@proxel.se

over 1 year ago

In reply to: Jeff Davis (#13)

Re: Remaining dependency on setlocale()

On 8/9/24 8:24 PM, Jeff Davis wrote:

On Fri, 2024-08-09 at 13:41 +0200, Andreas Karlsson wrote:

I am leaning towards that we should write our own pure ascii functions
for this.

That makes sense for a lot of call sites, but it could cause breakage
if we aren't careful.

Since we do not support any non-ascii compatible encodings
anyway I do not see the point in having locale support in most of these
call-sites.

An ascii-compatible encoding just means that the code points in the
ascii range are represented as ascii. I'm not clear on whether code
points in the ascii range can return different results for things like
isspace(), but it sounds plausible -- toupper() can return different
results for 'i' in tr_TR.

Also, what about the values outside 128-255, which are still valid
input to isspace()?

My idea was that in a lot of those cases we only try to parse e.g. 0-9
as digits and always only . as the decimal separator, so we should just
make that obvious by either using locale C or writing our own ascii
only functions. These strings are meant to be read by machines, not
humans, primarily.

Andreas

#32

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Jeff Davis (#24)

Re: Remaining dependency on setlocale()

On Wed, 2024-08-14 at 12:00 -0700, Jeff Davis wrote:

On Wed, 2024-08-14 at 14:31 +1200, Thomas Munro wrote:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-_l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

+1 to this approach. It makes things more consistent across platforms
and avoids surprising dependencies on the global setting.

We'll have to be careful that each call site is either OK with C, or
that it gets changed to an _l() variant. We also have to be careful
about extensions.

Did we reach a conclusion here? Any thoughts on moving in this
direction, and whether 18 is the right time to do it?

Regards,
Jeff Davis

#33

Thomas Munro

thomas.munro@gmail.com

about 1 year ago

In reply to: Jeff Davis (#32)

Re: Remaining dependency on setlocale()

On Fri, Dec 13, 2024 at 8:22 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Wed, 2024-08-14 at 12:00 -0700, Jeff Davis wrote:

On Wed, 2024-08-14 at 14:31 +1200, Thomas Munro wrote:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-_l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

+1 to this approach. It makes things more consistent across platforms
and avoids surprising dependencies on the global setting.

We'll have to be careful that each call site is either OK with C, or
that it gets changed to an _l() variant. We also have to be careful
about extensions.

Did we reach a conclusion here? Any thoughts on moving in this
direction, and whether 18 is the right time to do it?

I think this is the best way, and I haven't seen anyone supporting any
other idea. (I'm working on those setlocale()-removal patches I
mentioned, more very soon...)

#34

Peter Eisentraut

peter@eisentraut.org

about 1 year ago

In reply to: Thomas Munro (#33)

Re: Remaining dependency on setlocale()

On 13.12.24 10:44, Thomas Munro wrote:

On Fri, Dec 13, 2024 at 8:22 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Wed, 2024-08-14 at 12:00 -0700, Jeff Davis wrote:

On Wed, 2024-08-14 at 14:31 +1200, Thomas Munro wrote:

1. The process global locale is always "C". If you ever call
uselocale(), it can only be for short stretches, and you have to
restore it straight after; perhaps it is only ever used in replacement
_l() functions for systems that lack them. You need to use _l()
functions for all non-"C" locales. The current database default needs
to be available as a variable (in future: thread-local variable, or
reachable from one), so you can use it in _l() functions. The "C"
locale can be accessed implicitly with non-_l() functions, or you could
ban those to reduce confusion and use foo_l(..., LC_GLOBAL_LOCALE) for
"C". Or a name like PG_C_LOCALE, which, in backend code could be just
LC_GLOBAL_LOCALE, while in frontend/library code it could be the
singleton mechanism I showed in CF#5166.

+1 to this approach. It makes things more consistent across platforms
and avoids surprising dependencies on the global setting.

We'll have to be careful that each call site is either OK with C, or
that it gets changed to an _l() variant. We also have to be careful
about extensions.

Did we reach a conclusion here? Any thoughts on moving in this
direction, and whether 18 is the right time to do it?

I think this is the best way, and I haven't seen anyone supporting any
other idea. (I'm working on those setlocale()-removal patches I
mentioned, more very soon...)

I also think this is the right direction, and we'll get closer with the
remaining patches that Thomas has lined up.

I think at this point, we could already remove all locale settings
related to LC_COLLATE. Nothing uses that anymore.

I think we will need to keep the global LC_CTYPE setting set to
something useful, for example so that system error messages come out in
the right encoding.

But I'm concerned about the Perl_setlocale() dance in plperl.c.
Perl apparently does a setlocale(LC_ALL, "") during startup, and that
code is a workaround to reset everything back afterwards. We need to be
careful not to break that.

(Perl has fixed that in 5.19, but the fix requires that you set another
environment variable before launching Perl, which you can't do in a
threaded system, so we'd probably need another fix eventually. See
<https://github.com/Perl/perl5/issues/8274>.)

#35

Jeff Davis

pgsql@j-davis.com

about 1 year ago

In reply to: Peter Eisentraut (#34)

Re: Remaining dependency on setlocale()

On Tue, 2024-12-17 at 13:14 +0100, Peter Eisentraut wrote:

I think we will need to keep the global LC_CTYPE setting set to
something useful, for example so that system error messages come out in
the right encoding.

Do we need to rely on the global LC_CTYPE setting? We already use
bind_textdomain_codeset().

But I'm concerned about the Perl_setlocale() dance in plperl.c.
Perl apparently does a setlocale(LC_ALL, "") during startup, and that
code is a workaround to reset everything back afterwards. We need to be
careful not to break that.

(Perl has fixed that in 5.19, but the fix requires that you set another
environment variable before launching Perl, which you can't do in a
threaded system, so we'd probably need another fix eventually. See
<https://github.com/Perl/perl5/issues/8274>.)

I don't fully understand that issue, but I would think the direction we
are going (keeping the global LC_CTYPE more consistent and relying on
it less) would make the problem better.

Regards,
Jeff Davis

#36

Peter Eisentraut

peter@eisentraut.org

about 1 year ago

In reply to: Jeff Davis (#35)

Re: Remaining dependency on setlocale()

On 17.12.24 19:10, Jeff Davis wrote:

On Tue, 2024-12-17 at 13:14 +0100, Peter Eisentraut wrote:

I think we will need to keep the global LC_CTYPE setting set to
something useful, for example so that system error messages come out in
the right encoding.

Do we need to rely on the global LC_CTYPE setting? We already use
bind_textdomain_codeset().

I don't think that would cover messages from the C library (strerror,
dlerror, etc.).

But I'm concerned about the Perl_setlocale() dance in plperl.c.
Perl apparently does a setlocale(LC_ALL, "") during startup, and that
code is a workaround to reset everything back afterwards. We need to be
careful not to break that.

(Perl has fixed that in 5.19, but the fix requires that you set another
environment variable before launching Perl, which you can't do in a
threaded system, so we'd probably need another fix eventually. See
<https://github.com/Perl/perl5/issues/8274>.)

I don't fully understand that issue, but I would think the direction we
are going (keeping the global LC_CTYPE more consistent and relying on
it less) would make the problem better.

Yes, I think it's the right direction, but we need to figure this issue
out eventually.

#37

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Peter Eisentraut (#34)

1 attachment(s)

Re: Remaining dependency on setlocale()

On Tue, 2024-12-17 at 13:14 +0100, Peter Eisentraut wrote:

+1 to this approach. It makes things more consistent across platforms
and avoids surprising dependencies on the global setting.

I think this is the best way, and I haven't seen anyone supporting any
other idea. (I'm working on those setlocale()-removal patches I
mentioned, more very soon...)

I also think this is the right direction, and we'll get closer with the
remaining patches that Thomas has lined up.

I think at this point, we could already remove all locale settings
related to LC_COLLATE. Nothing uses that anymore.

I think we will need to keep the global LC_CTYPE setting set to
something useful, for example so that system error messages come out in
the right encoding.

But I'm concerned about the Perl_setlocale() dance in plperl.c.
Perl apparently does a setlocale(LC_ALL, "") during startup, and that
code is a workaround to reset everything back afterwards. We need to be
careful not to break that.

(Perl has fixed that in 5.19, but the fix requires that you set another
environment variable before launching Perl, which you can't do in a
threaded system, so we'd probably need another fix eventually. See
<https://github.com/Perl/perl5/issues/8274>.)

To continue this thread, I did a symbol search in the meson build
directory like (patterns.txt attached):

for f in $(find . -name '*.o'); do
if ( nm --format=just-symbols "$f" | \
grep -xE -f /tmp/patterns.txt > /dev/null ); then
echo "$f"; fi; done

and it output:

./contrib/fuzzystrmatch/fuzzystrmatch.so.p/dmetaphone.c.o
./contrib/fuzzystrmatch/fuzzystrmatch.so.p/fuzzystrmatch.c.o
./contrib/isn/isn.so.p/isn.c.o
./contrib/spi/refint.so.p/refint.c.o
./contrib/ltree/ltree.so.p/crc32.c.o
./src/backend/postgres_lib.a.p/commands_copyfromparse.c.o
./src/backend/postgres_lib.a.p/utils_adt_pg_locale_libc.c.o
./src/backend/postgres_lib.a.p/tsearch_wparser_def.c.o
./src/backend/postgres_lib.a.p/parser_scansup.c.o
./src/backend/postgres_lib.a.p/utils_adt_inet_net_pton.c.o
./src/backend/postgres_lib.a.p/tsearch_ts_locale.c.o
./src/bin/psql/psql.p/meson-generated_.._tab-complete.c.o
./src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
./src/interfaces/ecpg/compatlib/libecpg_compat.so.3.18.p/informix.c.o
./src/interfaces/ecpg/compatlib/libecpg_compat.a.p/informix.c.o
./src/port/libpgport_srv.a.p/pgstrcasecmp.c.o
./src/port/libpgport_shlib.a.p/pgstrcasecmp.c.o
./src/port/libpgport.a.p/pgstrcasecmp.c.o

Not a short list, but not a long list, either, so it seems tractable. Note
that this misses things like isdigit(), which is inlined.

A few observations while spot-checking these files:

---------------------
pgstrcasecmp.c - has code like:

else if (IS_HIGHBIT_SET(ch) && islower(ch))
ch = toupper(ch);

and comments like "Note however that the whole thing is a bit bogus for
multibyte character sets."

Most of the callers are directly comparing with ascii literals, so I'm
not sure what the point is. There are probably some more interesting
callers hidden in there.
----------------------
pg_locale_libc.c -

char2wchar and wchar2char use mbstowcs and wcstombs when the input
locale is NULL. The main culprit seems to be full text search, which
has a bunch of /* TODO */ comments. Another caller is
get_iso_localename().

There are also a couple false positives where mbstowcs_l/wcstombs_l are
emulated with uselocale() and mbstowcs/wcstombs. In that case, it's not
actually sensitive to the global setting.
-----------------------
copyfromparse.c - the input is ASCII so it can use pg_ascii_tolower()
instead of tolower()
-----------------------
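
For reference, the uselocale()-based emulation mentioned under pg_locale_libc.c above can be sketched roughly as follows. This is a minimal illustration, not the code in pg_locale_libc.c; sketch_mbstowcs_l is a hypothetical name, and the sketch assumes a platform that provides uselocale(). The thread-local locale is switched only for the duration of the call, so the result does not depend on the global setlocale() state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Emulate mbstowcs_l(): temporarily install "loc" as this thread's
 * locale, convert, then restore whatever was in effect before.
 * Returns (size_t) -1 on failure, like mbstowcs().
 */
size_t
sketch_mbstowcs_l(wchar_t *dest, const char *src, size_t n, locale_t loc)
{
	locale_t	save;
	size_t		result;

	assert(loc != (locale_t) 0);

	/* uselocale() returns the previous thread locale, or 0 on error. */
	save = uselocale(loc);
	if (save == (locale_t) 0)
		return (size_t) -1;

	result = mbstowcs(dest, src, n);

	/* Restore; "save" may be LC_GLOBAL_LOCALE, which is a valid argument. */
	uselocale(save);
	return result;
}
```

This is why such call sites show up as false positives in the symbol search: mbstowcs() is linked in, but it only ever runs under an explicitly chosen locale_t.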

Regards,
Jeff Davis

#38

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Jeff Davis (#37)

8 attachment(s)

Re: Remaining dependency on setlocale()

On Thu, 2025-06-05 at 22:15 -0700, Jeff Davis wrote:

To continue this thread, I did a symbol search in the meson build
directory like (patterns.txt attached):

Attached a rough patch series which does what everyone seemed to agree
on:

* Change some trivial ASCII cases to use pg_ascii_* variants
* Set LC_COLLATE and LC_CTYPE to C with pg_perm_setlocale
* Introduce a new global_lc_ctype for callers that still need to use
operations that depend on datctype

There should be no behavior changes in this series.

Benefits:

* a tiny step toward multithreading, because connections to different
databases no longer require different setlocale() settings
* easier to identify dependence on datctype, because callers will
need to refer to global_lc_ctype.
* harder to accidentally depend on datctype or datcollate

Ideally, when the database locale provider is not libc, the user
shouldn't need to even think about a valid LC_CTYPE locale at all. But
that requires more work, and potentially risk of breakage.

Regards,
Jeff Davis

Attachments:

v1-0001-copyfromparse.c-use-pg_ascii_tolower-rather-than-.patch (text/x-patch; charset=UTF-8)

From 21f5cc0bca48ef8d2fdc746385e3afda575fbd9e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 09:50:53 -0700
Subject: [PATCH v1 1/8] copyfromparse.c: use pg_ascii_tolower() rather than
 tolower().

---
 src/backend/commands/copyfromparse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..f52f2477df1 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1538,7 +1538,7 @@ GetDecimalFromHex(char hex)
 	if (isdigit((unsigned char) hex))
 		return hex - '0';
 	else
-		return tolower((unsigned char) hex) - 'a' + 10;
+		return pg_ascii_tolower((unsigned char) hex) - 'a' + 10;
 }
 
 /*
-- 
2.43.0

v1-0002-contrib-spi-refint.c-use-pg_ascii_tolower-instead.patch (text/x-patch; charset=UTF-8)

From 5b60fbc629f6466c435021b8d3997b5763268ed1 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 09:54:40 -0700
Subject: [PATCH v1 2/8] contrib/spi/refint.c: use pg_ascii_tolower() instead.

---
 contrib/spi/refint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/spi/refint.c b/contrib/spi/refint.c
index d5e25e07ae9..89898cad7b0 100644
--- a/contrib/spi/refint.c
+++ b/contrib/spi/refint.c
@@ -321,7 +321,7 @@ check_foreign_key(PG_FUNCTION_ARGS)
 	if (nrefs < 1)
 		/* internal error */
 		elog(ERROR, "check_foreign_key: %d (< 1) number of references specified", nrefs);
-	action = tolower((unsigned char) *(args[1]));
+	action = pg_ascii_tolower((unsigned char) *(args[1]));
 	if (action != 'r' && action != 'c' && action != 's')
 		/* internal error */
 		elog(ERROR, "check_foreign_key: invalid action %s", args[1]);
-- 
2.43.0

v1-0003-isn.c-use-pg_ascii_toupper-instead-of-toupper.patch (text/x-patch; charset=UTF-8)

From 8fcbfb8d5a5e38d6bb2e0d21486a81cebfa45721 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 09:58:24 -0700
Subject: [PATCH v1 3/8] isn.c: use pg_ascii_toupper() instead of toupper().

---
 contrib/isn/isn.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/isn/isn.c b/contrib/isn/isn.c
index 038c8ed4db7..1880c91844e 100644
--- a/contrib/isn/isn.c
+++ b/contrib/isn/isn.c
@@ -726,7 +726,7 @@ string2ean(const char *str, struct Node *escontext, ean13 *result,
 			if (type != INVALID)
 				goto eaninvalid;
 			type = ISSN;
-			*aux1++ = toupper((unsigned char) *aux2);
+			*aux1++ = pg_ascii_toupper((unsigned char) *aux2);
 			length++;
 		}
 		else if (length == 9 && (digit || *aux2 == 'X' || *aux2 == 'x') && last)
@@ -736,7 +736,7 @@ string2ean(const char *str, struct Node *escontext, ean13 *result,
 				goto eaninvalid;
 			if (type == INVALID)
 				type = ISBN;	/* ISMN must start with 'M' */
-			*aux1++ = toupper((unsigned char) *aux2);
+			*aux1++ = pg_ascii_toupper((unsigned char) *aux2);
 			length++;
 		}
 		else if (length == 11 && digit && last)
-- 
2.43.0

v1-0004-inet_net_pton.c-use-pg_ascii_tolower-rather-than-.patch (text/x-patch; charset=UTF-8)

From d3bd2f24a9c450cbb2ff2f88f8efd4ba7c455be0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 10:06:01 -0700
Subject: [PATCH v1 4/8] inet_net_pton.c: use pg_ascii_tolower() rather than
 tolower().

---
 src/backend/utils/adt/inet_net_pton.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/inet_net_pton.c b/src/backend/utils/adt/inet_net_pton.c
index ef2236d9f04..3b0db2a3799 100644
--- a/src/backend/utils/adt/inet_net_pton.c
+++ b/src/backend/utils/adt/inet_net_pton.c
@@ -115,8 +115,7 @@ inet_cidr_pton_ipv4(const char *src, u_char *dst, size_t size)
 		src++;					/* skip x or X. */
 		while ((ch = *src++) != '\0' && isxdigit((unsigned char) ch))
 		{
-			if (isupper((unsigned char) ch))
-				ch = tolower((unsigned char) ch);
+			ch = pg_ascii_tolower((unsigned char) ch);
 			n = strchr(xdigits, ch) - xdigits;
 			assert(n >= 0 && n <= 15);
 			if (dirty == 0)
-- 
2.43.0

v1-0005-Add-global_lc_ctype-to-hold-locale_t-for-datctype.patch (text/x-patch; charset=UTF-8)

From 60e659df02c33b14ee232c578d8dfdbfe1fb6dc6 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:13:16 -0700
Subject: [PATCH v1 5/8] Add global_lc_ctype to hold locale_t for datctype.

Callers of locale-aware ctype operations should use the "_l" variants
of the functions and pass global_lc_ctype for the locale. Doing so
avoids depending on setlocale().
---
 src/backend/utils/adt/pg_locale_libc.c | 32 ++++++++++++++++++++++++--
 src/backend/utils/init/postinit.c      |  2 ++
 src/include/utils/pg_locale.h          |  7 ++++++
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 199857e22db..a45fb4df38c 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -85,6 +85,12 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+/*
+ * Represents datctype in a global variable, so that we don't need to rely on
+ * setlocale().
+ */
+locale_t	global_lc_ctype = NULL;
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -417,6 +423,28 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_size;
 }
 
+void
+init_global_lc_ctype(const char *ctype)
+{
+	locale_t	loc;
+
+	errno = 0;
+#ifndef WIN32
+	loc = newlocale(LC_CTYPE_MASK, ctype, NULL);
+#else
+	loc = _create_locale(LC_ALL, ctype);
+#endif
+
+	if (!loc)
+		ereport(FATAL,
+				(errmsg("database locale is incompatible with operating system"),
+				 errdetail("The database was initialized with LC_CTYPE \"%s\", "
+						   " which is not recognized by setlocale().", ctype),
+				 errhint("Recreate the database with another locale or install the missing locale.")));
+
+	global_lc_ctype = loc;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
@@ -912,7 +940,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	if (locale == (pg_locale_t) 0)
 	{
 		/* Use wcstombs directly for the default locale */
-		result = wcstombs(to, from, tolen);
+		result = wcstombs_l(to, from, tolen, global_lc_ctype);
 	}
 	else
 	{
@@ -972,7 +1000,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		if (locale == (pg_locale_t) 0)
 		{
 			/* Use mbstowcs directly for the default locale */
-			result = mbstowcs(to, str, tolen);
+			result = mbstowcs_l(to, str, tolen, global_lc_ctype);
 		}
 		else
 		{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..3eaa1486f6f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -431,6 +431,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 						   " which is not recognized by setlocale().", ctype),
 				 errhint("Recreate the database with another locale or install the missing locale.")));
 
+	init_global_lc_ctype(ctype);
+
 	if (strcmp(ctype, "C") == 0 ||
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 7b8cbf58d2c..7fdf420dd7a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -32,6 +32,12 @@ extern PGDLLIMPORT char *localized_full_days[];
 extern PGDLLIMPORT char *localized_abbrev_months[];
 extern PGDLLIMPORT char *localized_full_months[];
 
+/*
+ * Represents datctype in a global variable, so that we don't need to rely on
+ * setlocale().
+ */
+extern PGDLLIMPORT locale_t global_lc_ctype;
+
 /* is the databases's LC_CTYPE the C locale? */
 extern PGDLLIMPORT bool database_ctype_is_c;
 
@@ -121,6 +127,7 @@ struct pg_locale_struct
 	}			info;
 };
 
+extern void init_global_lc_ctype(const char *ctype);
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.43.0

v1-0006-Use-global_lc_ctype-for-callers-of-locale-aware-f.patch (text/x-patch; charset=UTF-8)

From c4eaa408a045d0ff9baaa44571469be0ffabf2f2 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:17:22 -0700
Subject: [PATCH v1 6/8] Use global_lc_ctype for callers of locale-aware
 functions.

Rather than relying on setlocale() to set the right LC_CTYPE, use
global_lc_ctype explicitly to refer to datctype.
---
 contrib/fuzzystrmatch/dmetaphone.c    |  3 ++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 contrib/ltree/crc32.c                 |  2 +-
 src/backend/parser/scansup.c          |  3 ++-
 src/backend/tsearch/ts_locale.c       |  4 ++--
 src/port/pgstrcasecmp.c               | 20 ++++++++++++++------
 6 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..07d8781cd2a 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -284,7 +285,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = toupper_l((unsigned char) *i, global_lc_ctype);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..b619178a1f6 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -56,13 +57,15 @@ static void _soundex(const char *instr, char *outstr);
 
 #define SOUNDEX_LEN 4
 
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_lc_ctype)
+
 /*									ABCDEFGHIJKLMNOPQRSTUVWXYZ */
 static const char *const soundex_table = "01230120022455012623010202";
 
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..2ea7c8a5ec0 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	tolower_l((unsigned char) (x), global_lc_ctype)
 #else
 #define TOLOWER(x)	(x)
 #endif
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..98c1f30d04f 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -68,7 +69,7 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
 		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+			ch = tolower_l(ch, global_lc_ctype);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index b77d8c23d36..51ba3b41813 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -43,7 +43,7 @@ t_isalpha(const char *ptr)
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalpha((wint_t) character[0]);
+	return iswalpha_l((wint_t) character[0], global_lc_ctype);
 }
 
 int
@@ -58,7 +58,7 @@ t_isalnum(const char *ptr)
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalnum((wint_t) character[0]);
+	return iswalnum_l((wint_t) character[0], global_lc_ctype);
 }
 
 
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..f6dc6b0ff3b 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -28,6 +28,14 @@
 
 #include <ctype.h>
 
+#ifndef FRONTEND
+extern PGDLLIMPORT locale_t global_lc_ctype;
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_lc_ctype)
+#define TOLOWER(x) tolower_l((unsigned char) (x), global_lc_ctype)
+#else
+#define TOUPPER(x) toupper(x)
+#define TOLOWER(x) tolower(x)
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -45,12 +53,12 @@ pg_strcasecmp(const char *s1, const char *s2)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -78,12 +86,12 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -107,7 +115,7 @@ pg_toupper(unsigned char ch)
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
 	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -124,7 +132,7 @@ pg_tolower(unsigned char ch)
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
 	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

v1-0007-Fix-the-last-remaining-callers-relying-on-setloca.patch (text/x-patch; charset=UTF-8)

From 0fb479743de60a59c9139d27d981881f337becb2 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:17:33 -0700
Subject: [PATCH v1 7/8] Fix the last remaining callers relying on setlocale().

---
 configure                         |  2 +-
 configure.ac                      |  2 ++
 meson.build                       |  2 ++
 src/backend/tsearch/wparser_def.c | 40 +++++++++++++++++++++++++++++--
 src/include/pg_config.h.in        |  6 +++++
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 4f15347cc95..2660c29e0d2 100755
--- a/configure
+++ b/configure
@@ -15616,7 +15616,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton iswxdigit_l isxdigit_l kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 4b8335dc613..2d16c5fd43f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1789,6 +1789,8 @@ AC_CHECK_FUNCS(m4_normalize([
 	getifaddrs
 	getpeerucred
 	inet_pton
+	iswxdigit_l
+	isxdigit_l
 	kqueue
 	localeconv_l
 	mbstowcs_l
diff --git a/meson.build b/meson.build
index d142e3e408b..0bd6f9f2076 100644
--- a/meson.build
+++ b/meson.build
@@ -2880,6 +2880,8 @@ func_checks = [
   ['getpeerucred'],
   ['inet_aton'],
   ['inet_pton'],
+  ['iswxdigit_l'],
+  ['isxdigit_l'],
   ['kqueue'],
   ['localeconv_l'],
   ['mbstowcs_l'],
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 79bcd32a063..0fe90b7ad8d 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -411,6 +411,40 @@ TParserCopyClose(TParser *prs)
 }
 
 
+#ifndef HAVE_ISXDIGIT_L
+static int
+isxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _isxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = isxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+#ifndef HAVE_ISWXDIGIT_L
+static int
+iswxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _iswxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = iswxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
+
 /*
  * Character-type support functions, equivalent to is* macros, but
  * working with any possible encodings and locales. Notes:
@@ -436,9 +470,11 @@ p_is##type(TParser *prs)													\
 				return nonascii;											\
 			return is##type(c);												\
 		}																	\
-		return isw##type(*(prs->wstr + prs->state->poschar));				\
+		return isw##type##_l(*(prs->wstr + prs->state->poschar),			\
+							 global_lc_ctype);								\
 	}																		\
-	return is##type(*(unsigned char *) (prs->str + prs->state->posbyte));	\
+	return is##type##_l(*(unsigned char *) (prs->str + prs->state->posbyte),	\
+						global_lc_ctype);									\
 }																			\
 																			\
 static int																	\
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 726a7c1be1f..f06396c94f4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -229,6 +229,12 @@
 /* Define to 1 if you have the global variable 'int timezone'. */
 #undef HAVE_INT_TIMEZONE
 
+/* Define to 1 if you have the `iswxdigit_l' function. */
+#undef HAVE_ISWXDIGIT_L
+
+/* Define to 1 if you have the `isxdigit_l' function. */
+#undef HAVE_ISXDIGIT_L
+
 /* Define to 1 if __builtin_constant_p(x) implies "i"(x) acceptance. */
 #undef HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P
 
-- 
2.43.0

v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch (text/x-patch; charset=UTF-8)

From 9dc49121a391ccdefa19904329171e1fbe9a8a3d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:14:22 -0700
Subject: [PATCH v1 8/8] Set process LC_COLLATE=C and LC_CTYPE=C.

Now that locale-aware functions use global_lc_ctype rather than
relying on setlocale(), set LC_COLLATE and LC_CTYPE to C for
consistency.
---
 src/backend/utils/init/postinit.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3eaa1486f6f..9841a33689a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -417,19 +417,17 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
 	ctype = TextDatumGetCString(datum);
 
-	if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
-		ereport(FATAL,
-				(errmsg("database locale is incompatible with operating system"),
-				 errdetail("The database was initialized with LC_COLLATE \"%s\", "
-						   " which is not recognized by setlocale().", collate),
-				 errhint("Recreate the database with another locale or install the missing locale.")));
-
-	if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
-		ereport(FATAL,
-				(errmsg("database locale is incompatible with operating system"),
-				 errdetail("The database was initialized with LC_CTYPE \"%s\", "
-						   " which is not recognized by setlocale().", ctype),
-				 errhint("Recreate the database with another locale or install the missing locale.")));
+	/*
+	 * Set LC_COLLATE and LC_CTYPE both to "C" for consistency.
+	 *
+	 * Historically, these were set to datcollate and datctype, respectively,
+	 * but that made it too easy to depend on setlocale() at odd places
+	 * throughout the server.
+	 */
+	if (pg_perm_setlocale(LC_COLLATE, "C") == NULL)
+		elog(ERROR, "failure setting LC_COLLATE=\"C\"");
+	if (pg_perm_setlocale(LC_CTYPE, "C") == NULL)
+		elog(ERROR, "failure setting LC_CTYPE=\"C\"");
 
 	init_global_lc_ctype(ctype);
 
-- 
2.43.0

#39

Peter Eisentraut

peter@eisentraut.org

7 months ago

In reply to: Jeff Davis (#38)

Re: Remaining dependency on setlocale()

On 07.06.25 00:23, Jeff Davis wrote:

On Thu, 2025-06-05 at 22:15 -0700, Jeff Davis wrote:

To continue this thread, I did a symbol search in the meson build
directory like (patterns.txt attached):

Attached a rough patch series which does what everyone seemed to agree
on:

* Change some trivial ASCII cases to use pg_ascii_* variants
* Set LC_COLLATE and LC_CTYPE to C with pg_perm_setlocale
* Introduce a new global_lc_ctype for callers that still need to use
operations that depend on datctype

v1-0001-copyfromparse.c-use-pg_ascii_tolower-rather-than-.patch
v1-0002-contrib-spi-refint.c-use-pg_ascii_tolower-instead.patch
v1-0003-isn.c-use-pg_ascii_toupper-instead-of-toupper.patch
v1-0004-inet_net_pton.c-use-pg_ascii_tolower-rather-than-.patch

These look good to me.

v1-0005-Add-global_lc_ctype-to-hold-locale_t-for-datctype.patch

This looks ok (but might depend on how patch 0006 turns out).

v1-0006-Use-global_lc_ctype-for-callers-of-locale-aware-f.patch

I think these need further individual analysis and explanation why these
should use the global lc_ctype setting. For example, you could argue
that the SQL-callable soundex(text) function should use the collation
object of its input value, not the global locale. But furthermore,
soundex_code() could actually just use pg_ascii_toupper() instead. And
in ts_locale.c, the isalnum_l() call should use mylocale that already
exists in that function. The problem to solve is getting a good value
into mylocale. Using the global setting confuses the issue a bit, I think.
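
To make the soundex_code() suggestion concrete, here is a minimal
standalone sketch of an ASCII-only variant. ascii_toupper() is a
hypothetical stand-in for pg_ascii_toupper(), and soundex_code_ascii()
is an illustrative name, not code from the patch series; the table is
copied from the fuzzystrmatch.c hunk shown in the diff.

```c
/* Hypothetical stand-in for pg_ascii_toupper(): folds only ASCII a-z. */
static unsigned char
ascii_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch += 'A' - 'a';
	return ch;
}

/*									ABCDEFGHIJKLMNOPQRSTUVWXYZ */
static const char *const soundex_table = "01230120022455012623010202";

/* Illustrative variant of soundex_code() with no locale dependency. */
static char
soundex_code_ascii(char letter)
{
	letter = (char) ascii_toupper((unsigned char) letter);
	/* Only ASCII letters index the table; anything else is returned as is */
	if (letter >= 'A' && letter <= 'Z')
		return soundex_table[letter - 'A'];
	return letter;
}
```

In this sketch, bytes outside ASCII pass through unchanged. With
toupper() under a single-byte datctype, such a byte could be mapped to a
different byte before being returned, so the two variants may not be
strictly identical for non-ASCII input.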

v1-0007-Fix-the-last-remaining-callers-relying-on-setloca.patch

Do we have any data on which platforms we'd need these checks for?

Also, if you look in wparser_def.c at what p_isxdigit is used for, it's
used for parsing XML (presumably HTML) files, so we just need ASCII-only
behavior and no locale dependency.
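
As a rough illustration of what "ASCII-only behavior" could mean for a
hex-digit check, here is a minimal sketch. The name ascii_isxdigit() is
hypothetical and does not appear in the patch series or, as far as this
sketch assumes, in the PostgreSQL tree.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true only for the ASCII characters 0-9, a-f and
 * A-F. It consults no locale, so the result is the same regardless of
 * setlocale() or any locale_t.
 */
static bool
ascii_isxdigit(unsigned char c)
{
	return (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'f') ||
		(c >= 'A' && c <= 'F');
}
```

A check like this needs neither isxdigit_l() nor a configure probe, at
the cost of rejecting any non-ASCII character that a particular locale
might classify as a hex digit.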

v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch

As I mentioned earlier in the thread, I don't think we can do this for
LC_CTYPE, because otherwise system error messages would not come out in
the right encoding. For the LC_COLLATE settings, I think we could just
do the setting in main(), where the other non-database-specific locale
categories are set.

#40

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Peter Eisentraut (#39)

Re: Remaining dependency on setlocale()

On Tue, 2025-06-10 at 17:32 +0200, Peter Eisentraut wrote:

As I mentioned earlier in the thread, I don't think we can do this for
LC_CTYPE, because otherwise system error messages would not come out in
the right encoding.

Is there any way around that? If all we need is the right encoding, do
we need a proper locale?

Regards,
Jeff Davis

#41

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Peter Eisentraut (#39)

7 attachment(s)

Re: Remaining dependency on setlocale()

On Tue, 2025-06-10 at 17:32 +0200, Peter Eisentraut wrote:

v1-0001-copyfromparse.c-use-pg_ascii_tolower-rather-than-.patch
v1-0002-contrib-spi-refint.c-use-pg_ascii_tolower-instead.patch
v1-0003-isn.c-use-pg_ascii_toupper-instead-of-toupper.patch
v1-0004-inet_net_pton.c-use-pg_ascii_tolower-rather-than-.patch

These look good to me.

Committed. (That means they're in 18, which was not my intention, but
others seemed to think it was harmless enough, so I didn't revert. I
will wait for the branch before I commit any more of these.)

v1-0005-Add-global_lc_ctype-to-hold-locale_t-for-datctype.patch

This looks ok (but might depend on how patch 0006 turns out).

I changed this to a global_libc_locale that includes both LC_COLLATE
and LC_CTYPE (from datcollate and datctype), in case an extension is
relying on strcoll for some reason.

v1-0006-Use-global_lc_ctype-for-callers-of-locale-aware-f.patch

I think these need further individual analysis and explanation why
these should use the global lc_ctype setting.

This patch series, at least so far, is designed to have zero behavior
changes. Anything with a potential for a behavior change should be a
separate commit, so that if we need to revert it, we can revert the
behavior change without reintroducing a setlocale() dependency.

For example, you could argue that the SQL-callable soundex(text)
function should use the collation object of its input value, not the
global locale.

That would be a behavior change.

But furthermore, soundex_code() could actually just use
pg_ascii_toupper() instead.

Is that a behavior change?

And in ts_locale.c, the isalnum_l() call should use mylocale that
already exists in that function. The problem to solve is getting a good
value into mylocale. Using the global setting confuses the issue a bit,
I think.

I reworked it to be less confusing by changing wchar2char/char2wchar to
take a locale_t instead of pg_locale_t. Hopefully it's an improvement.

In get_iso_localename(), there's a comment saying that it doesn't
matter which locale is used (because it's ASCII), but to use the "_l"
variants, we need to pick some locale. At that point it's not clear to
me that global_libc_locale will be set yet, so I used LC_C_LOCALE.

I'm not sure whether we can rely on LC_C_LOCALE being available, but it
passed in CI, and if it's not available somewhere it might be a good
idea to create it on those platforms anyway.

v1-0007-Fix-the-last-remaining-callers-relying-on-setloca.patch

Do we have any data on which platforms we'd need these checks for?

https://cirrus-ci.com/build/5167600088383488

Looks like Windows doesn't have iswxdigit_l or isxdigit_l.

Also, if you look into wparser_def.c what p_isxdigit is used for, it's
used for parsing XML (presumably HTML) files, so we just need ASCII-only
behavior and no locale dependency.

iswxdigit() does seem to be dependent on locale, so this could be a
subtle behavior change.

v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch

As I mentioned earlier in the thread, I don't think we can do this for
LC_CTYPE, because otherwise system error messages would not come out in
the right encoding.

Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
set to datctype.

Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
danger of accidentally depending on that setting. Can the encoding be
controlled with LC_MESSAGES instead of LC_CTYPE?

Do you have an example of how things can go wrong?

For the LC_COLLATE settings, I think we could just do the setting in
main(), where the other non-database-specific locale categories are
set.

Done.

Regards,
Jeff Davis

Attachments:

v2-0001-Hold-datcollate-datctype-in-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 52a2be3ac85314212e0ce7949e1341e6a8560f7c Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:13:16 -0700
Subject: [PATCH v2 1/7] Hold datcollate/datctype in global_libc_locale.

Callers of locale-aware ctype operations should use the "_l" variants
of the functions and pass global_libc_locale for the locale. Doing so
avoids depending on setlocale().

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 src/backend/utils/adt/pg_locale_libc.c | 77 ++++++++++++++++++++++++++
 src/backend/utils/init/postinit.c      |  2 +
 src/include/utils/pg_locale.h          |  7 +++
 3 files changed, 86 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 199857e22db..d6eef885ce0 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -85,6 +85,12 @@ static size_t strupper_libc_mb(char *dest, size_t destsize,
 							   const char *src, ssize_t srclen,
 							   pg_locale_t locale);
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+locale_t	global_libc_locale = NULL;
+
 static const struct collate_methods collate_methods_libc = {
 	.strncoll = strncoll_libc,
 	.strnxfrm = strnxfrm_libc,
@@ -417,6 +423,77 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_size;
 }
 
+/*
+ * Initialize global locale for LC_COLLATE and LC_CTYPE from datcollate and
+ * datctype, respectively.
+ *
+ * NB: should be consistent with make_libc_collator(), except that it must
+ * create the locale even for "C" and "POSIX".
+ */
+void
+init_global_libc_locale(const char *collate, const char *ctype)
+{
+	locale_t	loc = 0;
+
+	if (strcmp(collate, ctype) == 0)
+	{
+		/* Normal case where they're the same */
+		errno = 0;
+#ifndef WIN32
+		loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate, NULL);
+#else
+		loc = _create_locale(LC_ALL, collate);
+#endif
+		if (!loc)
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_COLLATE \"%s\", "
+							   " which is not recognized by setlocale().", collate),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+	}
+	else
+	{
+#ifndef WIN32
+		/* We need two newlocale() steps */
+		locale_t	loc1 = 0;
+
+		errno = 0;
+		loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
+		if (!loc1)
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_COLLATE \"%s\", "
+							   " which is not recognized by setlocale().", collate),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+
+		errno = 0;
+		loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
+		if (!loc)
+		{
+			if (loc1)
+				freelocale(loc1);
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_CTYPE \"%s\", "
+							   " which is not recognized by setlocale().", ctype),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+		}
+#else
+
+		/*
+		 * XXX The _create_locale() API doesn't appear to support this. Could
+		 * perhaps be worked around by changing pg_locale_t to contain two
+		 * separate fields.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("collations with different collate and ctype values are not supported on this platform")));
+#endif
+	}
+
+	global_libc_locale = loc;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..74f9df84fde 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -431,6 +431,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 						   " which is not recognized by setlocale().", ctype),
 				 errhint("Recreate the database with another locale or install the missing locale.")));
 
+	init_global_libc_locale(collate, ctype);
+
 	if (strcmp(ctype, "C") == 0 ||
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 7b8cbf58d2c..3ea16e83ee1 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -32,6 +32,12 @@ extern PGDLLIMPORT char *localized_full_days[];
 extern PGDLLIMPORT char *localized_abbrev_months[];
 extern PGDLLIMPORT char *localized_full_months[];
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+extern PGDLLIMPORT locale_t global_libc_locale;
+
 /* is the databases's LC_CTYPE the C locale? */
 extern PGDLLIMPORT bool database_ctype_is_c;
 
@@ -121,6 +127,7 @@ struct pg_locale_struct
 	}			info;
 };
 
+extern void init_global_libc_locale(const char *collate, const char *ctype);
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.43.0

v2-0002-fuzzystrmatch-use-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 5612969727eaab953c29c1e94324b9afc2bcca14 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:34 -0700
Subject: [PATCH v2 2/7] fuzzystrmatch: use global_libc_locale.

---
 contrib/fuzzystrmatch/dmetaphone.c    |  3 ++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..8777c1f5c04 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -284,7 +285,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = toupper_l((unsigned char) *i, global_libc_locale);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..103dd07220c 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -56,13 +57,15 @@ static void _soundex(const char *instr, char *outstr);
 
 #define SOUNDEX_LEN 4
 
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_locale)
+
 /*									ABCDEFGHIJKLMNOPQRSTUVWXYZ */
 static const char *const soundex_table = "01230120022455012623010202";
 
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v2-0003-ltree-use-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 5b25dcf2e75a05e4edff72a8378183f390970329 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:50 -0700
Subject: [PATCH v2 3/7] ltree: use global_libc_locale.

---
 contrib/ltree/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..5f5c563471e 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	tolower_l((unsigned char) (x), global_libc_locale)
 #else
 #define TOLOWER(x)	(x)
 #endif
-- 
2.43.0

v2-0004-Use-global_libc_locale-for-downcase_identifier-an.patch (text/x-patch; charset=UTF-8)

From bcb3392383c32d57c74f1fb334e6c3515598b6d2 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:07:01 -0700
Subject: [PATCH v2 4/7] Use global_libc_locale for downcase_identifier() and
 pg_strcasecmp().

---
 src/backend/parser/scansup.c |  3 ++-
 src/port/pgstrcasecmp.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..d45bf275e42 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -68,7 +69,7 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
 		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+			ch = tolower_l(ch, global_libc_locale);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..812050598e7 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -28,6 +28,14 @@
 
 #include <ctype.h>
 
+#ifndef FRONTEND
+extern PGDLLIMPORT locale_t global_libc_locale;
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_locale)
+#define TOLOWER(x) tolower_l((unsigned char) (x), global_libc_locale)
+#else
+#define TOUPPER(x) toupper(x)
+#define TOLOWER(x) tolower(x)
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -45,12 +53,12 @@ pg_strcasecmp(const char *s1, const char *s2)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -78,12 +86,12 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -107,7 +115,7 @@ pg_toupper(unsigned char ch)
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
 	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -124,7 +132,7 @@ pg_tolower(unsigned char ch)
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
 	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

v2-0005-Change-wchar2char-and-char2wchar-to-accept-a-loca.patch (text/x-patch; charset=UTF-8)

From 7973eaa1bdb483b0c57e82b6be90a4e78c47e3af Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 11 Jun 2025 10:11:16 -0700
Subject: [PATCH v2 5/7] Change wchar2char() and char2wchar() to accept a
 locale_t.

These are libc-specific functions, so accepting a locale_t makes more
sense than accepting a pg_locale_t (which could use another provider).

Also, no longer accept NULL.
---
 src/backend/tsearch/ts_locale.c        |  4 +--
 src/backend/tsearch/wparser_def.c      |  2 +-
 src/backend/utils/adt/pg_locale.c      |  2 +-
 src/backend/utils/adt/pg_locale_libc.c | 42 +++++++++-----------------
 src/include/utils/pg_locale.h          |  4 +--
 5 files changed, 20 insertions(+), 34 deletions(-)

diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index b77d8c23d36..4801fe90089 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,7 +36,7 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
@@ -51,7 +51,7 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 79bcd32a063..e2dd3da3aa3 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		pg_locale_t mylocale = 0;	/* TODO */
+		locale_t	mylocale = 0;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f5e31c433a0..6d63d08c8ae 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1024,7 +1024,7 @@ get_iso_localename(const char *winlocname)
 		char	   *hyphen;
 
 		/* Locale names use only ASCII, any conversion locale suffices. */
-		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), NULL);
+		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), LC_C_LOCALE);
 		if (rc == -1 || rc == sizeof(iso_lc_messages))
 			return NULL;
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index d6eef885ce0..cceb28f9a72 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -215,7 +215,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
@@ -226,7 +226,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -310,7 +310,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
@@ -327,7 +327,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -398,7 +398,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
@@ -409,7 +409,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -956,10 +956,12 @@ wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
  * zero-terminated.  The output will be zero-terminated iff there is room.
  */
 size_t
-wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
+wchar2char(char *to, const wchar_t *from, size_t tolen, locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -986,16 +988,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 #endif							/* WIN32 */
-	if (locale == (pg_locale_t) 0)
-	{
-		/* Use wcstombs directly for the default locale */
-		result = wcstombs(to, from, tolen);
-	}
-	else
-	{
-		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
-	}
+		result = wcstombs_l(to, from, tolen, loc);
 
 	return result;
 }
@@ -1011,10 +1004,12 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
  */
 size_t
 char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
-		   pg_locale_t locale)
+		   locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -1046,16 +1041,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		/* mbstowcs requires ending '\0' */
 		char	   *str = pnstrdup(from, fromlen);
 
-		if (locale == (pg_locale_t) 0)
-		{
-			/* Use mbstowcs directly for the default locale */
-			result = mbstowcs(to, str, tolen);
-		}
-		else
-		{
-			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
-		}
+		result = mbstowcs_l(to, str, tolen, loc);
 
 		pfree(str);
 	}
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 3ea16e83ee1..6565a523f88 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -166,8 +166,8 @@ extern void report_newlocale_failure(const char *localename);
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
-						 pg_locale_t locale);
+						 locale_t loc);
 extern size_t char2wchar(wchar_t *to, size_t tolen,
-						 const char *from, size_t fromlen, pg_locale_t locale);
+						 const char *from, size_t fromlen, locale_t loc);
 
 #endif							/* _PG_LOCALE_ */
-- 
2.43.0

v2-0006-tsearch-use-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 229d9ec22a6c8dc50a709ae5032896e7932d219b Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 11 Jun 2025 10:07:29 -0700
Subject: [PATCH v2 6/7] tsearch: use global_libc_locale.

---
 configure                         |  2 +-
 configure.ac                      |  2 ++
 meson.build                       |  2 ++
 src/backend/tsearch/ts_locale.c   |  8 +++---
 src/backend/tsearch/wparser_def.c | 44 ++++++++++++++++++++++++++++---
 src/include/pg_config.h.in        |  6 +++++
 6 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 4f15347cc95..2660c29e0d2 100755
--- a/configure
+++ b/configure
@@ -15616,7 +15616,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton iswxdigit_l isxdigit_l kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 4b8335dc613..2d16c5fd43f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1789,6 +1789,8 @@ AC_CHECK_FUNCS(m4_normalize([
 	getifaddrs
 	getpeerucred
 	inet_pton
+	iswxdigit_l
+	isxdigit_l
 	kqueue
 	localeconv_l
 	mbstowcs_l
diff --git a/meson.build b/meson.build
index d142e3e408b..0bd6f9f2076 100644
--- a/meson.build
+++ b/meson.build
@@ -2880,6 +2880,8 @@ func_checks = [
   ['getpeerucred'],
   ['inet_aton'],
   ['inet_pton'],
+  ['iswxdigit_l'],
+  ['isxdigit_l'],
   ['kqueue'],
   ['localeconv_l'],
   ['mbstowcs_l'],
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index 4801fe90089..6b66fd1c05b 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,14 +36,14 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_locale;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalpha((wint_t) character[0]);
+	return iswalpha_l((wint_t) character[0], mylocale);
 }
 
 int
@@ -51,14 +51,14 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_locale;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalnum((wint_t) character[0]);
+	return iswalnum_l((wint_t) character[0], mylocale);
 }
 
 
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index e2dd3da3aa3..9a80d32b448 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		locale_t	mylocale = 0;	/* TODO */
+		locale_t	mylocale = global_libc_locale;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
@@ -411,6 +411,40 @@ TParserCopyClose(TParser *prs)
 }
 
 
+#ifndef HAVE_ISXDIGIT_L
+static int
+isxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _isxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = isxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+#ifndef HAVE_ISWXDIGIT_L
+static int
+iswxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _iswxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = iswxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
+
 /*
  * Character-type support functions, equivalent to is* macros, but
  * working with any possible encodings and locales. Notes:
@@ -434,11 +468,13 @@ p_is##type(TParser *prs)													\
 			unsigned int c = *(prs->pgwstr + prs->state->poschar);			\
 			if (c > 0x7f)													\
 				return nonascii;											\
-			return is##type(c);												\
+			return is##type##_l(c, global_libc_locale);						\
 		}																	\
-		return isw##type(*(prs->wstr + prs->state->poschar));				\
+		return isw##type##_l(*(prs->wstr + prs->state->poschar),			\
+							 global_libc_locale);							\
 	}																		\
-	return is##type(*(unsigned char *) (prs->str + prs->state->posbyte));	\
+	return is##type##_l(*(unsigned char *) (prs->str + prs->state->posbyte),	\
+						global_libc_locale);								\
 }																			\
 																			\
 static int																	\
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 726a7c1be1f..f06396c94f4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -229,6 +229,12 @@
 /* Define to 1 if you have the global variable 'int timezone'. */
 #undef HAVE_INT_TIMEZONE
 
+/* Define to 1 if you have the `iswxdigit_l' function. */
+#undef HAVE_ISWXDIGIT_L
+
+/* Define to 1 if you have the `isxdigit_l' function. */
+#undef HAVE_ISXDIGIT_L
+
 /* Define to 1 if __builtin_constant_p(x) implies "i"(x) acceptance. */
 #undef HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P
 
-- 
2.43.0

v2-0007-Force-LC_COLLATE-to-C-in-postmaster.patch (text/x-patch; charset=UTF-8)

From b2c8cd6a69530f48c760e09c12f16c2c33e321f8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 11:32:01 -0700
Subject: [PATCH v2 7/7] Force LC_COLLATE to C in postmaster.

Avoid dependence on setlocale().

strcoll(), etc., is not called directly; all such calls should go
through pg_locale.c and use the appropriate provider. By setting
LC_COLLATE to C, we avoid accidentally depending on libc behavior when
using a different provider.

No behavior change in the backend, but it's possible that some
extensions will be affected. Such extensions should ordinarily be
updated to use the pg_locale_t APIs. If the extension must use libc
behavior, it can instead use the "_l" variants of functions along with
global_libc_locale.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
---
 src/backend/main/main.c           | 16 ++++++++++------
 src/backend/utils/init/postinit.c | 10 ++++------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 7d63cf94a6b..9e11557d91a 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -125,13 +125,17 @@ main(int argc, char *argv[])
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("postgres"));
 
 	/*
-	 * In the postmaster, absorb the environment values for LC_COLLATE and
-	 * LC_CTYPE.  Individual backends will change these later to settings
-	 * taken from pg_database, but the postmaster cannot do that.  If we leave
-	 * these set to "C" then message localization might not work well in the
-	 * postmaster.
+	 * Collation is handled by pg_locale.c, and the behavior is dependent on
+	 * the provider. strcoll(), etc., should not be called directly.
+	 */
+	init_locale("LC_COLLATE", LC_COLLATE, "C");
+
+	/*
+	 * In the postmaster, absorb the environment values for LC_CTYPE.
+	 * Individual backends will change it later to pg_database.datctype, but
+	 * the postmaster cannot do that.  If we leave it set to "C" then message
+	 * localization might not work well in the postmaster.
 	 */
-	init_locale("LC_COLLATE", LC_COLLATE, "");
 	init_locale("LC_CTYPE", LC_CTYPE, "");
 
 	/*
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 74f9df84fde..6deabf7474c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -417,12 +417,10 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
 	ctype = TextDatumGetCString(datum);
 
-	if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
-		ereport(FATAL,
-				(errmsg("database locale is incompatible with operating system"),
-				 errdetail("The database was initialized with LC_COLLATE \"%s\", "
-						   " which is not recognized by setlocale().", collate),
-				 errhint("Recreate the database with another locale or install the missing locale.")));
+	/*
+	 * Historically, we set LC_COLLATE from datcollate, as well, but that's no
+	 * longer necessary.
+	 */
 
 	if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
 		ereport(FATAL,
-- 
2.43.0

#42

Jeff Davis

pgsql@j-davis.com

7 months ago

In reply to: Jeff Davis (#41)

7 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:

I changed this to a global_libc_locale that includes both LC_COLLATE
and LC_CTYPE (from datcollate and datctype), in case an extension is
relying on strcoll for some reason.
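
For readers following along, here is a minimal sketch of how a single locale_t
can carry LC_COLLATE and LC_CTYPE from two different locale names, using the
same two-step newlocale() approach as the non-Windows branch of the patch. The
helper names make_combined_locale() and upper_in() are hypothetical and do not
appear in the patch; the sketch assumes a POSIX.1-2008 libc that provides
newlocale() and toupper_l().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): build one locale_t whose
 * LC_COLLATE comes from "collate" and whose LC_CTYPE comes from "ctype".
 * Returns (locale_t) 0 on failure.
 */
static locale_t
make_combined_locale(const char *collate, const char *ctype)
{
	locale_t	base;
	locale_t	combined;

	/* Step 1: a fresh locale object with only LC_COLLATE set from "collate" */
	base = newlocale(LC_COLLATE_MASK, collate, (locale_t) 0);
	if (base == (locale_t) 0)
		return (locale_t) 0;

	/*
	 * Step 2: layer LC_CTYPE on top.  On success newlocale() takes over
	 * "base"; on failure "base" is left untouched and we must free it.
	 */
	combined = newlocale(LC_CTYPE_MASK, ctype, base);
	if (combined == (locale_t) 0)
		freelocale(base);

	return combined;
}

/* Hypothetical helper: uppercase one byte using an explicit locale */
static int
upper_in(locale_t loc, int c)
{
	assert(loc != (locale_t) 0);
	return toupper_l((unsigned char) c, loc);
}
```

A caller would build the object once at startup and pass it to the "_l"
functions afterwards, so nothing depends on what setlocale() last installed
for the process.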

This patch series, at least so far, is designed to have zero behavior
changes. Anything with a potential for a behavior change should be a
separate commit, so that if we need to revert it, we can revert the
behavior change without reintroducing a setlocale() dependency.

...

I reworked it to be less confusing by changing wchar2char/char2wchar to
take a locale_t instead of pg_locale_t. Hopefully it's an improvement.
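
As an illustration of what "take a locale_t" means for a conversion routine,
the sketch below converts a multibyte string to wide characters under an
explicit locale. It is not the patch's char2wchar(); the name
mbstowcs_in_locale() is hypothetical, and it assumes a POSIX libc that has
uselocale() but may lack a native mbstowcs_l(). It temporarily switches the
calling thread's locale, which affects only that thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the patch): like mbstowcs(), but the
 * conversion uses "loc" instead of the process-wide locale.  NULL/zero
 * is rejected, mirroring the "no longer accept NULL" rule.
 */
static size_t
mbstowcs_in_locale(wchar_t *dest, const char *src, size_t n, locale_t loc)
{
	locale_t	save;
	size_t		result;

	assert(loc != (locale_t) 0);

	/* Switch this thread to "loc", convert, then restore the previous one */
	save = uselocale(loc);
	result = mbstowcs(dest, src, n);
	uselocale(save);

	return result;
}
```

Because the caller always supplies the locale, there is no hidden "default
locale" branch that would fall back to whatever setlocale() configured.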

...

Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
set to datctype.
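
A rough standalone sketch of the idea, outside PostgreSQL: pin only the
LC_COLLATE category to "C" and leave LC_CTYPE alone. The function name
pin_collate_to_c() is hypothetical; in the patch this is done through
init_locale() in main.c. The sketch uses plain setlocale() purely to show the
per-category effect.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): force LC_COLLATE to "C" for
 * the process, leaving LC_CTYPE and the other categories as they are.
 * Returns 1 on success, 0 on failure.
 */
static int
pin_collate_to_c(void)
{
	return setlocale(LC_COLLATE, "C") != NULL;
}

/* Report whether the process-wide LC_COLLATE is currently "C" */
static int
collate_is_c(void)
{
	const char *cur = setlocale(LC_COLLATE, NULL);

	return cur != NULL && strcmp(cur, "C") == 0;
}
```

With LC_COLLATE pinned this way, a stray strcoll() call orders strings by byte
value, the same as strcmp(), so code that bypasses the provider APIs cannot
silently pick up libc collation rules.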

Attached rebased v3.

Regards,
Jeff Davis

Attachments:

v3-0001-Hold-datcollate-datctype-in-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 454a8998196c49de9a17aa83d198464d52a3f278 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:13:16 -0700
Subject: [PATCH v3 1/7] Hold datcollate/datctype in global_libc_locale.

Callers of locale-aware ctype operations should use the "_l" variants
of the functions and pass global_libc_locale for the locale. Doing so
avoids depending on setlocale().

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 src/backend/utils/adt/pg_locale_libc.c | 77 ++++++++++++++++++++++++++
 src/backend/utils/init/postinit.c      |  2 +
 src/include/utils/pg_locale.h          |  7 +++
 3 files changed, 86 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index e9f9fc1e369..a3d8b51a7d9 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -81,6 +81,12 @@
  */
 #define		TEXTBUFLEN			1024
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+locale_t	global_libc_locale = NULL;
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -665,6 +671,77 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_size;
 }
 
+/*
+ * Initialize global locale for LC_COLLATE and LC_CTYPE from datcollate and
+ * datctype, respectively.
+ *
+ * NB: should be consistent with make_libc_collator(), except that it must
+ * create the locale even for "C" and "POSIX".
+ */
+void
+init_global_libc_locale(const char *collate, const char *ctype)
+{
+	locale_t	loc = 0;
+
+	if (strcmp(collate, ctype) == 0)
+	{
+		/* Normal case where they're the same */
+		errno = 0;
+#ifndef WIN32
+		loc = newlocale(LC_COLLATE_MASK | LC_CTYPE_MASK, collate, NULL);
+#else
+		loc = _create_locale(LC_ALL, collate);
+#endif
+		if (!loc)
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_COLLATE \"%s\", "
+							   " which is not recognized by setlocale().", collate),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+	}
+	else
+	{
+#ifndef WIN32
+		/* We need two newlocale() steps */
+		locale_t	loc1 = 0;
+
+		errno = 0;
+		loc1 = newlocale(LC_COLLATE_MASK, collate, NULL);
+		if (!loc1)
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_COLLATE \"%s\", "
+							   " which is not recognized by setlocale().", collate),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+
+		errno = 0;
+		loc = newlocale(LC_CTYPE_MASK, ctype, loc1);
+		if (!loc)
+		{
+			if (loc1)
+				freelocale(loc1);
+			ereport(FATAL,
+					(errmsg("database locale is incompatible with operating system"),
+					 errdetail("The database was initialized with LC_CTYPE \"%s\", "
+							   " which is not recognized by setlocale().", ctype),
+					 errhint("Recreate the database with another locale or install the missing locale.")));
+		}
+#else
+
+		/*
+		 * XXX The _create_locale() API doesn't appear to support this. Could
+		 * perhaps be worked around by changing pg_locale_t to contain two
+		 * separate fields.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("collations with different collate and ctype values are not supported on this platform")));
+#endif
+	}
+
+	global_libc_locale = loc;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..74f9df84fde 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -431,6 +431,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 						   " which is not recognized by setlocale().", ctype),
 				 errhint("Recreate the database with another locale or install the missing locale.")));
 
+	init_global_libc_locale(collate, ctype);
+
 	if (strcmp(ctype, "C") == 0 ||
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 44ff60a25b4..9735d15ceb2 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -34,6 +34,12 @@ extern PGDLLIMPORT char *localized_full_days[];
 extern PGDLLIMPORT char *localized_abbrev_months[];
 extern PGDLLIMPORT char *localized_full_months[];
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+extern PGDLLIMPORT locale_t global_libc_locale;
+
 /* is the databases's LC_CTYPE the C locale? */
 extern PGDLLIMPORT bool database_ctype_is_c;
 
@@ -169,6 +175,7 @@ struct pg_locale_struct
 	}			info;
 };
 
+extern void init_global_libc_locale(const char *collate, const char *ctype);
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.43.0

v3-0002-fuzzystrmatch-use-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 9305b8065086a7d03900e2f4dc4396219c206768 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:34 -0700
Subject: [PATCH v3 2/7] fuzzystrmatch: use global_libc_locale.

---
 contrib/fuzzystrmatch/dmetaphone.c    |  3 ++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..8777c1f5c04 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -284,7 +285,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = toupper_l((unsigned char) *i, global_libc_locale);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..103dd07220c 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -56,13 +57,15 @@ static void _soundex(const char *instr, char *outstr);
 
 #define SOUNDEX_LEN 4
 
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_locale)
+
 /*									ABCDEFGHIJKLMNOPQRSTUVWXYZ */
 static const char *const soundex_table = "01230120022455012623010202";
 
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v3-0003-ltree-use-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 04db5e1ff309c08010ef7f87aba96e71fbd8f42c Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:50 -0700
Subject: [PATCH v3 3/7] ltree: use global_libc_locale.

---
 contrib/ltree/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..5f5c563471e 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	tolower_l((unsigned char) (x), global_libc_locale)
 #else
 #define TOLOWER(x)	(x)
 #endif
-- 
2.43.0

v3-0004-Use-global_libc_locale-for-downcase_identifier-an.patch (text/x-patch; charset=UTF-8)

From 9c454496624da63948641b19e4592e4fbb4f609f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:07:01 -0700
Subject: [PATCH v3 4/7] Use global_libc_locale for downcase_identifier() and
 pg_strcasecmp().

---
 src/backend/parser/scansup.c |  3 ++-
 src/port/pgstrcasecmp.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..d45bf275e42 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -68,7 +69,7 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
 		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+			ch = tolower_l(ch, global_libc_locale);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..812050598e7 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -28,6 +28,14 @@
 
 #include <ctype.h>
 
+#ifndef FRONTEND
+extern PGDLLIMPORT locale_t global_libc_locale;
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_locale)
+#define TOLOWER(x) tolower_l((unsigned char) (x), global_libc_locale)
+#else
+#define TOUPPER(x) toupper(x)
+#define TOLOWER(x) tolower(x)
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -45,12 +53,12 @@ pg_strcasecmp(const char *s1, const char *s2)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -78,12 +86,12 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -107,7 +115,7 @@ pg_toupper(unsigned char ch)
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
 	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -124,7 +132,7 @@ pg_tolower(unsigned char ch)
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
 	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

v3-0005-Change-wchar2char-and-char2wchar-to-accept-a-loca.patch (text/x-patch; charset=UTF-8)

From cfc6a2d1dacc51dd7e09291eaa1d4cac350625c7 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 11 Jun 2025 10:11:16 -0700
Subject: [PATCH v3 5/7] Change wchar2char() and char2wchar() to accept a
 locale_t.

These are libc-specific functions, so accepting a locale_t makes more
sense than accepting a pg_locale_t (which could use another provider).

Also, no longer accept NULL.
---
 src/backend/tsearch/ts_locale.c        |  4 +--
 src/backend/tsearch/wparser_def.c      |  2 +-
 src/backend/utils/adt/pg_locale.c      |  2 +-
 src/backend/utils/adt/pg_locale_libc.c | 42 +++++++++-----------------
 src/include/utils/pg_locale.h          |  4 +--
 5 files changed, 20 insertions(+), 34 deletions(-)

diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index b77d8c23d36..4801fe90089 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,7 +36,7 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
@@ -51,7 +51,7 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 79bcd32a063..e2dd3da3aa3 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		pg_locale_t mylocale = 0;	/* TODO */
+		locale_t	mylocale = 0;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ce50e9e15d0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -998,7 +998,7 @@ get_iso_localename(const char *winlocname)
 		char	   *hyphen;
 
 		/* Locale names use only ASCII, any conversion locale suffices. */
-		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), NULL);
+		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), LC_C_LOCALE);
 		if (rc == -1 || rc == sizeof(iso_lc_messages))
 			return NULL;
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index a3d8b51a7d9..998bfa857f0 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -463,7 +463,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
@@ -474,7 +474,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -558,7 +558,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
@@ -575,7 +575,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -646,7 +646,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
@@ -657,7 +657,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -1207,10 +1207,12 @@ wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
  * zero-terminated.  The output will be zero-terminated iff there is room.
  */
 size_t
-wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
+wchar2char(char *to, const wchar_t *from, size_t tolen, locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -1237,16 +1239,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 #endif							/* WIN32 */
-	if (locale == (pg_locale_t) 0)
-	{
-		/* Use wcstombs directly for the default locale */
-		result = wcstombs(to, from, tolen);
-	}
-	else
-	{
-		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
-	}
+		result = wcstombs_l(to, from, tolen, loc);
 
 	return result;
 }
@@ -1262,10 +1255,12 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
  */
 size_t
 char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
-		   pg_locale_t locale)
+		   locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -1297,16 +1292,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		/* mbstowcs requires ending '\0' */
 		char	   *str = pnstrdup(from, fromlen);
 
-		if (locale == (pg_locale_t) 0)
-		{
-			/* Use mbstowcs directly for the default locale */
-			result = mbstowcs(to, str, tolen);
-		}
-		else
-		{
-			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
-		}
+		result = mbstowcs_l(to, str, tolen, loc);
 
 		pfree(str);
 	}
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 9735d15ceb2..d008b49e3c7 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -218,8 +218,8 @@ extern void report_newlocale_failure(const char *localename);
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
-						 pg_locale_t locale);
+						 locale_t loc);
 extern size_t char2wchar(wchar_t *to, size_t tolen,
-						 const char *from, size_t fromlen, pg_locale_t locale);
+						 const char *from, size_t fromlen, locale_t loc);
 
 #endif							/* _PG_LOCALE_ */
-- 
2.43.0

v3-0006-tsearch-use-global_libc_locale.patch (text/x-patch; charset=UTF-8)

From 3040033010333689ee3135e476b31b9dd07cbe41 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 11 Jun 2025 10:07:29 -0700
Subject: [PATCH v3 6/7] tsearch: use global_libc_locale.

---
 configure                         |  2 +-
 configure.ac                      |  2 ++
 meson.build                       |  2 ++
 src/backend/tsearch/ts_locale.c   |  8 +++---
 src/backend/tsearch/wparser_def.c | 44 ++++++++++++++++++++++++++++---
 src/include/pg_config.h.in        |  6 +++++
 6 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 16ef5b58d1a..82dd3a04e3a 100755
--- a/configure
+++ b/configure
@@ -15616,7 +15616,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton iswxdigit_l isxdigit_l kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index b3efc49c97a..d23ef43f243 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1789,6 +1789,8 @@ AC_CHECK_FUNCS(m4_normalize([
 	getifaddrs
 	getpeerucred
 	inet_pton
+	iswxdigit_l
+	isxdigit_l
 	kqueue
 	localeconv_l
 	mbstowcs_l
diff --git a/meson.build b/meson.build
index 91fb4756ed4..c11a6f63a05 100644
--- a/meson.build
+++ b/meson.build
@@ -2885,6 +2885,8 @@ func_checks = [
   ['getpeerucred'],
   ['inet_aton'],
   ['inet_pton'],
+  ['iswxdigit_l'],
+  ['isxdigit_l'],
   ['kqueue'],
   ['localeconv_l'],
   ['mbstowcs_l'],
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index 4801fe90089..6b66fd1c05b 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,14 +36,14 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_locale;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalpha((wint_t) character[0]);
+	return iswalpha_l((wint_t) character[0], mylocale);
 }
 
 int
@@ -51,14 +51,14 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_locale;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalnum((wint_t) character[0]);
+	return iswalnum_l((wint_t) character[0], mylocale);
 }
 
 
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index e2dd3da3aa3..9a80d32b448 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		locale_t	mylocale = 0;	/* TODO */
+		locale_t	mylocale = global_libc_locale;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
@@ -411,6 +411,40 @@ TParserCopyClose(TParser *prs)
 }
 
 
+#ifndef HAVE_ISXDIGIT_L
+static int
+isxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _isxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = isxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+#ifndef HAVE_ISWXDIGIT_L
+static int
+iswxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _iswxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = iswxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
+
 /*
  * Character-type support functions, equivalent to is* macros, but
  * working with any possible encodings and locales. Notes:
@@ -434,11 +468,13 @@ p_is##type(TParser *prs)													\
 			unsigned int c = *(prs->pgwstr + prs->state->poschar);			\
 			if (c > 0x7f)													\
 				return nonascii;											\
-			return is##type(c);												\
+			return is##type##_l(c, global_libc_locale);						\
 		}																	\
-		return isw##type(*(prs->wstr + prs->state->poschar));				\
+		return isw##type##_l(*(prs->wstr + prs->state->poschar),			\
+							 global_libc_locale);							\
 	}																		\
-	return is##type(*(unsigned char *) (prs->str + prs->state->posbyte));	\
+	return is##type##_l(*(unsigned char *) (prs->str + prs->state->posbyte),	\
+						global_libc_locale);								\
 }																			\
 																			\
 static int																	\
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 726a7c1be1f..f06396c94f4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -229,6 +229,12 @@
 /* Define to 1 if you have the global variable 'int timezone'. */
 #undef HAVE_INT_TIMEZONE
 
+/* Define to 1 if you have the `iswxdigit_l' function. */
+#undef HAVE_ISWXDIGIT_L
+
+/* Define to 1 if you have the `isxdigit_l' function. */
+#undef HAVE_ISXDIGIT_L
+
 /* Define to 1 if __builtin_constant_p(x) implies "i"(x) acceptance. */
 #undef HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P
 
-- 
2.43.0

v3-0007-Force-LC_COLLATE-to-C-in-postmaster.patch (text/x-patch; charset=UTF-8)

From 13c2e61f4592c85645a4ea73cb3f2a3dd5da3a68 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 11:32:01 -0700
Subject: [PATCH v3 7/7] Force LC_COLLATE to C in postmaster.

Avoid dependence on setlocale().

strcoll(), etc., is not called directly; all such calls should go
through pg_locale.c and use the appropriate provider. By setting
LC_COLLATE to C, we avoid accidentally depending on libc behavior when
using a different provider.

No behavior change in the backend, but it's possible that some
extensions will be affected. Such extensions should ordinarily be
updated to use the pg_locale_t APIs. If the extension must use libc
behavior, it can instead use the "_l" variants of functions along with
global_libc_locale.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
---
 src/backend/main/main.c           | 16 ++++++++++------
 src/backend/utils/init/postinit.c | 10 ++++------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 7d63cf94a6b..9e11557d91a 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -125,13 +125,17 @@ main(int argc, char *argv[])
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("postgres"));
 
 	/*
-	 * In the postmaster, absorb the environment values for LC_COLLATE and
-	 * LC_CTYPE.  Individual backends will change these later to settings
-	 * taken from pg_database, but the postmaster cannot do that.  If we leave
-	 * these set to "C" then message localization might not work well in the
-	 * postmaster.
+	 * Collation is handled by pg_locale.c, and the behavior is dependent on
+	 * the provider. strcoll(), etc., should not be called directly.
+	 */
+	init_locale("LC_COLLATE", LC_COLLATE, "C");
+
+	/*
+	 * In the postmaster, absorb the environment values for LC_CTYPE.
+	 * Individual backends will change it later to pg_database.datctype, but
+	 * the postmaster cannot do that.  If we leave it set to "C" then message
+	 * localization might not work well in the postmaster.
 	 */
-	init_locale("LC_COLLATE", LC_COLLATE, "");
 	init_locale("LC_CTYPE", LC_CTYPE, "");
 
 	/*
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 74f9df84fde..6deabf7474c 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -417,12 +417,10 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
 	ctype = TextDatumGetCString(datum);
 
-	if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
-		ereport(FATAL,
-				(errmsg("database locale is incompatible with operating system"),
-				 errdetail("The database was initialized with LC_COLLATE \"%s\", "
-						   " which is not recognized by setlocale().", collate),
-				 errhint("Recreate the database with another locale or install the missing locale.")));
+	/*
+	 * Historically, we set LC_COLLATE from datcollate, as well, but that's no
+	 * longer necessary.
+	 */
 
 	if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
 		ereport(FATAL,
-- 
2.43.0

#43

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Jeff Davis (#41)

1 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-06-11 at 12:15 -0700, Jeff Davis wrote:

v1-0008-Set-process-LC_COLLATE-C-and-LC_CTYPE-C.patch

As I mentioned earlier in the thread, I don't think we can do this for
LC_CTYPE, because otherwise system error messages would not come out in
the right encoding.

Changed it so that it only sets LC_COLLATE to C, and leaves LC_CTYPE
set to datctype.

Unfortunately, as long as LC_CTYPE is set to a real locale, there's a
danger of accidentally depending on that setting. Can the encoding be
controlled with LC_MESSAGES instead of LC_CTYPE?

Do you have an example of how things can go wrong?

I looked into this a bit, and if I understand correctly, the only
problem is with strerror() and strerror_r(), which depend on
LC_MESSAGES for the language but LC_CTYPE to find the right encoding.

I attached some example C code to illustrate how strerror() is affected
by both LC_MESSAGES and LC_CTYPE. For example:

$ ./strerror de_DE.UTF-8 de_DE.UTF-8
LC_CTYPE set to: de_DE.UTF-8
LC_MESSAGES set to: de_DE.UTF-8
Error message (from strerror(EILSEQ)): Ungültiges oder unvollständiges Multi-Byte- oder Wide-Zeichen
$ ./strerror C de_DE.UTF-8
LC_CTYPE set to: C
LC_MESSAGES set to: de_DE.UTF-8
Error message (from strerror(EILSEQ)): Ung?ltiges oder unvollst?ndiges Multi-Byte- oder Wide-Zeichen
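
For readers without the attachment, a test of this kind could look roughly like the sketch below. It is not the attached program; show_strerror() is a hypothetical helper name, and the sketch assumes a POSIX system where LC_MESSAGES is defined in <locale.h>. It sets the two categories process-wide with setlocale() and prints what strerror(EILSEQ) returns.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration, not the attachment): set LC_CTYPE
 * and LC_MESSAGES for the whole process, then print strerror(EILSEQ).
 * Returns 0 on success, -1 if setlocale() rejects either name.
 */
static int
show_strerror(const char *ctype, const char *messages)
{
    const char *actual;

    /* Print each result right away; a later setlocale() may invalidate it. */
    actual = setlocale(LC_CTYPE, ctype);
    if (actual == NULL)
        return -1;
    printf("LC_CTYPE set to: %s\n", actual);

    actual = setlocale(LC_MESSAGES, messages);
    if (actual == NULL)
        return -1;
    printf("LC_MESSAGES set to: %s\n", actual);

    printf("Error message (from strerror(EILSEQ)): %s\n", strerror(EILSEQ));
    return 0;
}
```

A small main() would pass argv[1] and argv[2] to show_strerror(). What gets printed depends on which locales and message catalogs are installed on the machine.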

On Unix-based systems, we can use newlocale() to initialize a global
variable with both LC_CTYPE and LC_MESSAGES set. The LC_MESSAGES
portion would need to be updated every time the GUC changes, which is
not great.
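
A minimal sketch of that idea, assuming POSIX.1-2008 newlocale(): build one locale_t whose LC_CTYPE and LC_MESSAGES come from different names by passing the first object as the base of the second call. The function name make_ctype_messages_locale() is hypothetical and does not appear in any patch in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a locale_t with LC_CTYPE taken from "ctype"
 * and LC_MESSAGES taken from "messages" (all other categories stay "C").
 * Returns (locale_t) 0 on failure.  The caller frees the result with
 * freelocale().
 */
static locale_t
make_ctype_messages_locale(const char *ctype, const char *messages)
{
    locale_t    base;
    locale_t    combined;

    base = newlocale(LC_CTYPE_MASK, ctype, (locale_t) 0);
    if (base == (locale_t) 0)
        return (locale_t) 0;

    /* On success, newlocale() takes over "base" and returns the result. */
    combined = newlocale(LC_MESSAGES_MASK, messages, base);
    if (combined == (locale_t) 0)
    {
        /* On failure "base" is left unchanged, so it is still ours to free. */
        freelocale(base);
        return (locale_t) 0;
    }
    return combined;
}
```

Whenever the LC_MESSAGES setting changed, an object like this would have to be rebuilt and the old one freed, which is the maintenance cost mentioned above.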

Windows would be a different story, though: strerror() doesn't seem to
have a variant that accepts a _locale_t object, and even if it did, I
don't see a way to create a _locale_t object with LC_MESSAGES and
LC_CTYPE set to different values. One idea is to use
_configthreadlocale(_ENABLE_PER_THREAD_LOCALE) [1], and then use
setlocale(), which could enable us to use setlocale() similar to how we
use uselocale() on other systems. That would be awkward, though.
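
To make the comparison concrete, here is a rough sketch of a "switch LC_CTYPE for this thread only, call strerror(), switch back" helper. The name strerror_with_ctype() is hypothetical. The non-Windows branch uses duplocale(), newlocale() and uselocale() from POSIX.1-2008. The Windows branch follows the _configthreadlocale() idea described above; it is an untested assumption about how that could look, not code from any patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy strerror(errnum) into buf while only the
 * calling thread's LC_CTYPE is switched to "ctype".  Returns 0 on
 * success, -1 on failure.
 */
static int
strerror_with_ctype(int errnum, const char *ctype, char *buf, size_t buflen)
{
#ifdef _WIN32
    /* Untested sketch: make setlocale() affect only this thread. */
    int         old_mode = _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
    char       *cur;
    char       *saved;
    int         rc = -1;

    if (old_mode == -1)
        return -1;

    /* Copy the current name; the next setlocale() may overwrite it. */
    cur = setlocale(LC_CTYPE, NULL);
    saved = cur ? _strdup(cur) : NULL;
    if (saved != NULL && setlocale(LC_CTYPE, ctype) != NULL)
    {
        snprintf(buf, buflen, "%s", strerror(errnum));
        setlocale(LC_CTYPE, saved);
        rc = 0;
    }
    free(saved);
    _configthreadlocale(old_mode);
    return rc;
#else
    /* Start from a copy of the thread's current locale, then swap LC_CTYPE. */
    locale_t    copy = duplocale(uselocale((locale_t) 0));
    locale_t    tmp;
    locale_t    old;

    if (copy == (locale_t) 0)
        return -1;

    tmp = newlocale(LC_CTYPE_MASK, ctype, copy);
    if (tmp == (locale_t) 0)
    {
        freelocale(copy);       /* unchanged on failure, still ours */
        return -1;
    }

    old = uselocale(tmp);
    snprintf(buf, buflen, "%s", strerror(errnum));
    uselocale(old);
    freelocale(tmp);
    return 0;
#endif
}
```

Copying the current locale first keeps LC_MESSAGES and the other categories as they were, so only the encoding used for the message changes.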

Thoughts? That seems like a lot of work just for the case of
strerror()/strerror_r().

Regards,
Jeff Davis

[1]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/configthreadlocale?view=msvc-170

#44

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Jeff Davis (#42)

8 attachment(s)

Re: Remaining dependency on setlocale()

On Tue, 2025-07-01 at 08:06 -0700, Jeff Davis wrote:

Attached rebased v3.

And here's v4.

I changed the global variable to only hold the LC_CTYPE (not
LC_COLLATE), because Windows doesn't support a _locale_t that
represents multiple categories with different locales.
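
As a rough illustration of what an LC_CTYPE-only global looks like to a caller, the sketch below creates a locale_t with just LC_CTYPE_MASK and uses it with the "_l" ctype functions, following the shape of pg_tolower() in the patches. demo_libc_ctype, init_demo_ctype() and demo_tolower() are hypothetical stand-ins, not the names or code from the patch series, and the sketch assumes a POSIX.1-2008 libc that declares isupper_l()/tolower_l() in <ctype.h> (some platforms need <xlocale.h> instead).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for a global that holds only the database LC_CTYPE. */
static locale_t demo_libc_ctype = (locale_t) 0;

/* Create the LC_CTYPE-only locale object once, e.g. at backend startup. */
static int
init_demo_ctype(const char *datctype)
{
    locale_t    loc = newlocale(LC_CTYPE_MASK, datctype, (locale_t) 0);

    if (loc == (locale_t) 0)
        return -1;
    demo_libc_ctype = loc;
    return 0;
}

/*
 * Lowercase one byte: ASCII is handled directly, high-bit bytes go
 * through the "_l" functions with the explicit locale, so the result
 * does not depend on what setlocale() last installed.
 */
static unsigned char
demo_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    else if ((ch & 0x80) != 0 && isupper_l(ch, demo_libc_ctype))
        ch = (unsigned char) tolower_l(ch, demo_libc_ctype);
    return ch;
}
```

Because the object carries a single category, the same approach should map onto a Windows _locale_t created for LC_CTYPE alone, which is the constraint described above.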

This patch series is designed to not have any changes in behavior.
There was some feedback that I could go further, but I'll leave those
suggestions for future patches, in case one causes an unexpected
behavior change and needs to be reverted. I intend to start committing
these soon.

v4-0008 uses LC_C_LOCALE, and I'm not sure if that's portable, but if
the buildfarm complains then I'll fix it or revert it.

Regards,
Jeff Davis

Attachments:

v4-0001-Force-LC_COLLATE-to-C-in-postmaster.patch (text/x-patch; charset=UTF-8)

From a0f021977d6d86055748e05e0c3d3de9fd17c7bf Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 11:32:01 -0700
Subject: [PATCH v4 1/8] Force LC_COLLATE to C in postmaster.

Avoid dependence on setlocale().

strcoll(), etc., is not called directly; all such calls should go
through pg_locale.c and use the appropriate provider. By setting
LC_COLLATE to C, we avoid accidentally depending on libc behavior when
using a different provider.

No behavior change in the backend, but it's possible that some
extensions will be affected. Such extensions should ordinarily be
updated to use the pg_locale_t APIs.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
---
 src/backend/main/main.c           | 16 ++++++++++------
 src/backend/utils/init/postinit.c | 10 ++++------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 7d63cf94a6b..9e11557d91a 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -125,13 +125,17 @@ main(int argc, char *argv[])
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("postgres"));
 
 	/*
-	 * In the postmaster, absorb the environment values for LC_COLLATE and
-	 * LC_CTYPE.  Individual backends will change these later to settings
-	 * taken from pg_database, but the postmaster cannot do that.  If we leave
-	 * these set to "C" then message localization might not work well in the
-	 * postmaster.
+	 * Collation is handled by pg_locale.c, and the behavior is dependent on
+	 * the provider. strcoll(), etc., should not be called directly.
+	 */
+	init_locale("LC_COLLATE", LC_COLLATE, "C");
+
+	/*
+	 * In the postmaster, absorb the environment values for LC_CTYPE.
+	 * Individual backends will change it later to pg_database.datctype, but
+	 * the postmaster cannot do that.  If we leave it set to "C" then message
+	 * localization might not work well in the postmaster.
 	 */
-	init_locale("LC_COLLATE", LC_COLLATE, "");
 	init_locale("LC_CTYPE", LC_CTYPE, "");
 
 	/*
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..464f1196be3 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -417,12 +417,10 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	datum = SysCacheGetAttrNotNull(DATABASEOID, tup, Anum_pg_database_datctype);
 	ctype = TextDatumGetCString(datum);
 
-	if (pg_perm_setlocale(LC_COLLATE, collate) == NULL)
-		ereport(FATAL,
-				(errmsg("database locale is incompatible with operating system"),
-				 errdetail("The database was initialized with LC_COLLATE \"%s\", "
-						   " which is not recognized by setlocale().", collate),
-				 errhint("Recreate the database with another locale or install the missing locale.")));
+	/*
+	 * Historically, we set LC_COLLATE from datcollate, as well, but that's no
+	 * longer necessary.
+	 */
 
 	if (pg_perm_setlocale(LC_CTYPE, ctype) == NULL)
 		ereport(FATAL,
-- 
2.43.0

v4-0002-Change-wchar2char-and-char2wchar-to-accept-a-loca.patch (text/x-patch; charset=UTF-8)

From 4b7490912a4720e5ccb433679ac3895a65503d84 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Jul 2025 15:43:32 -0700
Subject: [PATCH v4 2/8] Change wchar2char() and char2wchar() to accept a
 locale_t.

These are libc-specific functions, so should require a locale_t rather
than a pg_locale_t (which could use another provider).

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 src/backend/tsearch/ts_locale.c        |  4 ++--
 src/backend/tsearch/wparser_def.c      |  2 +-
 src/backend/utils/adt/pg_locale_libc.c | 24 ++++++++++++------------
 src/include/utils/pg_locale.h          |  4 ++--
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index b77d8c23d36..4801fe90089 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,7 +36,7 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
@@ -51,7 +51,7 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	pg_locale_t mylocale = 0;	/* TODO */
+	locale_t	mylocale = 0;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 79bcd32a063..e2dd3da3aa3 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		pg_locale_t mylocale = 0;	/* TODO */
+		locale_t	mylocale = 0;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index e9f9fc1e369..8d88b53c375 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -457,7 +457,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towlower_l(workspace[curr_char], loc);
@@ -468,7 +468,7 @@ strlower_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -552,7 +552,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 	{
@@ -569,7 +569,7 @@ strtitle_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -640,7 +640,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	/* Output workspace cannot have more codes than input bytes */
 	workspace = (wchar_t *) palloc((srclen + 1) * sizeof(wchar_t));
 
-	char2wchar(workspace, srclen + 1, src, srclen, locale);
+	char2wchar(workspace, srclen + 1, src, srclen, loc);
 
 	for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
 		workspace[curr_char] = towupper_l(workspace[curr_char], loc);
@@ -651,7 +651,7 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	max_size = curr_char * pg_database_encoding_max_length();
 	result = palloc(max_size + 1);
 
-	result_size = wchar2char(result, workspace, max_size + 1, locale);
+	result_size = wchar2char(result, workspace, max_size + 1, loc);
 
 	if (result_size + 1 > destsize)
 		return result_size;
@@ -1130,7 +1130,7 @@ wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
  * zero-terminated.  The output will be zero-terminated iff there is room.
  */
 size_t
-wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
+wchar2char(char *to, const wchar_t *from, size_t tolen, locale_t loc)
 {
 	size_t		result;
 
@@ -1160,7 +1160,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	}
 	else
 #endif							/* WIN32 */
-	if (locale == (pg_locale_t) 0)
+	if (loc == (locale_t) 0)
 	{
 		/* Use wcstombs directly for the default locale */
 		result = wcstombs(to, from, tolen);
@@ -1168,7 +1168,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
 	else
 	{
 		/* Use wcstombs_l for nondefault locales */
-		result = wcstombs_l(to, from, tolen, locale->info.lt);
+		result = wcstombs_l(to, from, tolen, loc);
 	}
 
 	return result;
@@ -1185,7 +1185,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
  */
 size_t
 char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
-		   pg_locale_t locale)
+		   locale_t loc)
 {
 	size_t		result;
 
@@ -1220,7 +1220,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		/* mbstowcs requires ending '\0' */
 		char	   *str = pnstrdup(from, fromlen);
 
-		if (locale == (pg_locale_t) 0)
+		if (loc == (locale_t) 0)
 		{
 			/* Use mbstowcs directly for the default locale */
 			result = mbstowcs(to, str, tolen);
@@ -1228,7 +1228,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		else
 		{
 			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, locale->info.lt);
+			result = mbstowcs_l(to, str, tolen, loc);
 		}
 
 		pfree(str);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 44ff60a25b4..6c60f2e2a74 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -211,8 +211,8 @@ extern void report_newlocale_failure(const char *localename);
 
 /* These functions convert from/to libc's wchar_t, *not* pg_wchar_t */
 extern size_t wchar2char(char *to, const wchar_t *from, size_t tolen,
-						 pg_locale_t locale);
+						 locale_t loc);
 extern size_t char2wchar(wchar_t *to, size_t tolen,
-						 const char *from, size_t fromlen, pg_locale_t locale);
+						 const char *from, size_t fromlen, locale_t loc);
 
 #endif							/* _PG_LOCALE_ */
-- 
2.43.0

v4-0003-Initialize-datctype-in-global-locale_t-variable.patch

From 3b7f43c4ad3b2bd99de95ae9db980d7bc465a8a9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 6 Jun 2025 14:13:16 -0700
Subject: [PATCH v4 3/8] Initialize datctype in global locale_t variable.

Callers of locale-aware ctype operations should use the "_l" variants
of the functions and pass global_libc_ctype for the locale. Doing so
avoids depending on setlocale().

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 src/backend/utils/adt/pg_locale_libc.c | 35 ++++++++++++++++++++++++++
 src/backend/utils/init/postinit.c      |  2 ++
 src/include/utils/pg_locale.h          |  7 ++++++
 3 files changed, 44 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 8d88b53c375..33a082b6490 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -81,6 +81,12 @@
  */
 #define		TEXTBUFLEN			1024
 
+/*
+ * Represents datctype locale in a global variable, so that we don't need to
+ * rely on setlocale() anywhere.
+ */
+locale_t	global_libc_ctype = NULL;
+
 extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 
 static int	strncoll_libc(const char *arg1, ssize_t len1,
@@ -665,6 +671,35 @@ strupper_libc_mb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_size;
 }
 
+/*
+ * Initialize global locale for LC_COLLATE and LC_CTYPE from datcollate and
+ * datctype, respectively.
+ *
+ * NB: should be consistent with make_libc_collator(), except that it must
+ * create the locale even for "C" and "POSIX".
+ */
+void
+init_global_libc_ctype(const char *ctype)
+{
+	locale_t	loc = 0;
+
+	/* Normal case where they're the same */
+	errno = 0;
+#ifndef WIN32
+	loc = newlocale(LC_CTYPE_MASK, ctype, NULL);
+#else
+	loc = _create_locale(LC_ALL, ctype);
+#endif
+	if (!loc)
+		ereport(FATAL,
+				(errmsg("database locale is incompatible with operating system"),
+				 errdetail("The database was initialized with LC_CTYPE \"%s\", "
+						   " which is not recognized by setlocale().", ctype),
+				 errhint("Recreate the database with another locale or install the missing locale.")));
+
+	global_libc_ctype = loc;
+}
+
 pg_locale_t
 create_pg_locale_libc(Oid collid, MemoryContext context)
 {
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 464f1196be3..81bde31a791 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -429,6 +429,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 						   " which is not recognized by setlocale().", ctype),
 				 errhint("Recreate the database with another locale or install the missing locale.")));
 
+	init_global_libc_ctype(ctype);
+
 	if (strcmp(ctype, "C") == 0 ||
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 6c60f2e2a74..a4ab020685f 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -34,6 +34,12 @@ extern PGDLLIMPORT char *localized_full_days[];
 extern PGDLLIMPORT char *localized_abbrev_months[];
 extern PGDLLIMPORT char *localized_full_months[];
 
+/*
+ * Represents datcollate and datctype locales in a global variable, so that we
+ * don't need to rely on setlocale() anywhere.
+ */
+extern PGDLLIMPORT locale_t global_libc_ctype;
+
 /* is the databases's LC_CTYPE the C locale? */
 extern PGDLLIMPORT bool database_ctype_is_c;
 
@@ -169,6 +175,7 @@ struct pg_locale_struct
 	}			info;
 };
 
+extern void init_global_libc_ctype(const char *ctype);
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.43.0

v4-0004-fuzzystrmatch-use-global_libc_ctype.patch

From f55ea79c07a80092cff82040b37e80fe2704ec9b Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:34 -0700
Subject: [PATCH v4 4/8] fuzzystrmatch: use global_libc_ctype.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 contrib/fuzzystrmatch/dmetaphone.c    |  3 ++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..9c6ce53ca08 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -284,7 +285,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = toupper_l((unsigned char) *i, global_libc_ctype);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..28d62cb2ee5 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -56,13 +57,15 @@ static void _soundex(const char *instr, char *outstr);
 
 #define SOUNDEX_LEN 4
 
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_ctype)
+
 /*									ABCDEFGHIJKLMNOPQRSTUVWXYZ */
 static const char *const soundex_table = "01230120022455012623010202";
 
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v4-0005-ltree-use-global_libc_ctype.patch

From fbe7c9c469653731e013b7b98d9ade655df68e5f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:06:50 -0700
Subject: [PATCH v4 5/8] ltree: use global_libc_ctype.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 contrib/ltree/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..7a1f666608c 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	tolower_l((unsigned char) (x), global_libc_ctype)
 #else
 #define TOLOWER(x)	(x)
 #endif
-- 
2.43.0

v4-0006-Use-global_libc_ctype-for-downcase_identifier-and.patch

From 9146228242b497f856a28bb8b58204c47a182873 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Tue, 10 Jun 2025 20:07:01 -0700
Subject: [PATCH v4 6/8] Use global_libc_ctype for downcase_identifier() and
 pg_strcasecmp().

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 src/backend/parser/scansup.c |  3 ++-
 src/port/pgstrcasecmp.c      | 20 ++++++++++++++------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..2a477ae25c6 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -68,7 +69,7 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
 		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+			ch = tolower_l(ch, global_libc_ctype);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..4dd9c50b652 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -28,6 +28,14 @@
 
 #include <ctype.h>
 
+#ifndef FRONTEND
+extern PGDLLIMPORT locale_t global_libc_ctype;
+#define TOUPPER(x) toupper_l((unsigned char) (x), global_libc_ctype)
+#define TOLOWER(x) tolower_l((unsigned char) (x), global_libc_ctype)
+#else
+#define TOUPPER(x) toupper(x)
+#define TOLOWER(x) tolower(x)
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -45,12 +53,12 @@ pg_strcasecmp(const char *s1, const char *s2)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -78,12 +86,12 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
 			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -107,7 +115,7 @@ pg_toupper(unsigned char ch)
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
 	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -124,7 +132,7 @@ pg_tolower(unsigned char ch)
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
 	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

v4-0007-tsearch-use-global_libc_ctype.patch

From 701ed7a40011122869f2dec9a1af48ebd438a678 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 11 Jun 2025 10:07:29 -0700
Subject: [PATCH v4 7/8] tsearch: use global_libc_ctype.

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 configure                         |  2 +-
 configure.ac                      |  2 ++
 meson.build                       |  2 ++
 src/backend/tsearch/ts_locale.c   |  8 +++---
 src/backend/tsearch/wparser_def.c | 44 ++++++++++++++++++++++++++++---
 src/include/pg_config.h.in        |  6 +++++
 6 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 16ef5b58d1a..82dd3a04e3a 100755
--- a/configure
+++ b/configure
@@ -15616,7 +15616,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile copy_file_range elf_aux_info getauxval getifaddrs getpeerucred inet_pton iswxdigit_l isxdigit_l kqueue localeconv_l mbstowcs_l posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index b3efc49c97a..d23ef43f243 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1789,6 +1789,8 @@ AC_CHECK_FUNCS(m4_normalize([
 	getifaddrs
 	getpeerucred
 	inet_pton
+	iswxdigit_l
+	isxdigit_l
 	kqueue
 	localeconv_l
 	mbstowcs_l
diff --git a/meson.build b/meson.build
index a97854a947d..45e44979ecd 100644
--- a/meson.build
+++ b/meson.build
@@ -2886,6 +2886,8 @@ func_checks = [
   ['getpeerucred'],
   ['inet_aton'],
   ['inet_pton'],
+  ['iswxdigit_l'],
+  ['isxdigit_l'],
   ['kqueue'],
   ['localeconv_l'],
   ['mbstowcs_l'],
diff --git a/src/backend/tsearch/ts_locale.c b/src/backend/tsearch/ts_locale.c
index 4801fe90089..d93ea00f09c 100644
--- a/src/backend/tsearch/ts_locale.c
+++ b/src/backend/tsearch/ts_locale.c
@@ -36,14 +36,14 @@ t_isalpha(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_ctype;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalpha(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalpha((wint_t) character[0]);
+	return iswalpha_l((wint_t) character[0], mylocale);
 }
 
 int
@@ -51,14 +51,14 @@ t_isalnum(const char *ptr)
 {
 	int			clen = pg_mblen(ptr);
 	wchar_t		character[WC_BUF_LEN];
-	locale_t	mylocale = 0;	/* TODO */
+	locale_t	mylocale = global_libc_ctype;	/* TODO */
 
 	if (clen == 1 || database_ctype_is_c)
 		return isalnum(TOUCHAR(ptr));
 
 	char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);
 
-	return iswalnum((wint_t) character[0]);
+	return iswalnum_l((wint_t) character[0], mylocale);
 }
 
 
diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index e2dd3da3aa3..1ef6ca1d12c 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -299,7 +299,7 @@ TParserInit(char *str, int len)
 	 */
 	if (prs->charmaxlen > 1)
 	{
-		locale_t	mylocale = 0;	/* TODO */
+		locale_t	mylocale = global_libc_ctype;	/* TODO */
 
 		prs->usewide = true;
 		if (database_ctype_is_c)
@@ -411,6 +411,40 @@ TParserCopyClose(TParser *prs)
 }
 
 
+#ifndef HAVE_ISXDIGIT_L
+static int
+isxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _isxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = isxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+#ifndef HAVE_ISWXDIGIT_L
+static int
+iswxdigit_l(wint_t wc, locale_t loc)
+{
+#ifdef WIN32
+	return _iswxdigit_l(wc, loc);
+#else
+	size_t		result;
+	locale_t	save_locale = uselocale(loc);
+
+	result = iswxdigit(wc);
+	uselocale(save_locale);
+	return result;
+#endif
+}
+#endif
+
+
 /*
  * Character-type support functions, equivalent to is* macros, but
  * working with any possible encodings and locales. Notes:
@@ -434,11 +468,13 @@ p_is##type(TParser *prs)													\
 			unsigned int c = *(prs->pgwstr + prs->state->poschar);			\
 			if (c > 0x7f)													\
 				return nonascii;											\
-			return is##type(c);												\
+			return is##type##_l(c, global_libc_ctype);						\
 		}																	\
-		return isw##type(*(prs->wstr + prs->state->poschar));				\
+		return isw##type##_l(*(prs->wstr + prs->state->poschar),			\
+							 global_libc_ctype);							\
 	}																		\
-	return is##type(*(unsigned char *) (prs->str + prs->state->posbyte));	\
+	return is##type##_l(*(unsigned char *) (prs->str + prs->state->posbyte),	\
+						global_libc_ctype);									\
 }																			\
 																			\
 static int																	\
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 726a7c1be1f..f06396c94f4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -229,6 +229,12 @@
 /* Define to 1 if you have the global variable 'int timezone'. */
 #undef HAVE_INT_TIMEZONE
 
+/* Define to 1 if you have the `iswxdigit_l' function. */
+#undef HAVE_ISWXDIGIT_L
+
+/* Define to 1 if you have the `isxdigit_l' function. */
+#undef HAVE_ISXDIGIT_L
+
 /* Define to 1 if __builtin_constant_p(x) implies "i"(x) acceptance. */
 #undef HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P
 
-- 
2.43.0

v4-0008-No-longer-accept-NULL-for-wchar2char-char2wchar.patch

From 70b7e1012daec72641d866bf09385933a98d1d42 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 7 Jul 2025 15:43:48 -0700
Subject: [PATCH v4 8/8] No longer accept NULL for wchar2char()/char2wchar().

Avoid dependence on setlocale().

Discussion: https://postgr.es/m/9875f7f9-50f1-4b5d-86fc-ee8b03e8c162@eisentraut.org
---
 src/backend/utils/adt/pg_locale.c      |  2 +-
 src/backend/utils/adt/pg_locale_libc.c | 24 +++++-------------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..ce50e9e15d0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -998,7 +998,7 @@ get_iso_localename(const char *winlocname)
 		char	   *hyphen;
 
 		/* Locale names use only ASCII, any conversion locale suffices. */
-		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), NULL);
+		rc = wchar2char(iso_lc_messages, buffer, sizeof(iso_lc_messages), LC_C_LOCALE);
 		if (rc == -1 || rc == sizeof(iso_lc_messages))
 			return NULL;
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 33a082b6490..4013771e301 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -1169,6 +1169,8 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, locale_t loc)
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -1195,16 +1197,7 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, locale_t loc)
 	}
 	else
 #endif							/* WIN32 */
-	if (loc == (locale_t) 0)
-	{
-		/* Use wcstombs directly for the default locale */
-		result = wcstombs(to, from, tolen);
-	}
-	else
-	{
-		/* Use wcstombs_l for nondefault locales */
 		result = wcstombs_l(to, from, tolen, loc);
-	}
 
 	return result;
 }
@@ -1224,6 +1217,8 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 {
 	size_t		result;
 
+	Assert(loc != NULL);
+
 	if (tolen == 0)
 		return 0;
 
@@ -1255,16 +1250,7 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
 		/* mbstowcs requires ending '\0' */
 		char	   *str = pnstrdup(from, fromlen);
 
-		if (loc == (locale_t) 0)
-		{
-			/* Use mbstowcs directly for the default locale */
-			result = mbstowcs(to, str, tolen);
-		}
-		else
-		{
-			/* Use mbstowcs_l for nondefault locales */
-			result = mbstowcs_l(to, str, tolen, loc);
-		}
+		result = mbstowcs_l(to, str, tolen, loc);
 
 		pfree(str);
 	}
-- 
2.43.0

#45

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Jeff Davis (#43)

Re: Remaining dependency on setlocale()

On Mon, 2025-07-07 at 17:56 -0700, Jeff Davis wrote:

I looked into this a bit, and if I understand correctly, the only
problem is with strerror() and strerror_r(), which depend on
LC_MESSAGES for the language but LC_CTYPE to find the right encoding.

...

Windows would be a different story, though: strerror() doesn't seem to
have a variant that accepts a _locale_t object, and even if it did, I
don't see a way to create a _locale_t object with LC_MESSAGES and
LC_CTYPE set to different values.

I think I have an answer to the second part here:

"For information about the format of the locale argument, see Locale
names, Languages, and Country/Region strings."

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/create-locale-wcreate-locale?view=msvc-170

and when I follow that link, I see:

"You can specify multiple category types, separated by semicolons.
Category types that aren't specified use the current locale setting.
For example, this code snippet sets the current locale for all
categories to de-DE, and then sets the categories LC_MONETARY to en-GB
and LC_TIME to es-ES:

_wsetlocale(LC_ALL, L"de-DE");
_wsetlocale(LC_ALL, L"LC_MONETARY=en-GB;LC_TIME=es-ES");"

https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170

So we just need to construct a string of the right form, and we can
have a _locale_t object representing the global locale for all
categories. I'm not sure exactly how we escape the individual locale
names, but it might be enough to just reject ';' in the locale name (at
least for windows).
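
As a rough sketch of what "construct a string of the right form" could
look like: the helper below is hypothetical (the name
build_win32_locale_string is not from any patch), it only covers
LC_COLLATE and LC_CTYPE, and it assumes that rejecting ';' is sufficient
escaping, which has not been verified against the Windows runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: build a string of the form
 * "LC_COLLATE=<collate>;LC_CTYPE=<ctype>", intended to be passed to
 * _create_locale(LC_ALL, ...) on Windows.
 *
 * Returns false if a name is empty, if a name contains ';' (which would be
 * parsed as a category separator), or if the result does not fit in buf.
 */
static bool
build_win32_locale_string(char *buf, size_t buflen,
						  const char *collate, const char *ctype)
{
	int			n;

	assert(buf != NULL && collate != NULL && ctype != NULL);

	if (collate[0] == '\0' || ctype[0] == '\0')
		return false;
	if (strchr(collate, ';') != NULL || strchr(ctype, ';') != NULL)
		return false;

	n = snprintf(buf, buflen, "LC_COLLATE=%s;LC_CTYPE=%s", collate, ctype);

	/* snprintf reports the untruncated length; treat truncation as failure */
	return n >= 0 && (size_t) n < buflen;
}
```

The resulting buffer would then be handed to _create_locale(LC_ALL, buf);
whether _create_locale() accepts the multi-category form exactly as
setlocale() does would still need to be confirmed by testing.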

The first problem -- how to affect the encoding of strings returned by
strerror() on windows -- may be solvable as well. It looks like
LC_MESSAGES is not supported at all on windows, so the only thing to be
concerned about is the encoding, which is affected by LC_CTYPE. But
windows doesn't offer uselocale() or strerror_l(). The only way seems
to be to call _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) and then
setlocale(LC_CTYPE, datctype) right before strerror(), and switch it
back to "C" right afterward. Comments welcome.

Regards,
Jeff Davis

#46

Thomas Munro

thomas.munro@gmail.com

6 months ago

In reply to: Jeff Davis (#45)

Re: Remaining dependency on setlocale()

On Thu, Jul 10, 2025 at 10:52 AM Jeff Davis <pgsql@j-davis.com> wrote:

The first problem -- how to affect the encoding of strings returned by
strerror() on windows -- may be solvable as well. It looks like
LC_MESSAGES is not supported at all on windows, so the only thing to be
concerned about is the encoding, which is affected by LC_CTYPE. But
windows doesn't offer uselocale() or strerror_l(). The only way seems
to be to call _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) and then
setlocale(LC_CTYPE, datctype) right before strerror(), and switch it
back to "C" right afterward. Comments welcome.

FWIW there is an example of that in src/port/pg_localeconv_r.c.

#47

Thomas Munro

thomas.munro@gmail.com

6 months ago

In reply to: Jeff Davis (#44)

Re: Remaining dependency on setlocale()

On Tue, Jul 8, 2025 at 1:14 PM Jeff Davis <pgsql@j-davis.com> wrote:

v4-0008 uses LC_C_LOCALE, and I'm not sure if that's portable, but if
the buildfarm complains then I'll fix it or revert it.

(Catching up with this thread...)

LC_C_LOCALE is definitely not portable: I've only seen it on macOS and
NetBSD. It would be a good thing to propose to POSIX, since no other
approach can be retrofitted quite so cromulently...

I tried to make a portable PG_C_LOCALE mechanism like that, but it was
reverted for reasons needing more investigation... see
8e993bff5326b00ced137c837fce7cd1e0ecae14 (reverted by
3c8e463b0d885e0d976f6a13a1fb78187b25c86f).

#48

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Thomas Munro (#46)

Re: Remaining dependency on setlocale()

On Thu, 2025-07-10 at 11:53 +1200, Thomas Munro wrote:

On Thu, Jul 10, 2025 at 10:52 AM Jeff Davis <pgsql@j-davis.com> wrote:

The first problem -- how to affect the encoding of strings returned by
strerror() on windows -- may be solvable as well. It looks like
LC_MESSAGES is not supported at all on windows, so the only thing to be
concerned about is the encoding, which is affected by LC_CTYPE. But
windows doesn't offer uselocale() or strerror_l(). The only way seems
to be to call _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) and then
setlocale(LC_CTYPE, datctype) right before strerror(), and switch it
back to "C" right afterward. Comments welcome.

FWIW there is an example of that in src/port/pg_localeconv_r.c.

OK, so it seems we have a path forward here:

1. Have a global_libc_locale that represents all of the categories, and
keep it up to date with GUC changes. On windows, it requires keeping
the textual locale names handy (e.g. copies of datcollate and
datctype), and building the special locale string and doing
_create_locale(LC_ALL, "LC_ABC=somelocale;LC_XYZ=otherlocale").

2. When there's no _l() variant of a function, like strerror_r(), wrap
with uselocale(). On windows, this means using the trick above with
_configthreadlocale(_ENABLE_PER_THREAD_LOCALE).

I don't have a great windows development environment, and it appears CI
and the buildfarm don't offer great coverage either. Can I ask for a
volunteer to do the windows side of this work?

Regards,
Jeff Davis

#49

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Thomas Munro (#47)

Re: Remaining dependency on setlocale()

On Thu, 2025-07-10 at 12:01 +1200, Thomas Munro wrote:

I tried to make a portable PG_C_LOCALE mechanism like that, but it was
reverted for reasons needing more investigation... see
8e993bff5326b00ced137c837fce7cd1e0ecae14 (reverted by
3c8e463b0d885e0d976f6a13a1fb78187b25c86f).

The revert seems to be related to pgport_shlib. At least for my current
work, I'm focused on removing setlocale() dependencies in the backend,
and a PG_C_LOCALE should work fine there.

Regards,
Jeff Davis

#50

Thomas Munro

thomas.munro@gmail.com

6 months ago

In reply to: Jeff Davis (#49)

Re: Remaining dependency on setlocale()

On Fri, Jul 11, 2025 at 6:33 AM Jeff Davis <pgsql@j-davis.com> wrote:

On Thu, 2025-07-10 at 12:01 +1200, Thomas Munro wrote:

I tried to make a portable PG_C_LOCALE mechanism like that, but it was
reverted for reasons needing more investigation... see
8e993bff5326b00ced137c837fce7cd1e0ecae14 (reverted by
3c8e463b0d885e0d976f6a13a1fb78187b25c86f).

The revert seems to be related to pgport_shlib. At least for my current
work, I'm focused on removing setlocale() dependencies in the backend,
and a PG_C_LOCALE should work fine there.

OK, I'll figure out what happened with that and try to post a new
version over on that other thread soon.

(FWIW I learned a couple of encouraging things about that topic:
glibc's newlocale(LC_ALL, NULL, 0) seems to give you a static
singleton anyway, no allocation happens, so it can't fail in practice
and it's cheap, and FreeBSD also supports LC_C_LOCALE like NetBSD and
macOS, it just doesn't have a name, you pass NULL instead (which is
the value that LC_C_LOCALE has on those other systems). I actually
rather like the macro, because how else are you supposed to test
whether this system can accept NULL there?)

#51

Thomas Munro

thomas.munro@gmail.com

6 months ago

In reply to: Jeff Davis (#48)

Re: Remaining dependency on setlocale()

On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql@j-davis.com> wrote:

I don't have a great windows development environment, and it appears CI
and the buildfarm don't offer great coverage either. Can I ask for a
volunteer to do the windows side of this work?

Me neither but I'm willing to help with that, and have done lots of
closely related things through trial-by-CI...

#52

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Thomas Munro (#51)

1 attachment(s)

Re: Remaining dependency on setlocale()

On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:

On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql@j-davis.com> wrote:

I don't have a great windows development environment, and it appears CI
and the buildfarm don't offer great coverage either. Can I ask for a
volunteer to do the windows side of this work?

Me neither but I'm willing to help with that, and have done lots of
closely related things through trial-by-CI...

Attached a patch to separate the message translation (both gettext and
strerror translations) from setlocale(). That's a step towards thread
safety, and also a step toward setting LC_CTYPE=C permanently (more
work still required there).

The patch feels a bit over-engineered, but I'd like to know what you
think. It would be great if you could test/debug the windows
NLS-enabled paths.

I'm also not sure what to do about the NetBSD path. NetBSD has no
uselocale(), so I have to fall back to a temporary setlocale(), which is
not thread-safe. And I'm getting a mysterious error in test_aio for
NetBSD, which I haven't investigated yet.

Regards,
Jeff Davis

Attachments:

v5-0001-Create-wrapper-for-managing-NLS-locale.patch

From 8bd59e7d52351fadeb4fe26023aa0ff57735e03f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 18 Jul 2025 14:06:45 -0700
Subject: [PATCH v5] Create wrapper for managing NLS locale.

Message translation depends on LC_CTYPE and LC_MESSAGES. Use wrapper
functions to control those settings rather than relying on the
permanent setlocale() settings.

Improves thread safety by using "_l()" variants of functions or
uselocale() where available. On windows, setlocale() can be made
thread-safe. There is still at least one platform (NetBSD) where none
of those options are available, in which case it still depends on
thread-unsafe setlocale().

Also separates message translation behavior from other, unrelated
behaviors like tolower().

Discussion: https://postgr.es/m/f040113cf384ada69558ec004a04a3ddb3e40a26.camel@j-davis.com
---
 configure.ac                      |   2 +
 meson.build                       |   2 +
 src/backend/main/main.c           |  13 +-
 src/backend/utils/adt/Makefile    |   1 +
 src/backend/utils/adt/meson.build |   1 +
 src/backend/utils/adt/pg_locale.c |  39 +--
 src/backend/utils/adt/pg_nls.c    | 417 ++++++++++++++++++++++++++++++
 src/backend/utils/init/postinit.c |   4 +
 src/include/c.h                   |  19 +-
 src/include/pg_config.h.in        |   8 +
 src/include/port.h                |  10 +
 src/include/utils/pg_nls.h        |  29 +++
 src/tools/pg_bsd_indent/err.c     |   2 +
 13 files changed, 524 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/utils/adt/pg_nls.c
 create mode 100644 src/include/utils/pg_nls.h

diff --git a/configure.ac b/configure.ac
index c2877e36935..493014302cd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1791,6 +1791,8 @@ AC_CHECK_FUNCS(m4_normalize([
 	backtrace_symbols
 	copyfile
 	copy_file_range
+	dgettext_l
+	dngettext_l
 	elf_aux_info
 	getauxval
 	getifaddrs
diff --git a/meson.build b/meson.build
index 5365aaf95e6..d3c285b2d54 100644
--- a/meson.build
+++ b/meson.build
@@ -2882,6 +2882,8 @@ func_checks = [
   # when enabling asan the dlopen check doesn't notice that -ldl is actually
   # required. Just checking for dlsym() ought to suffice.
   ['dlsym', {'dependencies': [dl_dep], 'define': false}],
+  ['dgettext_l'],
+  ['dngettext_l'],
   ['elf_aux_info'],
   ['explicit_bzero'],
   ['getauxval'],
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index bdcb5e4f261..fbef0245b28 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -38,6 +38,7 @@
 #include "utils/help_config.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/pg_nls.h"
 #include "utils/ps_status.h"
 
 
@@ -139,14 +140,16 @@ main(int argc, char *argv[])
 	init_locale("LC_CTYPE", LC_CTYPE, "");
 
 	/*
-	 * LC_MESSAGES will get set later during GUC option processing, but we set
-	 * it here to allow startup error messages to be localized.
+	 * Initialize NLS locale's LC_CTYPE and LC_MESSAGES from the environment.
+	 * It will be updated later during GUC option processing, but we set it
+	 * here to allow startup error messages to be localized.
 	 */
-#ifdef LC_MESSAGES
-	init_locale("LC_MESSAGES", LC_MESSAGES, "");
-#endif
+	pg_nls_set_locale("", "");
 
 	/* We keep these set to "C" always.  See pg_locale.c for explanation. */
+#ifdef LC_MESSAGES
+	init_locale("LC_MESSAGES", LC_MESSAGES, "C");
+#endif
 	init_locale("LC_MONETARY", LC_MONETARY, "C");
 	init_locale("LC_NUMERIC", LC_NUMERIC, "C");
 	init_locale("LC_TIME", LC_TIME, "C");
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index ffeacf2b819..38e395b7de9 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -84,6 +84,7 @@ OBJS = \
 	pg_locale_icu.o \
 	pg_locale_libc.o \
 	pg_lsn.o \
+	pg_nls.o \
 	pg_upgrade_support.o \
 	pgstatfuncs.o \
 	pseudorandomfuncs.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index ed9bbd7b926..f85436cd766 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -71,6 +71,7 @@ backend_sources += files(
   'pg_locale_icu.c',
   'pg_locale_libc.c',
   'pg_lsn.c',
+  'pg_nls.c',
   'pg_upgrade_support.c',
   'pgstatfuncs.c',
   'pseudorandomfuncs.c',
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 97c2ac1faf9..39c06b91e7d 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -10,21 +10,29 @@
  */
 
 /*----------
- * Here is how the locale stuff is handled: LC_COLLATE and LC_CTYPE
- * are fixed at CREATE DATABASE time, stored in pg_database, and cannot
- * be changed. Thus, the effects of strcoll(), strxfrm(), isupper(),
- * toupper(), etc. are always in the same fixed locale.
+ * Here is how the locale stuff is handled:
  *
- * LC_MESSAGES is settable at run time and will take effect
- * immediately.
+ * LC_COLLATE is permanently set to "C" with setlocale(), and collation
+ * behavior is defined entirely by pg_locale_t, which has provider-dependent
+ * behavior. If the provider is libc, then it holds a locale_t object with
+ * LC_COLLATE set appropriately.
  *
- * The other categories, LC_MONETARY, LC_NUMERIC, and LC_TIME are
- * permanently set to "C", and then we use temporary locale_t
- * objects when we need to look up locale data based on the GUCs
- * of the same name.  Information is cached when the GUCs change.
- * The cached information is only used by the formatting functions
- * (to_char, etc.) and the money type.  For the user, this should all be
- * transparent.
+ * LC_CTYPE is fixed at CREATE DATABASE time, stored in pg_database, set at
+ * database connection time with setlocale(), and cannot be changed. The
+ * effects are limited, because casing and character classification is mostly
+ * defined by pg_locale_t, and message encoding is controlled by
+ * pg_nls_set_locale(). LC_CTYPE does affect a few places in the backend, such
+ * as case conversions where a pg_locale_t object is unavailable.
+ *
+ * LC_MESSAGES is permanently set to "C" with setlocale(), and NLS behavior is
+ * controlled with pg_nls_set_locale().
+ *
+ * The other categories, LC_MONETARY, LC_NUMERIC, and LC_TIME are permanently
+ * set to "C" with setlocale(), and then we use temporary locale_t objects
+ * when we need to look up locale data based on the GUCs of the same name.
+ * Information is cached when the GUCs change.  The cached information is only
+ * used by the formatting functions (to_char, etc.) and the money type.  For
+ * the user, this should all be transparent.
  *----------
  */
 
@@ -45,6 +53,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
+#include "utils/pg_nls.h"
 #include "utils/relcache.h"
 #include "utils/syscache.h"
 
@@ -405,9 +414,7 @@ assign_locale_messages(const char *newval, void *extra)
 	 * LC_MESSAGES category does not exist everywhere, but accept it anyway.
 	 * We ignore failure, as per comment above.
 	 */
-#ifdef LC_MESSAGES
-	(void) pg_perm_setlocale(LC_MESSAGES, newval);
-#endif
+	pg_nls_set_locale(NULL, newval);
 }
 
 
diff --git a/src/backend/utils/adt/pg_nls.c b/src/backend/utils/adt/pg_nls.c
new file mode 100644
index 00000000000..0276da880d6
--- /dev/null
+++ b/src/backend/utils/adt/pg_nls.c
@@ -0,0 +1,417 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL NLS utilities
+ *
+ * Portions Copyright (c) 2002-2025, PostgreSQL Global Development Group
+ *
+ * src/backend/utils/adt/pg_nls.c
+ *
+ * Platform-independent wrappers for message translation functions. The
+ * LC_CTYPE and LC_MESSAGES settings are set with pg_nls_set_locale() and the
+ * state is managed internally to this file, regardless of the outside
+ * settings from setlocale() or uselocale().
+ *
+ * The implementation prefers the "_l()" variants of functions, then
+ * secondarily a temporary uselocale() setting (thread safe), and lastly a
+ * temporary setlocale() setting (which can be made thread safe on windows).
+ *
+ * This mechanism improves thread safety (on most platforms), and provides
+ * better separation between the behavior of NLS and other behaviors like
+ * isupper(), etc.
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/memutils.h"
+#include "utils/pg_locale.h"
+#include "utils/pg_nls.h"
+
+/*
+ * Represents global LC_CTYPE and LC_MESSAGES settings, for the purpose of
+ * message translation. LC_CTYPE in the postmaster comes from the environment,
+ * and in a backend comes from pg_database.datctype. LC_MESSAGES comes from a
+ * GUC, and must be kept up to date.
+ *
+ * If there's no uselocale(), keep the string values instead, and use
+ * setlocale().
+ */
+#ifdef HAVE_USELOCALE
+
+static locale_t nls_locale = (locale_t) 0;
+
+#else
+
+static char *nls_lc_ctype = NULL;
+static char *nls_lc_messages = NULL;
+
+typedef struct SaveLocale
+{
+#ifndef WIN32
+	char	   *lc_ctype;
+	char	   *lc_messages;
+#else
+	int			config_thread_locale;
+	wchar_t    *lc_ctype;
+	wchar_t    *lc_messages;
+#endif							/* WIN32 */
+}			SaveLocale;
+
+#endif							/* !HAVE_USELOCALE */
+
+/*
+ * Set the LC_CTYPE and LC_MESSAGES to be used for message translation.
+ */
+void
+pg_nls_set_locale(const char *ctype, const char *messages)
+{
+	if (ctype)
+	{
+#ifdef HAVE_USELOCALE
+		locale_t	loc = 0;
+
+		errno = 0;
+		loc = newlocale(LC_CTYPE_MASK, ctype, nls_locale);
+		if (!loc)
+			report_newlocale_failure(ctype);
+		nls_locale = loc;
+#else
+		if (!check_locale(LC_CTYPE, ctype, NULL))
+			report_newlocale_failure(ctype);
+		if (nls_lc_ctype)
+			pfree(nls_lc_ctype);
+		nls_lc_ctype = MemoryContextStrdup(TopMemoryContext, ctype);
+#endif
+
+		/*
+		 * Use the right encoding in translated messages.  Under ENABLE_NLS,
+		 * let pg_bind_textdomain_codeset() figure it out.  Under !ENABLE_NLS,
+		 * message format strings are ASCII, but database-encoding strings may
+		 * enter the message via %s.  This makes the overall message encoding
+		 * equal to the database encoding.
+		 */
+#ifdef ENABLE_NLS
+		SetMessageEncoding(pg_bind_textdomain_codeset(textdomain(NULL)));
+#else
+		SetMessageEncoding(GetDatabaseEncoding());
+#endif
+	}
+
+	if (messages)
+	{
+#ifdef HAVE_USELOCALE
+		locale_t	loc = 0;
+
+		errno = 0;
+		loc = newlocale(LC_MESSAGES_MASK, messages, nls_locale);
+		if (!loc)
+			report_newlocale_failure(messages);
+		nls_locale = loc;
+#else
+#ifdef LC_MESSAGES
+		if (!check_locale(LC_MESSAGES, messages, NULL))
+			report_newlocale_failure(messages);
+#endif
+		if (nls_lc_messages)
+			pfree(nls_lc_messages);
+		nls_lc_messages = MemoryContextStrdup(TopMemoryContext, messages);
+#endif
+	}
+}
+
+#ifdef ENABLE_NLS
+
+#ifdef HAVE_USELOCALE
+
+#ifndef HAVE_DGETTEXT_L
+static char *
+dgettext_l(const char *domainname, const char *msgid, locale_t loc)
+{
+	char	   *result;
+	locale_t	save_loc = uselocale(loc);
+
+	result = dcgettext(domainname, msgid, LC_MESSAGES);
+	uselocale(save_loc);
+	return result;
+}
+#endif							/* HAVE_DGETTEXT_L */
+
+#ifndef HAVE_DNGETTEXT_L
+static char *
+dngettext_l(const char *domainname, const char *s, const char *p,
+			unsigned long int n, locale_t loc)
+{
+	char	   *result;
+	locale_t	save_loc = uselocale(loc);
+
+	result = dcngettext(domainname, s, p, n, LC_MESSAGES);
+	uselocale(save_loc);
+	return result;
+}
+#endif							/* HAVE_DNGETTEXT_L */
+
+static char *
+pg_strerror_l(int errnum, locale_t loc)
+{
+	char	   *result;
+	locale_t	save_loc = uselocale(loc);
+
+	result = pg_strerror(errnum);
+	uselocale(save_loc);
+	return result;
+}
+
+static char *
+pg_strerror_r_l(int errnum, char *buf, size_t buflen, locale_t loc)
+{
+	char	   *result;
+	locale_t	save_loc = uselocale(loc);
+
+	result = pg_strerror_r(errnum, buf, buflen);
+	uselocale(save_loc);
+	return result;
+}
+
+#else							/* !HAVE_USELOCALE */
+
+static bool
+save_message_locale(SaveLocale * save)
+{
+#ifndef WIN32
+	char	   *tmp;
+
+	/*
+	 * This path -- ENABLE_NLS, !HAVE_USELOCALE, !WIN32 -- is not thread safe,
+	 * but is only known to be used on NetBSD.
+	 */
+	tmp = setlocale(LC_CTYPE, NULL);
+	if (!tmp)
+		return false;
+	save->lc_ctype = pstrdup(tmp);
+
+	tmp = setlocale(LC_MESSAGES, NULL);
+	if (!tmp)
+		return false;
+	save->lc_messages = pstrdup(tmp);
+
+	return true;
+#else
+	wchar_t    *tmp;
+
+	/* Put setlocale() into thread-local mode. */
+	save->config_thread_locale = _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
+
+	/*
+	 * Capture the current values as wide strings.  Otherwise, we might not be
+	 * able to restore them if their names contain non-ASCII characters and
+	 * the intermediate locale changes the expected encoding.  We don't want
+	 * to leave the caller in an unexpected state by failing to restore, or
+	 * crash the runtime library.
+	 */
+	tmp = _wsetlocale(LC_CTYPE, NULL);
+	if (!tmp || !(tmp = wcsdup(tmp)))
+		return false;
+	save->lc_ctype = tmp;
+
+	tmp = _wsetlocale(LC_MESSAGES, NULL);
+	if (!tmp || !(tmp = wcsdup(tmp)))
+		return false;
+	save->lc_messages = tmp;
+
+	return true;
+#endif
+}
+
+static void
+restore_message_locale(SaveLocale * save)
+{
+#ifndef WIN32
+	if (save->lc_ctype)
+	{
+		setlocale(LC_CTYPE, save->lc_ctype);
+		pfree(save->lc_ctype);
+		save->lc_ctype = NULL;
+	}
+	if (save->lc_messages)
+	{
+		setlocale(LC_MESSAGES, save->lc_messages);
+		pfree(save->lc_messages);
+		save->lc_messages = NULL;
+	}
+#else
+	if (save->lc_ctype)
+	{
+		_wsetlocale(LC_CTYPE, save->lc_ctype);
+		free(save->lc_ctype);
+		save->lc_ctype = NULL;
+	}
+	if (save->lc_messages)
+	{
+		_wsetlocale(LC_MESSAGES, save->lc_messages);
+		free(save->lc_messages);
+		save->lc_messages = NULL;
+	}
+	_configthreadlocale(save->config_thread_locale);
+#endif
+}
+
+static char *
+dgettext_l(const char *domainname, const char *msgid, const char *lc_ctype,
+		   const char *lc_messages)
+{
+	SaveLocale	save;
+
+	if (save_message_locale(&save))
+	{
+		char	   *result;
+
+		(void) setlocale(LC_CTYPE, lc_ctype);
+		(void) setlocale(LC_MESSAGES, lc_messages);
+
+		result = dcgettext(domainname, msgid, LC_MESSAGES);
+		restore_message_locale(&save);
+		return result;
+	}
+	else
+		return dcgettext(domainname, msgid, LC_MESSAGES);
+}
+
+static char *
+dngettext_l(const char *domainname, const char *s, const char *p,
+			unsigned long int n, const char *lc_ctype,
+			const char *lc_messages)
+{
+	SaveLocale	save;
+
+	if (save_message_locale(&save))
+	{
+		char	   *result;
+
+		(void) setlocale(LC_CTYPE, lc_ctype);
+		(void) setlocale(LC_MESSAGES, lc_messages);
+
+		result = dcngettext(domainname, s, p, n, LC_MESSAGES);
+		restore_message_locale(&save);
+		return result;
+	}
+	else
+		return dcngettext(domainname, s, p, n, LC_MESSAGES);
+}
+
+static char *
+pg_strerror_l(int errnum, const char *lc_ctype, const char *lc_messages)
+{
+	SaveLocale	save;
+
+	if (save_message_locale(&save))
+	{
+		char	   *result;
+
+		(void) setlocale(LC_CTYPE, lc_ctype);
+		(void) setlocale(LC_MESSAGES, lc_messages);
+
+		result = pg_strerror(errnum);
+		restore_message_locale(&save);
+		return result;
+	}
+	else
+		return pg_strerror(errnum);
+}
+
+static char *
+pg_strerror_r_l(int errnum, char *buf, size_t buflen, const char *lc_ctype,
+				const char *lc_messages)
+{
+	SaveLocale	save;
+
+	if (save_message_locale(&save))
+	{
+		char	   *result;
+
+		(void) setlocale(LC_CTYPE, lc_ctype);
+		(void) setlocale(LC_MESSAGES, lc_messages);
+
+		result = pg_strerror_r(errnum, buf, buflen);
+		restore_message_locale(&save);
+		return result;
+	}
+	else
+		return pg_strerror_r(errnum, buf, buflen);
+}
+
+#endif							/* !HAVE_USELOCALE */
+
+/*
+ * dgettext() with nls_locale, if set.
+ */
+char *
+pg_nls_dgettext(const char *domainname, const char *msgid)
+{
+#ifdef HAVE_USELOCALE
+	if (nls_locale)
+		return dgettext_l(domainname, msgid, nls_locale);
+#else
+	if (nls_lc_ctype)
+		return dgettext_l(domainname, msgid, nls_lc_ctype,
+						  nls_lc_messages);
+#endif
+	else
+		return dcgettext(domainname, msgid, LC_MESSAGES);
+}
+
+/*
+ * dngettext() with nls_locale, if set.
+ */
+char *
+pg_nls_dngettext(const char *domainname, const char *s, const char *p,
+				 unsigned long int n)
+{
+#ifdef HAVE_USELOCALE
+	if (nls_locale)
+		return dngettext_l(domainname, s, p, n, nls_locale);
+#else
+	if (nls_lc_ctype)
+		return dngettext_l(domainname, s, p, n, nls_lc_ctype,
+						   nls_lc_messages);
+#endif
+	else
+		return dcngettext(domainname, s, p, n, LC_MESSAGES);
+}
+
+/*
+ * pg_strerror() with nls_locale, if set.
+ */
+char *
+pg_nls_strerror(int errnum)
+{
+#ifdef HAVE_USELOCALE
+	if (nls_locale)
+		return pg_strerror_l(errnum, nls_locale);
+#else
+	if (nls_lc_ctype)
+		return pg_strerror_l(errnum, nls_lc_ctype, nls_lc_messages);
+#endif
+	else
+		return pg_strerror(errnum);
+}
+
+/*
+ * pg_strerror_r() with nls_locale, if set.
+ */
+char *
+pg_nls_strerror_r(int errnum, char *buf, size_t buflen)
+{
+#ifdef HAVE_USELOCALE
+	if (nls_locale)
+		return pg_strerror_r_l(errnum, buf, buflen, nls_locale);
+#else
+	if (nls_lc_ctype)
+		return pg_strerror_r_l(errnum, buf, buflen, nls_lc_ctype,
+							   nls_lc_messages);
+#endif
+	else
+		return pg_strerror_r(errnum, buf, buflen);
+}
+
+#endif							/* ENABLE_NLS */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 641e535a73c..3206dd121ed 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -65,6 +65,7 @@
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/portal.h"
+#include "utils/pg_nls.h"
 #include "utils/ps_status.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -430,6 +431,9 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 						   " which is not recognized by setlocale().", ctype),
 				 errhint("Recreate the database with another locale or install the missing locale.")));
 
+	/* set global_message_locale for this database to datctype */
+	pg_nls_set_locale(ctype, NULL);
+
 	if (strcmp(ctype, "C") == 0 ||
 		strcmp(ctype, "POSIX") == 0)
 		database_ctype_is_c = true;
diff --git a/src/include/c.h b/src/include/c.h
index 6d4495bdd9f..2edb8e2f63d 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1159,8 +1159,23 @@ typedef union PGAlignedXLogBlock
  * gettext support
  */
 
-#ifndef ENABLE_NLS
-/* stuff we'd otherwise get from <libintl.h> */
+#if defined(ENABLE_NLS) && !defined(FRONTEND)
+/* use backend's global message locale setting */
+#include "utils/pg_nls.h"
+
+#undef gettext
+#undef dgettext
+#undef ngettext
+#undef dngettext
+
+#define gettext(x) pg_nls_dgettext(NULL, x)
+#define dgettext(d,x) pg_nls_dgettext(d, x)
+#define ngettext(s,p,n) pg_nls_dngettext(NULL, s, p, n)
+#define dngettext(d,s,p,n) pg_nls_dngettext(d, s, p, n)
+#elif defined(ENABLE_NLS)
+/* use <libintl.h> directly */
+#else
+/* no-op */
 #define gettext(x) (x)
 #define dgettext(d,x) (x)
 #define ngettext(s,p,n) ((n) == 1 ? (s) : (p))
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index c4dc5d72bdb..f2fa336bd95 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -111,6 +111,14 @@
    don't. */
 #undef HAVE_DECL_STRCHRNUL
 
+/* Define to 1 if you have the declaration of `dgettext_l', and to 0 if you
+   don't. */
+#undef HAVE_DGETTEXT_L
+
+/* Define to 1 if you have the declaration of `dngettext_l', and to 0 if you
+   don't. */
+#undef HAVE_DNGETTEXT_L
+
 /* Define to 1 if you have the declaration of `strlcat', and to 0 if you
    don't. */
 #undef HAVE_DECL_STRLCAT
diff --git a/src/include/port.h b/src/include/port.h
index 3964d3b1293..c00d84d0dbf 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -249,11 +249,21 @@ extern int	pg_strfromd(char *str, size_t count, int precision, double value);
 
 /* Replace strerror() with our own, somewhat more robust wrapper */
 extern char *pg_strerror(int errnum);
+#if defined(ENABLE_NLS) && !defined(FRONTEND)
+extern char *pg_nls_strerror(int errnum);
+#define strerror pg_nls_strerror
+#else
 #define strerror pg_strerror
+#endif
 
 /* Likewise for strerror_r(); note we prefer the GNU API for that */
 extern char *pg_strerror_r(int errnum, char *buf, size_t buflen);
+#if defined(ENABLE_NLS) && !defined(FRONTEND)
+extern char *pg_nls_strerror_r(int errnum, char *buf, size_t buflen);
+#define strerror_r pg_nls_strerror_r
+#else
 #define strerror_r pg_strerror_r
+#endif
 #define PG_STRERROR_R_BUFLEN 256	/* Recommended buffer size for strerror_r */
 
 /* Wrap strsignal(), or provide our own version if necessary */
diff --git a/src/include/utils/pg_nls.h b/src/include/utils/pg_nls.h
new file mode 100644
index 00000000000..c8c605e9f26
--- /dev/null
+++ b/src/include/utils/pg_nls.h
@@ -0,0 +1,29 @@
+/*-----------------------------------------------------------------------
+ *
+ * PostgreSQL NLS utilities
+ *
+ * src/include/utils/pg_nls.h
+ *
+ * Copyright (c) 2002-2025, PostgreSQL Global Development Group
+ *
+ *-----------------------------------------------------------------------
+ */
+
+#ifndef _PG_NLS_
+#define _PG_NLS_
+
+extern void pg_nls_set_locale(const char *ctype, const char *messages);
+
+#ifdef ENABLE_NLS
+
+extern char *pg_nls_dgettext(const char *domainname, const char *msgid)
+			pg_attribute_format_arg(2);
+extern char *pg_nls_dngettext(const char *domainname, const char *s,
+							  const char *p, unsigned long int n)
+			pg_attribute_format_arg(2) pg_attribute_format_arg(3);
+extern char *pg_nls_strerror(int errnum);
+extern char *pg_nls_strerror_r(int errnum, char *buf, size_t buflen);
+
+#endif
+
+#endif							/* _PG_NLS_ */
diff --git a/src/tools/pg_bsd_indent/err.c b/src/tools/pg_bsd_indent/err.c
index 807319334bc..fe153aa3dcd 100644
--- a/src/tools/pg_bsd_indent/err.c
+++ b/src/tools/pg_bsd_indent/err.c
@@ -27,6 +27,8 @@
  * SUCH DAMAGE.
  */
 
+#define FRONTEND 1
+
 /*
  * This is cut down to just the minimum that we need to build indent.
  */
-- 
2.43.0

#53

Jeff Davis

pgsql@j-davis.com

6 months ago

In reply to: Jeff Davis (#52)

Re: Remaining dependency on setlocale()

On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:

The patch feels a bit over-engineered, but I'd like to know what you
think. It would be great if you could test/debug the windows NLS-
enabled paths.

Let me explain how it ended up looking over-engineered, and perhaps
someone has a simpler solution.

For gettext, we already configure the encoding with
bind_textdomain_codeset(). All it needs is LC_MESSAGES set properly,
which can be done with uselocale(), as a semi-permanent setting until
the next GUC change, just like setlocale() today. There are a couple
minor problems for platforms without uselocale(). For windows, we could
just permanently do:

_configthreadlocale(_ENABLE_PER_THREAD_LOCALE)

and then use _wsetlocale. For NetBSD, I don't have a solution, but
perhaps we can just reject new lc_messages settings after startup, or
just defer the problem until threading actually becomes a pressing
issue.
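
As a rough illustration of the "semi-permanent uselocale()" idea
described above, here is a minimal sketch for a POSIX.1-2008 platform.
The function name set_thread_messages_locale() and the static variable
installed_locale are hypothetical and do not appear in the patch; the
sketch also assumes that a failed newlocale() leaves its base argument
valid.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Hypothetical holder for the locale currently installed by this sketch. */
static locale_t installed_locale = (locale_t) 0;

/*
 * Install a thread-local locale whose LC_MESSAGES is lc_messages and whose
 * other categories are "C".  Returns 0 on success, -1 on failure.
 */
int
set_thread_messages_locale(const char *lc_messages)
{
	locale_t	base;
	locale_t	loc;

	base = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
	if (base == (locale_t) 0)
		return -1;

	/*
	 * On success newlocale() takes over base.  On failure this sketch
	 * assumes base is still valid, so it must be freed here.
	 */
	loc = newlocale(LC_MESSAGES_MASK, lc_messages, base);
	if (loc == (locale_t) 0)
	{
		freelocale(base);
		return -1;
	}

	/* Switch this thread first, then release the locale it used before. */
	uselocale(loc);
	if (installed_locale != (locale_t) 0)
		freelocale(installed_locale);
	installed_locale = loc;
	return 0;
}
```

The setting stays in effect for the calling thread until the next call,
which is roughly how a GUC assign hook could use it. It has no effect on
what setlocale() reports for the process.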

The main problem is with strerror_r(). To get the right LC_MESSAGES
setting, we need the separate path for windows (which has neither
uselocale() nor strerror_l()). Because we need to keep track of that
path anyway, I used it for gettext as well to have a cleaner separation
for the entire message translation locale. That means we can avoid
permanent locale settings, and reduce the chances that we accidentally
depend on the global locale.

Regards,
Jeff Davis

#54

Jeff Davis

pgsql@j-davis.com

3 months ago

In reply to: Jeff Davis (#53)

Re: Remaining dependency on setlocale()

On Thu, 2025-07-24 at 11:10 -0700, Jeff Davis wrote:

The main problem is with strerror_r()...

Postgres messages, like "division by zero", are translated just fine
without LC_CTYPE; gettext() only needs LC_MESSAGES and the server
encoding. So these are fine.

We use strerror_r() to translate the system errno into a readable
message, like "No such file or directory", i.e. the %m replacements.
That needs LC_CTYPE set (just for the encoding, not the
language/region) as well as LC_MESSAGES (for the language/region).

When using a locale provider other than libc, it's unfortunate to
require LC_CTYPE to be set for just this one single purpose. The locale
itself, e.g. the "en_US" part, is not used at all; only the encoding
part of the setting is relevant. And there is no value other than "C"
that works on all platforms. It's fairly confusing to explain why the
LC_CTYPE setting is required for the builtin or ICU providers at all.
Also, while it's far from the biggest challenge when it comes to
multithreading, it does cause thread-safety headaches on platforms
without uselocale().

Perhaps we could get the ASCII message and run it through gettext()?
That would be extra work for translators, but perhaps not a lot, given
that it's a small and static set of messages in practice. That would
also have the benefit that either NLS is enabled or not -- right now,
since the translation happens in two different ways you can end up with
partially-translated messages. It would also result in consistent
translations across platforms.

Regards,
Jeff Davis

#55

Jeff Davis

pgsql@j-davis.com

3 months ago

In reply to: Jeff Davis (#52)

9 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:

On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:

On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql@j-davis.com> wrote:

I don't have a great windows development environment, and it appears CI
and the buildfarm don't offer great coverage either. Can I ask for a
volunteer to do the windows side of this work?

Me neither but I'm willing to help with that, and have done lots of
closely related things through trial-by-CI...

Attached a new patch series, v6.

Rather than creating new global locale_t objects, this series (along
with a separate patch for NLS [1]) removes the dependency on the global
LC_CTYPE entirely. It's a bunch of small patches that replace direct
calls to tolower()/toupper() with calls into the provider.

An assumption of these patches is that, in the UTF-8 encoding, the
logic in pg_tolower()/pg_toupper() is equivalent to
pg_ascii_tolower()/pg_ascii_toupper().
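
The reasoning behind that assumption can be sketched as follows. In
UTF-8, every byte with the high bit set is part of a multibyte sequence,
so a byte-at-a-time conversion can only meaningfully change A-Z and a-z.
The function names below are hypothetical stand-ins for illustration,
not the actual PostgreSQL functions.

```c
/*
 * Hypothetical ASCII-only case mapping: bytes outside A-Z / a-z,
 * including every byte of a UTF-8 multibyte sequence, pass through
 * unchanged.
 */
unsigned char
ascii_tolower_sketch(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

unsigned char
ascii_toupper_sketch(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch -= 'a' - 'A';
	return ch;
}
```

The locale-aware variant shown in patch 0001 adds a high-bit branch
guarded by isupper_l()/islower_l(). The assumption is that this branch
never fires under a UTF-8 LC_CTYPE, which leaves only the ASCII ranges
above.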

Generally these preserve existing behavior, but there are a couple
differences:

* If using the builtin C locale (not C.UTF-8) along with a datctype
that's a non-C locale with single-byte encoding, it could affect the
results of downcase_identifier(), ltree, and fuzzystrmatch on
characters > 127. For ICU, I went to a bit of extra effort to preserve
the existing behavior here, because it's more likely to be used for
single-byte encodings.

* When using ICU or builtin C.UTF-8, along with a datctype of
"tr_TR.UTF-8", then it will affect ltree's and fuzzystrmatch's
treatment of i/I.

If these are a concern we can fix them with some hacks, but those
behaviors seem fairly obscure to me.

Regards,
Jeff Davis

[1]: /messages/by-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com

Attachments:

v6-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch (text/x-patch; charset=UTF-8)

From 78fbb9220930918221dc0a6aa48b1d0023860707 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:58:02 -0700
Subject: [PATCH v6 1/9] Avoid global LC_CTYPE dependency in pg_locale_libc.c.

Call tolower_l() directly instead of through pg_tolower(), because the
latter depends on the global LC_CTYPE.
---
 src/backend/utils/adt/pg_locale_libc.c | 28 ++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 9c7fcd1fc7a..716f005066a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -450,7 +450,12 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (locale->is_default)
-				*p = pg_tolower((unsigned char) *p);
+			{
+				if (*p >= 'A' && *p <= 'Z')
+					*p += 'a' - 'A';
+				else if (IS_HIGHBIT_SET(*p) && isupper_l(*p, loc))
+					*p = tolower_l((unsigned char) *p, loc);
+			}
 			else
 				*p = tolower_l((unsigned char) *p, loc);
 		}
@@ -535,9 +540,19 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 			{
 				if (wasalnum)
-					*p = pg_tolower((unsigned char) *p);
+				{
+					if (*p >= 'A' && *p <= 'Z')
+						*p += 'a' - 'A';
+					else if (IS_HIGHBIT_SET(*p) && isupper_l(*p, loc))
+						*p = tolower_l((unsigned char) *p, loc);
+				}
 				else
-					*p = pg_toupper((unsigned char) *p);
+				{
+					if (*p >= 'a' && *p <= 'z')
+						*p -= 'a' - 'A';
+					else if (IS_HIGHBIT_SET(*p) && islower_l(*p, loc))
+						*p = toupper_l((unsigned char) *p, loc);
+				}
 			}
 			else
 			{
@@ -633,7 +648,12 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (locale->is_default)
-				*p = pg_toupper((unsigned char) *p);
+			{
+				if (*p >= 'a' && *p <= 'z')
+					*p -= 'a' - 'A';
+				else if (IS_HIGHBIT_SET(*p) && islower_l(*p, loc))
+					*p = toupper_l((unsigned char) *p, loc);
+			}
 			else
 				*p = toupper_l((unsigned char) *p, loc);
 		}
-- 
2.43.0

v6-0002-Define-char_tolower-char_toupper-for-all-locale-p.patch (text/x-patch; charset=UTF-8)

From 631daededebd9649169951764c72d8a372897b5c Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:51:47 -0700
Subject: [PATCH v6 2/9] Define char_tolower()/char_toupper() for all locale
 providers.

The behavior is defined for each locale provider rather than
unconditionally depending on the global LC_CTYPE setting. Needed as an
alternative for tolower()/toupper() for some callers.
---
 src/backend/utils/adt/like.c              |  4 +--
 src/backend/utils/adt/pg_locale.c         | 32 ++++++++++++++++-------
 src/backend/utils/adt/pg_locale_builtin.c | 18 +++++++++++++
 src/backend/utils/adt/pg_locale_icu.c     | 23 ++++++++++++++++
 src/backend/utils/adt/pg_locale_libc.c    | 21 +++++++++++++--
 src/include/utils/pg_locale.h             | 10 +++----
 6 files changed, 89 insertions(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..37c1c86aee8 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -209,9 +209,7 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c || locale->ctype->pattern_casefold_char)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 67299c55ed8..26a7244c3db 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1551,25 +1551,39 @@ char_is_cased(char ch, pg_locale_t locale)
 }
 
 /*
- * char_tolower_enabled()
+ * char_tolower()
  *
- * Does the provider support char_tolower()?
+ * Convert single-byte char to lowercase. Not correct for multibyte encodings,
+ * but needed for historical compatibility purposes.
  */
-bool
-char_tolower_enabled(pg_locale_t locale)
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
 {
-	return (locale->ctype->char_tolower != NULL);
+	if (locale->ctype == NULL)
+	{
+		if (ch >= 'A' && ch <= 'Z')
+			return ch + ('a' - 'A');
+		return ch;
+	}
+	return locale->ctype->char_tolower(ch, locale);
 }
 
 /*
- * char_tolower()
+ * char_toupper()
  *
- * Convert char (single-byte encoding) to lowercase.
+ * Convert single-byte char to uppercase. Not correct for multibyte encodings,
+ * but needed for historical compatibility purposes.
  */
 char
-char_tolower(unsigned char ch, pg_locale_t locale)
+char_toupper(unsigned char ch, pg_locale_t locale)
 {
-	return locale->ctype->char_tolower(ch, locale);
+	if (locale->ctype == NULL)
+	{
+		if (ch >= 'a' && ch <= 'z')
+			return ch - ('a' - 'A');
+		return ch;
+	}
+	return locale->ctype->char_toupper(ch, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 3dc611b50e1..cfef6a86377 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -169,6 +169,22 @@ wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 	return pg_u_isxdigit(wc, !locale->builtin.casemap_full);
 }
 
+static char
+char_tolower_builtin(unsigned char ch, pg_locale_t locale)
+{
+	if (ch >= 'A' && ch <= 'Z')
+		return ch + ('a' - 'A');
+	return ch;
+}
+
+static char
+char_toupper_builtin(unsigned char ch, pg_locale_t locale)
+{
+	if (ch >= 'a' && ch <= 'z')
+		return ch - ('a' - 'A');
+	return ch;
+}
+
 static bool
 char_is_cased_builtin(char ch, pg_locale_t locale)
 {
@@ -203,6 +219,8 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
+	.char_tolower = char_tolower_builtin,
+	.char_toupper = char_toupper_builtin,
 	.char_is_cased = char_is_cased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f5a0cc8fe41..449e3bbb7a6 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,6 +121,27 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
+/*
+ * ICU still depends on libc for compatibility with certain historical
+ * behavior for single-byte encodings.  XXX: consider fixing by decoding the
+ * single byte into a code point, and using u_tolower().
+ */
+static char
+char_tolower_icu(unsigned char ch, pg_locale_t locale)
+{
+	if (isupper(ch))
+		return tolower(ch);
+	return ch;
+}
+
+static char
+char_toupper_icu(unsigned char ch, pg_locale_t locale)
+{
+	if (islower(ch))
+		return toupper(ch);
+	return ch;
+}
+
 static bool
 char_is_cased_icu(char ch, pg_locale_t locale)
 {
@@ -238,6 +259,8 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
+	.char_tolower = char_tolower_icu,
+	.char_toupper = char_toupper_icu,
 	.char_is_cased = char_is_cased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 716f005066a..b0428ad288e 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -251,8 +251,21 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
+	locale_t	loc = locale->lt;
+
+	if (isupper_l(ch, loc))
+		return tolower_l(ch, loc);
+	return ch;
+}
+
+static char
+char_toupper_libc(unsigned char ch, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+
+	if (islower_l(ch, loc))
+		return toupper_l(ch, loc);
+	return ch;
 }
 
 static bool
@@ -338,9 +351,11 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
 	.char_tolower = char_tolower_libc,
+	.char_toupper = char_toupper_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 	.max_chr = UCHAR_MAX,
+	.pattern_casefold_char = true,
 };
 
 /*
@@ -363,6 +378,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
 	.char_tolower = char_tolower_libc,
+	.char_toupper = char_toupper_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 	.max_chr = UCHAR_MAX,
@@ -384,6 +400,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
 	.char_tolower = char_tolower_libc,
+	.char_toupper = char_toupper_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 683e1a0eef8..790db566e91 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -113,13 +113,13 @@ struct ctype_methods
 
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+	char		(*char_toupper) (unsigned char ch, pg_locale_t locale);
 
 	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
+	 * Use byte-at-a-time case folding for case-insensitive patterns.
 	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+	bool		pattern_casefold_char;
 
 	/*
 	 * For regex and pattern matching efficiency, the maximum char value
@@ -177,8 +177,8 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
 extern char char_tolower(unsigned char ch, pg_locale_t locale);
+extern char char_toupper(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0

Attachment: v6-0003-Avoid-global-LC_CTYPE-dependency-in-like.c.patch (text/x-patch; charset=UTF-8)

From a9f365b0ebd0c71ad2fec3bba8dbf7a21b502e3a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:59:40 -0700
Subject: [PATCH v6 3/9] Avoid global LC_CTYPE dependency in like.c.

Call char_tolower() instead of pg_tolower().
---
 src/backend/utils/adt/like.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 37c1c86aee8..364c39cf4fb 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -96,7 +96,14 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	if (locale->ctype_is_c)
 		return pg_ascii_tolower(c);
 	else if (locale->is_default)
-		return pg_tolower(c);
+	{
+		if (c >= 'A' && c <= 'Z')
+			return c + ('a' - 'A');
+		else if (IS_HIGHBIT_SET(c))
+			return char_tolower(c, locale);
+		else
+			return c;
+	}
 	else
 		return char_tolower(c, locale);
 }
-- 
2.43.0

Attachment: v6-0004-Avoid-global-LC_CTYPE-dependency-in-scansup.c.patch (text/x-patch; charset=UTF-8)

From 0dad412eb555550dd8a5d4ef3581695328fb8f12 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 15:04:58 -0700
Subject: [PATCH v6 4/9] Avoid global LC_CTYPE dependency in scansup.c.

Call char_tolower() instead of tolower() in downcase_identifier().

The function downcase_identifier() may be called before locale support
is initialized -- e.g. during GUC processing in the postmaster -- so
if the locale is unavailable, char_tolower() uses plain ASCII
semantics.

That can result in a difference in behavior during that early stage of
processing, but previously it would have depended on the postmaster
environment variable LC_CTYPE, which would have been fragile anyway.
---
 src/backend/parser/scansup.c      |  5 +++--
 src/backend/utils/adt/pg_locale.c | 16 ++++++++++++++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..872075ba220 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -67,8 +68,8 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch))
+			ch = char_tolower(ch, NULL);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 26a7244c3db..363215edb80 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1555,11 +1555,17 @@ char_is_cased(char ch, pg_locale_t locale)
  *
  * Convert single-byte char to lowercase. Not correct for multibyte encodings,
  * but needed for historical compatibility purposes.
+ *
+ * If locale is NULL, use the default database locale. This function may be
+ * called before the database locale is initialized, in which case it uses
+ * plain ASCII semantics.
  */
 char
 char_tolower(unsigned char ch, pg_locale_t locale)
 {
-	if (locale->ctype == NULL)
+	if (locale == NULL)
+		locale = default_locale;
+	if (locale == NULL || locale->ctype == NULL)
 	{
 		if (ch >= 'A' && ch <= 'Z')
 			return ch + ('a' - 'A');
@@ -1573,11 +1579,17 @@ char_tolower(unsigned char ch, pg_locale_t locale)
  *
  * Convert single-byte char to uppercase. Not correct for multibyte encodings,
  * but needed for historical compatibility purposes.
+ *
+ * If locale is NULL, use the default database locale. This function may be
+ * called before the database locale is initialized, in which case it uses
+ * plain ASCII semantics.
  */
 char
 char_toupper(unsigned char ch, pg_locale_t locale)
 {
-	if (locale->ctype == NULL)
+	if (locale == NULL)
+		locale = default_locale;
+	if (locale == NULL || locale->ctype == NULL)
 	{
 		if (ch >= 'a' && ch <= 'z')
 			return ch - ('a' - 'A');
-- 
2.43.0

Attachment: v6-0005-Avoid-global-LC_CTYPE-dependency-in-pg_locale_icu.patch (text/x-patch; charset=UTF-8)

From af958c9318ade598d74ea1e7ae720c287c83dee0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 15:12:38 -0700
Subject: [PATCH v6 5/9] Avoid global LC_CTYPE dependency in pg_locale_icu.c.

ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object in the pg_locale_t object, so that at least
it does not depend on the global LC_CTYPE setting.
---
 src/backend/utils/adt/pg_locale_icu.c | 66 ++++++++++++++++++++++-----
 src/include/utils/pg_locale.h         |  1 +
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 449e3bbb7a6..da250a23630 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,25 +121,34 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-/*
- * ICU still depends on libc for compatibility with certain historical
- * behavior for single-byte encodings.  XXX: consider fixing by decoding the
- * single byte into a code point, and using u_tolower().
- */
 static char
 char_tolower_icu(unsigned char ch, pg_locale_t locale)
 {
-	if (isupper(ch))
-		return tolower(ch);
-	return ch;
+	locale_t	loc = locale->icu.lt;
+
+	if (loc)
+	{
+		if (isupper_l(ch, loc))
+			return tolower_l(ch, loc);
+		return ch;
+	}
+	else
+		return pg_ascii_tolower(ch);
 }
 
 static char
 char_toupper_icu(unsigned char ch, pg_locale_t locale)
 {
-	if (islower(ch))
-		return toupper(ch);
-	return ch;
+	locale_t	loc = locale->icu.lt;
+
+	if (loc)
+	{
+		if (islower_l(ch, loc))
+			return toupper_l(ch, loc);
+		return ch;
+	}
+	else
+		return pg_ascii_toupper(ch);
 }
 
 static bool
@@ -265,6 +274,29 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
+
+/*
+ * ICU still depends on libc for compatibility with certain historical
+ * behavior for single-byte encodings.  See char_tolower_libc().
+ *
+ * XXX: consider fixing by decoding the single byte into a code point, and
+ * using u_tolower().
+ */
+static locale_t
+make_libc_ctype_locale(const char *ctype)
+{
+	locale_t	loc;
+
+#ifndef WIN32
+	loc = newlocale(LC_CTYPE_MASK, ctype, NULL);
+#else
+	loc = _create_locale(LC_ALL, ctype);
+#endif
+	if (!loc)
+		report_newlocale_failure(ctype);
+
+	return loc;
+}
 #endif
 
 pg_locale_t
@@ -275,11 +307,13 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	const char *iculocstr;
 	const char *icurules = NULL;
 	UCollator  *collator;
+	locale_t	loc = (locale_t) 0;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
 	{
 		HeapTuple	tp;
+		const char *ctype;
 		Datum		datum;
 		bool		isnull;
 
@@ -297,6 +331,15 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		if (!isnull)
 			icurules = TextDatumGetCString(datum);
 
+		if (pg_database_encoding_max_length() == 1)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+										   Anum_pg_database_datctype);
+			ctype = TextDatumGetCString(datum);
+
+			loc = make_libc_ctype_locale(ctype);
+		}
+
 		ReleaseSysCache(tp);
 	}
 	else
@@ -327,6 +370,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->icu.ucol = collator;
+	result->icu.lt = loc;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 790db566e91..c5978d903cc 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -165,6 +165,7 @@ struct pg_locale_struct
 		{
 			const char *locale;
 			UCollator  *ucol;
+			locale_t	lt;
 		}			icu;
 #endif
 	};
-- 
2.43.0

Attachment: v6-0006-Avoid-global-LC_CTYPE-dependency-in-ltree-crc32.c.patch (text/x-patch; charset=UTF-8)

From 5ffbafb4051e0bfd763a64a134d71644e66847a4 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 27 Oct 2025 16:23:14 -0700
Subject: [PATCH v6 6/9] Avoid global LC_CTYPE dependency in ltree/crc32.c.

Use char_tolower() instead of tolower().
---
 contrib/ltree/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..5969f75c158 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	char_tolower((unsigned char) (x), NULL)
 #else
 #define TOLOWER(x)	(x)
 #endif
-- 
2.43.0

Attachment: v6-0007-Avoid-global-LC_CTYPE-dependency-in-fuzzystrmatch.patch (text/x-patch; charset=UTF-8)

From 7399368ce4ee497cf26c1a1f4abfe0fdf192bbd8 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 27 Oct 2025 16:24:18 -0700
Subject: [PATCH v6 7/9] Avoid global LC_CTYPE dependency in fuzzystrmatch.

Use char_toupper() instead of toupper().
---
 contrib/fuzzystrmatch/dmetaphone.c    |  5 ++++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..152eb4b2ddf 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -116,6 +117,8 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include <assert.h>
 #include <ctype.h>
 
+#define TOUPPER(x) char_toupper(x, NULL)
+
 /* prototype for the main function we got from the perl module */
 static void DoubleMetaphone(char *str, char **codes);
 
@@ -284,7 +287,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = TOUPPER((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..03530fb73ab 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -49,6 +50,8 @@ PG_MODULE_MAGIC_EXT(
 					.version = PG_VERSION
 );
 
+#define TOUPPER(x) char_toupper(x, NULL)
+
 /*
  * Soundex
  */
@@ -62,7 +65,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

Attachment: v6-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From 46420299904cfe1829896446a860c39f0824551e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v6 8/9] Don't include ICU headers in pg_locale.h.

Needed in order to include pg_locale.h in strcasecmp.c.
---
 src/backend/commands/collationcmds.c  |  4 ++++
 src/backend/utils/adt/formatting.c    |  4 ----
 src/backend/utils/adt/pg_locale.c     |  4 ++++
 src/backend/utils/adt/pg_locale_icu.c |  1 +
 src/backend/utils/adt/varlena.c       |  4 ++++
 src/include/utils/pg_locale.h         | 14 +++++---------
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8acbfbbeda0..a57fe93c387 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 78e19ac39ac..9d0dfc48671 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -70,10 +70,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/int.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 363215edb80..255f660c644 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -33,6 +33,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index da250a23630..0fd8171c1da 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 2c398cd9e5c..cf34a96b988 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index c5978d903cc..b668f77e1ca 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,15 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-/* only include the C APIs, to avoid errors in cpluspluscheck */
-#undef U_SHOW_CPLUSPLUS_API
-#define U_SHOW_CPLUSPLUS_API 0
-#undef U_SHOW_CPLUSPLUS_HEADER_API
-#define U_SHOW_CPLUSPLUS_HEADER_API 0
-#include <unicode/ucol.h>
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
@@ -54,6 +45,11 @@ extern void cache_locale_time(void);
 struct pg_locale_struct;
 typedef struct pg_locale_struct *pg_locale_t;
 
+#ifdef USE_ICU
+struct UCollator;
+typedef struct UCollator UCollator;
+#endif
+
 /* methods that define collation behavior */
 struct collate_methods
 {
-- 
2.43.0

Attachment: v6-0009-Avoid-global-LC_CTYPE-dependency-in-strcasecmp.c-.patch (text/x-patch; charset=UTF-8)

From 22a6d36d82a26269f406b64cd0865a360224eb63 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 27 Oct 2025 16:08:54 -0700
Subject: [PATCH v6 9/9] Avoid global LC_CTYPE dependency in strcasecmp.c for
 server.

For the server (but not the frontend), change to use
char_tolower()/char_toupper() instead of tolower()/toupper().
---
 src/port/pgstrcasecmp.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..f295df6ef51 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -28,6 +28,17 @@
 
 #include <ctype.h>
 
+/*
+ * Beware multiple evaluation hazards.
+ */
+#ifndef FRONTEND
+#include "utils/pg_locale.h"
+#define TOLOWER(x) char_tolower(x, NULL)
+#define TOUPPER(x) char_toupper(x, NULL)
+#else
+#define TOLOWER(x) (isupper(x) ? tolower(x) : x)
+#define TOUPPER(x) (islower(x) ? toupper(x) : x)
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -44,13 +55,13 @@ pg_strcasecmp(const char *s1, const char *s2)
 		{
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+			else if (IS_HIGHBIT_SET(ch1))
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+			else if (IS_HIGHBIT_SET(ch2))
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -77,13 +88,13 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 		{
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+			else if (IS_HIGHBIT_SET(ch1))
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+			else if (IS_HIGHBIT_SET(ch2))
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -106,8 +117,8 @@ pg_toupper(unsigned char ch)
 {
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
-	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+	else if (IS_HIGHBIT_SET(ch))
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -123,8 +134,8 @@ pg_tolower(unsigned char ch)
 {
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
-	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+	else if (IS_HIGHBIT_SET(ch))
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

#56

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Jeff Davis (#55)

Re: Remaining dependency on setlocale()

On Tue, 2025-10-28 at 17:19 -0700, Jeff Davis wrote:

Attached a new patch series, v6.

I'm eager to start committing this series so that we have plenty of
time to sort out any problems. I welcome feedback before or after
commit, and I can revert if necessary.

The goal here is to do a permanent:

setlocale(LC_CTYPE, "C")

in the postmaster, and instead use _l() variants where necessary.

Forcing the global LC_CTYPE to C will avoid platform-specific nuances
spread throughout the code, and prevent new code from accidentally
depending on platform-specific libc behavior. Instead, libc ctype
behavior will only happen through a pg_locale_t object.

It also takes us a step closer to thread safety.
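
To make the "_l() variants" idea concrete, here is a minimal standalone sketch (not taken from the patches) of lowercasing one byte through an explicit locale_t, so the answer does not depend on the global LC_CTYPE. The helper name sketch_tolower_l is hypothetical; isupper_l() and tolower_l() are the POSIX.1-2008 functions, and Windows would presumably need its own _l() spellings together with _create_locale().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/*
 * Hypothetical helper, loosely modeled on the libc method in the patch:
 * lowercase one byte using an explicit locale_t.  A zero locale_t means
 * "no locale available", in which case only ASCII letters are folded.
 */
unsigned char
sketch_tolower_l(unsigned char ch, locale_t loc)
{
    if (loc == (locale_t) 0)
    {
        if (ch >= 'A' && ch <= 'Z')
            return (unsigned char) (ch + ('a' - 'A'));
        return ch;
    }

    /* Only call tolower_l() on bytes the locale considers uppercase. */
    if (isupper_l(ch, loc))
        return (unsigned char) tolower_l(ch, loc);
    return ch;
}
```

The caller owns the locale_t (created with newlocale() and released with freelocale()), so nothing here reads or modifies process-wide state.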

LC_COLLATE was already permanently set to "C" (5e6e42e4), and most of
LC_CTYPE behavior already uses a pg_locale_t object. This series is
about removing the last few places that rely on raw calls to
tolower()/toupper() (sometimes through pg_tolower()). Where there isn't
a pg_locale_t immediately available it uses the database default locale
(which might or might not be libc).
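
The fallback order described above (explicit locale, else database default, else plain ASCII) can be sketched in isolation as follows. All names here (sketch_locale, sketch_default_locale, sketch_char_tolower, sketch_latin1_tolower) are hypothetical stand-ins for pg_locale_t, the default locale and the provider methods; this is a simplified model of the dispatch, not the code in pg_locale.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_locale_t with a single ctype method. */
typedef struct sketch_locale sketch_locale;
struct sketch_locale
{
    /* provider callback; NULL if the provider supplies none */
    unsigned char (*char_tolower) (unsigned char ch, const sketch_locale *loc);
};

/* Stand-in for the database default locale; NULL until initialized. */
const sketch_locale *sketch_default_locale = NULL;

unsigned char
sketch_char_tolower(unsigned char ch, const sketch_locale *loc)
{
    /* No locale passed: try the database default. */
    if (loc == NULL)
        loc = sketch_default_locale;

    /* Still nothing usable (e.g. early startup): plain ASCII semantics. */
    if (loc == NULL || loc->char_tolower == NULL)
    {
        if (ch >= 'A' && ch <= 'Z')
            return (unsigned char) (ch + ('a' - 'A'));
        return ch;
    }
    return loc->char_tolower(ch, loc);
}

/* Toy provider: ASCII plus one LATIN1 mapping (0xC9 -> 0xE9). */
unsigned char
sketch_latin1_tolower(unsigned char ch, const sketch_locale *loc)
{
    (void) loc;
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch == 0xC9)
        return 0xE9;
    return ch;
}
```

Before the default is set, a high-bit byte passes through unchanged; once a default with a provider method is installed, the same call with a NULL locale is routed to that provider.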

There's another thread about what to do with strerror_r(), which
depends on LC_CTYPE for the encoding:

/messages/by-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com

pg_localeconv_r() does depend on the LC_CTYPE for the encoding, but it
already sets it from lc_monetary and lc_numeric, without using datctype
or the global setting. Then PGLC_localeconv() converts to the database
encoding, if necessary. So it's an exception to the rule that all ctype
behavior goes through a pg_locale_t, but it's not a problem. (Aside: we
could consider this approach as a narrower fix for strerror_r(), as
well.)

There may be a loose end around plperl as well, but I'm not sure
whether this will make it any worse.

Some other LC_* settings still rely on setlocale(), which can be
considered separately unless there's some interaction that I missed.

Note that the datcollate and datctype fields are already mostly
irrelevant for non-libc providers. We could set those to NULL, but for
now I don't intend to do that.

Regards,
Jeff Davis

#57

Daniel Verite

daniel@manitou-mail.org

2 months ago

In reply to: Jeff Davis (#56)

Re: Remaining dependency on setlocale()

Jeff Davis wrote:

The goal here is to do a permanent:

setlocale(LC_CTYPE, "C")

in the postmaster, and instead use _l() variants where necessary.

What about code in extensions? AFAIU a user can control the
locale in effect by setting the LC_CTYPE argument of
CREATE DATABASE, which ends up in the environment
of backends serving that database.
If it's forced to "C", how can an extension use locale-aware
libc functions?

In theory it's the same problem with LC_COLLATE, except
that functions like tolower()/toupper() are much more likely
to be used in extensions than strcoll().

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

#58

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Daniel Verite (#57)

Re: Remaining dependency on setlocale()

On Thu, 2025-10-30 at 21:41 +0100, Daniel Verite wrote:

What about code in extensions? AFAIU a user can control the
locale in effect by setting the LC_CTYPE argument of
CREATE DATABASE, which ends up in the environment
of backends serving that database.
If it's forced to "C", how can an extension use locale-aware
libc functions?

Extensions often need to be updated for a new major version.

The extension should call pg_database_locale(), and pass that to a
function exposed in pg_locale.h. A recent commit exposed pg_iswalpha(),
etc., so there's a reasonable set of functions that should be suitable
for most purposes.

If it's not available in pg_locale.h, or the extension really needs to
use a different LC_CTYPE for some reason, it can use an _l() variant of
the function.
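
For the second option (an extension that really wants its own LC_CTYPE), a rough sketch of the _l() approach could look like this. It is standalone C rather than backend code; ext_upper_inplace is a hypothetical name, and a real extension would more likely cache the locale_t than create one per call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/*
 * Hypothetical extension helper: uppercase a NUL-terminated single-byte
 * string in place under the named LC_CTYPE locale, without touching the
 * global locale.  Returns 0 on success, -1 if the locale cannot be created.
 */
int
ext_upper_inplace(char *s, const char *ctype_name)
{
    locale_t    loc;

    assert(s != NULL && ctype_name != NULL);

    loc = newlocale(LC_CTYPE_MASK, ctype_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    for (; *s; s++)
    {
        unsigned char ch = (unsigned char) *s;

        if (islower_l(ch, loc))
            *s = (char) toupper_l(ch, loc);
    }

    freelocale(loc);
    return 0;
}
```

Which locale name to pass is up to the extension; the database's datctype would be the natural choice if the goal is to keep the old behavior.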

Regards,
Jeff Davis

#59

Peter Eisentraut

peter@eisentraut.org

2 months ago

In reply to: Jeff Davis (#56)

Re: Remaining dependency on setlocale()

On 30.10.25 18:17, Jeff Davis wrote:

On Tue, 2025-10-28 at 17:19 -0700, Jeff Davis wrote:

Attached a new patch series, v6.

I'm eager to start committing this series so that we have plenty of
time to sort out any problems. I welcome feedback before or after
commit, and I can revert if necessary.

What is one supposed to do with this statement? You post a series of 9
patches and the next day you say you are eager to commit it? Do you not
want to give others the time to properly review this? The patches say
they are "v6", but AFAICT the previous patches "v5" and "v4" in this
thread are substantially different from these.

The goal here is to do a permanent:

setlocale(LC_CTYPE, "C")

in the postmaster, and instead use _l() variants where necessary.

Forcing the global LC_CTYPE to C will avoid platform-specific nuances
spread throughout the code, and prevent new code from accidentally
depending on platform-specific libc behavior. Instead, libc ctype
behavior will only happen through a pg_locale_t object.

It also takes us a step closer to thread safety.

At first glance, these patches seem like reasonable steps in that direction.

But I'm not sure that we actually want to make that switch. It would be
good if our code is independent of the global locale settings, but that
doesn't mean that there couldn't be code in extensions, other libraries,
or other corners of the operating system that relies on this. In
general, and I haven't looked this up in the applicable standards, it
seems like a good idea to accurately declare what encoding you operate in.

#60

Daniel Verite

daniel@manitou-mail.org

2 months ago

In reply to: Jeff Davis (#58)

Re: Remaining dependency on setlocale()

Jeff Davis wrote:

On Thu, 2025-10-30 at 21:41 +0100, Daniel Verite wrote:

What about code in extensions? AFAIU a user can control the
locale in effect by setting the LC_CTYPE argument of
CREATE DATABASE, which ends up in the environment
of backends serving that database.
If it's forced to "C", how can an extension use locale-aware
libc functions?

Extensions often need to be updated for a new major version.

I think forcing the C locale is not comparable to API changes,
and the consequences are not even necessarily fixable for extensions.

For instance, consider the following function, when run in a database
with en_US.utf8 as locale.

CREATE FUNCTION lt_test(text,text) RETURNS boolean as $$
use locale; return ($_[0] lt $_[1])?1:0;
$$ LANGUAGE plperlu;

select lt_test('a', 'B');

With PG 18 it returns true
With 19devel it returns false.
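
The same difference can be reproduced outside Perl with a small C sketch, assuming (as described in this message) that the comparison ends up in strcoll(). The helper sketch_lt is hypothetical; strcoll_l() and newlocale() are POSIX.1-2008.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <locale.h>
#include <string.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif

/*
 * Hypothetical helper: returns 1 if a sorts before b under the named
 * LC_COLLATE locale, 0 if it does not, -1 if the locale is unavailable.
 */
int
sketch_lt(const char *a, const char *b, const char *collate_name)
{
    locale_t    loc;
    int         result;

    assert(a != NULL && b != NULL && collate_name != NULL);

    loc = newlocale(LC_COLLATE_MASK, collate_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    result = strcoll_l(a, b, loc) < 0;
    freelocale(loc);
    return result;
}
```

With "C" the comparison is by byte value, so sketch_lt("a", "B", "C") is 0 because 'a' (0x61) sorts after 'B' (0x42). With a linguistic locale such as en_US.UTF-8, where installed, "a" would be expected to sort before "B", which matches the PG 18 result above.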

This has been the case since commit 5e6e42e4, which added:

+	 * Collation is handled by pg_locale.c, and the behavior is dependent on
+	 * the provider. strcoll(), etc., should not be called directly.
+	 */
+	init_locale("LC_COLLATE", LC_COLLATE, "C");
+
+	/*

Obviously libperl is not going to be updated to call Postgres
string comparison functions instead of strcoll().
The same is probably true for other languages available as
extensions that expose POSIX locale-aware functions.

Extending this logic to LC_CTYPE will extend the breakage.

While I agree with the goal of not depending on setlocale()
in the core code for anything that should be locale-provider
dependent, making this goal leak into extensions seems
unnecessarily user-hostile. What it's saying to users is,
before v19 you could choose your locale, and starting
with v19 you'll have "C" whether you want it or not.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

#61

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Daniel Verite (#60)

Re: Remaining dependency on setlocale()

On Fri, 2025-10-31 at 15:01 +0100, Daniel Verite wrote:

Extensions often need to be updated for a new major version.

I think forcing the C locale is not comparable to API changes,
and the consequences are not even necessarily fixable for extensions.

Are we in agreement that it's fine for C extensions?

For instance, consider the following function, when run in a database
with en_US.utf8 as locale.

CREATE FUNCTION lt_test(text,text) RETURNS boolean as $$
use locale; return ($_[0] lt $_[1])?1:0;
$$ LANGUAGE plperlu;

select lt_test('a', 'B');

Are you aware of PL code that does things like that? If the database
locale is ICU, that would be at least a little bit confusing.

Regards,
Jeff Davis

#62

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Peter Eisentraut (#59)

Re: Remaining dependency on setlocale()

On Fri, 2025-10-31 at 10:40 +0100, Peter Eisentraut wrote:

But I'm not sure that we actually want to make that switch. It would be
good if our code is independent of the global locale settings, but that
doesn't mean that there couldn't be code in extensions, other libraries,
or other corners of the operating system that relies on this.

This question has been brewing for a while. How should we make this
decision?

In general, and I haven't looked this up in the applicable standards, it
seems like a good idea to accurately declare what encoding you operate in.

One frustration (for me, at least) is that there is no way to set the
encoding without specifying the locale. LC_CTYPE=C.UTF-8 is close, but
the libc version is not available on all platforms and has some quirks.

That makes any changes to the initdb default logic difficult to sort
out. Some combinations which seem simple -- like ICU/UTF8 -- need to
handle the case when LC_CTYPE is not compatible with UTF-8, even though
the LC_CTYPE has no effect in that case.

Regards,
Jeff Davis

#63

Daniel Verite

daniel@manitou-mail.org

2 months ago

In reply to: Jeff Davis (#61)

Re: Remaining dependency on setlocale()

Jeff Davis wrote:

Extensions often need to be updated for a new major version.

I think forcing the C locale is not comparable to API changes,
and the consequences are not even necessarily fixable for extensions.

Are we in agreement that it's fine for C extensions?

No, I think we should put the database's lc_ctype
into LC_CTYPE and the database's lc_collate into
LC_COLLATE, independently of anything else,
like it was done until commit 5e6e42e.
I believe that's the purpose of these database
properties, whether the provider is libc or ICU or builtin.

Forcing "C" is a disruptive change, that IMO does
not seem compensated by substantial advantages
that would justify the disruption.

CREATE FUNCTION lt_test(text,text) RETURNS boolean as $$
use locale; return ($_[0] lt $_[1])?1:0;
$$ LANGUAGE plperlu;

select lt_test('a', 'B');

Are you aware of PL code that does things like that? If the database
locale is ICU, that would be at least a little bit confusing.

plperl users writing "use locale" should understand that
it's the libc locale, like when this code is run outside Postgres.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/

#64

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Daniel Verite (#63)

Re: Remaining dependency on setlocale()

On Mon, 2025-11-03 at 20:14 +0100, Daniel Verite wrote:

No, I think we should put the database's lc_ctype
into LC_CTYPE and the database's lc_collate into
LC_COLLATE, independently of anything else,
like it was done until commit 5e6e42e.
I believe that's the purpose of these database
properties, whether the provider is libc or ICU or builtin.

Is there a clean way to document this behavior? I have tried to improve
the documentation in this area before, but it's not easy because the
behavior is so nuanced.

The CREATE DATABASE command needs LOCALE (or LC_CTYPE/LC_COLLATE). But
it's easy to get LOCALE and ICU_LOCALE or BUILTIN_LOCALE confused. And
similarly for initdb. We could clean those up a lot if
LC_CTYPE/LC_COLLATE didn't even need to be set for non-libc providers.

Users can end up in a situation where they need to define
LC_CTYPE/LC_COLLATE, even though it has almost (but not entirely) no
effect:

/messages/by-id/f794e177b0b1ed8917e75258726ae315cf8fbbef.camel@j-davis.com

Reverting commit 5e6e42e may be the right thing, but I'd like to hear
what others have to say on this point first. In particular, I'd like to
know whether such a revert is based on principle, a practical problem,
or just an abundance of caution.

Another option we have here is LC_CTYPE=LC_COLLATE=C for non-libc
providers, but leave it as in v17 for libc providers. If someone
explicitly wants libc behavior (by specifying something like "use
locale" in plperl), they probably want to be using the libc provider
for the database.

Regards,
Jeff Davis

#65

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Jeff Davis (#64)

Re: Remaining dependency on setlocale()

On Mon, 2025-11-03 at 11:59 -0800, Jeff Davis wrote:

Reverting commit 5e6e42e may be the right thing, but I'd like to hear
what others have to say on this point first. In particular, I'd like to
know whether such a revert is based on principle, a practical problem,
or just an abundance of caution.

Another option we have here is LC_CTYPE=LC_COLLATE=C for non-libc
providers, but leave it as in v17 for libc providers. If someone
explicitly wants libc behavior (by specifying something like "use
locale" in plperl), they probably want to be using the libc provider
for the database.

Actually, there's yet another option: lc_ctype and lc_collate could be
GUCs again. If they don't affect any backend behavior, then why not?
The user would have complete control.

(Probably PGC_POSTMASTER, because one of the goals is to not rely on
setlocale() in the backends.)

Of course, we need to be sure that they are compatible with the
database encoding. We have code to do that, but I'm not sure how
reliable it is across platforms.

Regards,
Jeff Davis

#66

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Daniel Verite (#63)

Re: Remaining dependency on setlocale()

On Mon, 2025-11-03 at 20:14 +0100, Daniel Verite wrote:

No, I think we should put the database's lc_ctype
into LC_CTYPE and the database's lc_collate into
LC_COLLATE, independently of anything else,
like it was done until commit 5e6e42e.
I believe that's the purpose of these database
properties, whether the provider is libc or ICU or builtin.

As phrased, that appears to be a promise that we will never support
thread-per-connection. setlocale() is not thread-safe, and uselocale()
is not available on NetBSD.
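
One way around both problems, sketched here under the assumption of a
POSIX.1-2008 libc, is to avoid process-wide and per-thread locale state
altogether and pass an explicit locale_t to the _l variants, as the libc
provider code does with tolower_l(). The helper name lower_in_locale() is
hypothetical, and creating a locale_t on every call is only for brevity; real
code would cache it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: lowercase one byte using an explicit locale object.
 * Neither setlocale() nor uselocale() is called, so the global locale and
 * the calling thread's locale are left untouched.
 * Returns -1 if the named locale cannot be created.
 */
static int
lower_in_locale(const char *lc_ctype, unsigned char ch)
{
	locale_t	loc;
	int			result;

	assert(lc_ctype != NULL);

	loc = newlocale(LC_CTYPE_MASK, lc_ctype, (locale_t) 0);
	if (loc == (locale_t) 0)
		return -1;

	result = tolower_l(ch, loc);
	freelocale(loc);
	return result;
}
```

This only helps code that we control. A library that calls plain tolower()
or strcoll() still sees whatever the global locale happens to be, which is
the extension concern raised upthread.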

Regards,
Jeff Davis

#67

Peter Eisentraut

peter@eisentraut.org

2 months ago

In reply to: Daniel Verite (#63)

Re: Remaining dependency on setlocale()

On 03.11.25 20:14, Daniel Verite wrote:

No, I think we should put the database's lc_ctype
into LC_CTYPE and the database's lc_collate into
LC_COLLATE, independently of anything else,
like it was done until commit 5e6e42e.
I believe that's the purpose of these database
properties, whether the provider is libc or ICU or builtin.

Forcing "C" is a disruptive change, that IMO does
not seem compensated by substantial advantages
that would justify the disruption.

From my perspective, the difference between LC_COLLATE and LC_CTYPE is
that LC_COLLATE has a quite limited impact area. Either your code uses
strcoll() (or strxfrm()) or it does not. And if it does, you can find
all the places and adjust them, and it probably won't be that many
places. The impact area of LC_CTYPE is much larger and more complicated
and possibly interacts with other settings and third-party libraries in
ways that we don't understand yet and might not be able to change.
That's why I'm more hesitant about it. But I don't see any reason to
keep LC_COLLATE set going forward.

#68

Peter Eisentraut

peter@eisentraut.org

2 months ago

In reply to: Jeff Davis (#55)

Re: Remaining dependency on setlocale()

On 29.10.25 01:19, Jeff Davis wrote:

On Wed, 2025-07-23 at 19:11 -0700, Jeff Davis wrote:

On Fri, 2025-07-11 at 11:48 +1200, Thomas Munro wrote:

On Fri, Jul 11, 2025 at 6:22 AM Jeff Davis <pgsql@j-davis.com> wrote:

I don't have a great windows development environment, and it appears CI
and the buildfarm don't offer great coverage either. Can I ask for a
volunteer to do the windows side of this work?

Me neither but I'm willing to help with that, and have done lots of
closely related things through trial-by-CI...

Attached a new patch series, v6.

Rather than creating new global locale_t objects, this series (along
with a separate patch for NLS[1]) removes the dependency on the global
LC_CTYPE entirely. It's a bunch of small patches that replace direct
calls to tolower()/toupper() with calls into the provider.

An assumption of these patches is that, in the UTF-8 encoding, the
logic in pg_tolower()/pg_toupper() is equivalent to
pg_ascii_tolower()/pg_ascii_toupper().

I'm getting a bit confused by all these different variant function
names. Like we have now

tolower
TOLOWER
char_tolower
pg_tolower
pg_strlower
pg_ascii_tolower
downcase_identifier

and maybe more, and upper versions.

This patch set makes changes like

-           else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-               ch2 = tolower(ch2);
+           else if (IS_HIGHBIT_SET(ch2))
+               ch2 = TOLOWER(ch2);

So there is apparently some semantic difference between tolower() and
TOLOWER(), which is represented by the fact that the function name is
all upper case? Actually, it's a macro and could mean different things
in different contexts.

And there is very little documentation accompanying all these different
functions. For example, struct collate_methods and struct ctype_methods
contain barely any documentation at all.

Many of these issues are pre-existing, but I just figured it has reached
a point where we need to do something about it.

#69

Jeff Davis

pgsql@j-davis.com

2 months ago

In reply to: Peter Eisentraut (#67)

Re: Remaining dependency on setlocale()

On Wed, 2025-11-12 at 19:41 +0100, Peter Eisentraut wrote:

The impact area of LC_CTYPE is much larger and more complicated
and possibly interacts with other settings and third-party libraries in
ways that we don't understand yet and might not be able to change.
That's why I'm more hesitant about it.

What do you think about making lc_ctype and/or lc_collate into GUCs
(like lc_messages), assuming we remove all known effects in the backend
first?

If we make the setting PGC_POSTMASTER, then it eliminates potential
problems with threading, because setlocale() would be called only once
before allowing connections. For platforms that support uselocale(), we
could allow it to be set more freely, for those who need it set to
different values in different backends.

It would also be easier to document, which would be nice. There could
be some confusion if various settings are inconsistent with each other,
but that's true currently. And we'd still enforce a ctype that's
consistent with the encoding, at least.

Regards,
Jeff Davis

#70

Jeff Davis

pgsql@j-davis.com

about 2 months ago

In reply to: Peter Eisentraut (#68)

9 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-11-12 at 19:59 +0100, Peter Eisentraut wrote:

I'm getting a bit confused by all these different variant function
names.

One way of looking at it is that the functions in this patch series
mostly affect how identifiers are treated, whereas earlier collation-
related work affects how text data is treated. Ideally, they should be
similar, but for historical reasons they're not.

There are a lot of subtle behaviors for identifiers, which individually
make some sense, but over time have just become edge cases and sources
of inconsistency:

downcase_identifier() is a server function to casefold unquoted
identifiers during parsing (used by other callers, too). For non-ascii
characters in a single-byte encoding, it uses tolower(); otherwise the
lowercasing is ascii-only. Note: if an application is reliant on the
casefolding of non-ascii identifiers, such as SELECT * FROM É finding
the table named "é", that application would not work in UTF-8 even with
a dump/restore.
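
To make the rule above concrete, here is a standalone sketch of the
downcasing logic as described in this paragraph. It is not the code in
scansup.c: the function names are invented, truncation and warnings are left
out, and the encoding_max_length parameter stands in for
pg_database_encoding_max_length().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of identifier downcasing: ASCII letters are always folded with
 * plain ASCII rules; bytes with the high bit set are passed to tolower()
 * only when the encoding is single-byte. In a multibyte encoding such
 * bytes are left alone.
 */
static void
sketch_downcase(char *ident, size_t len, int encoding_max_length)
{
	assert(ident != NULL);

	for (size_t i = 0; i < len; i++)
	{
		unsigned char ch = (unsigned char) ident[i];

		if (ch >= 'A' && ch <= 'Z')
			ch = (unsigned char) (ch + ('a' - 'A'));
		else if (encoding_max_length == 1 && (ch & 0x80) && isupper(ch))
			ch = (unsigned char) tolower(ch);

		ident[i] = (char) ch;
	}
}

/* Convenience wrapper for null-terminated strings. */
static void
sketch_downcase_cstr(char *ident, int encoding_max_length)
{
	sketch_downcase(ident, strlen(ident), encoding_max_length);
}
```

With encoding_max_length > 1, the two bytes of UTF-8 "É" (0xC3 0x89) pass
through unchanged, which is why SELECT * FROM É would not find a table named
"é" in a UTF-8 database.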

pg_strcasecmp() and pg_tolower() are used from the server and the
client to do case-insensitive comparison of option names. They're
supposed to use the same casing semantics as downcase_identifier(), but
they don't differentiate between single-byte and multi-byte encodings;
they just call tolower() on any non-ascii byte. That difference
probably doesn't matter for UTF8, because tolower() on a single byte in
a multibyte sequence should be a no-op, but perhaps it can matter in
non-UTF-8 multibyte encodings.
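
The "probably doesn't matter for UTF8" part can be checked empirically on a
given platform. The sketch below reimplements the two casing rules locally
(the function names are invented, not the ones in pgstrcasecmp.c) and counts
the byte values for which they disagree under a given LC_CTYPE. I would
expect zero for "C" everywhere; what a UTF-8 locale reports depends on the
libc.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* ASCII-only lowercasing, in the spirit of pg_ascii_tolower(). */
static unsigned char
ascii_only_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch = (unsigned char) (ch + ('a' - 'A'));
	return ch;
}

/* ASCII rules for 7-bit bytes, libc tolower() for high-bit bytes. */
static unsigned char
libc_highbit_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch = (unsigned char) (ch + ('a' - 'A'));
	else if ((ch & 0x80) && isupper(ch))
		ch = (unsigned char) tolower(ch);
	return ch;
}

/*
 * Count the byte values (0..255) on which the two rules disagree under the
 * given LC_CTYPE. Returns -1 if the locale is not available. LC_CTYPE is
 * reset to "C" afterwards for simplicity.
 */
static int
count_casing_mismatches(const char *lc_ctype)
{
	int			mismatches = 0;

	assert(lc_ctype != NULL);

	if (setlocale(LC_CTYPE, lc_ctype) == NULL)
		return -1;

	for (int b = 0; b < 256; b++)
	{
		unsigned char ch = (unsigned char) b;

		if (ascii_only_tolower(ch) != libc_highbit_tolower(ch))
			mismatches++;
	}

	setlocale(LC_CTYPE, "C");
	return mismatches;
}
```

A nonzero result for a single-byte locale such as a Latin-1 one would be
expected, since that is the case where the high-bit branch is meant to do
something.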

It's hard to avoid some confusion unless we're able to simplify some of
these behaviors. Let me know if you think we can tolerate some
simplifications in these edge cases without breaking anything too
badly.

Many of these issues are pre-existing, but I just figured it has reached
a point where we need to do something about it.

Starting from first principles, individual character operations should
be mostly for parsing (e.g. tsearch) or pattern matching. Case folding
and caseless matching should be done with string operations. And
obviously all of this should be multibyte aware and work consistently
in different encodings (to the extent possible given the
representational constraints).

Our APIs in pg_locale.c do a good job of offering that, and do not
depend on the global LC_CTYPE. (There are a few things I'd like to add
or clean up, but it offers most of what we need.)

The problem, of course, is migrating the callers to use pg_locale.c
APIs without breaking things. This patch series is intended to make
everything locale-sensitive in the backend go through pg_locale_t
without any behavior changes. The benefit is that it would at least
remove the global LC_CTYPE dependency, but it ends up with hacky
compatibility methods like char_tolower(), which piles on to the
already-confusing set of tolower-like functions.
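
As a rough illustration of what "go through pg_locale_t" means for a
compatibility method like char_tolower(), here is a self-contained sketch of
the dispatch pattern. All type and function names are simplified stand-ins,
not the actual pg_locale.h definitions, and only a builtin-style provider
with ASCII semantics is shown.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_locale sketch_locale;

/* Per-provider method table (stand-in for struct ctype_methods). */
struct sketch_ctype_methods
{
	unsigned char (*char_tolower) (unsigned char ch,
								   const sketch_locale *locale);
};

/* Locale object carrying its provider's methods. */
struct sketch_locale
{
	const struct sketch_ctype_methods *ctype;
};

static unsigned char
ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch = (unsigned char) (ch + ('a' - 'A'));
	return ch;
}

/* Builtin-style provider: ASCII-only, ignores the locale argument. */
static unsigned char
builtin_char_tolower(unsigned char ch, const sketch_locale *locale)
{
	(void) locale;
	return ascii_tolower(ch);
}

static const struct sketch_ctype_methods builtin_methods = {
	.char_tolower = builtin_char_tolower,
};

static const sketch_locale builtin_locale = {
	.ctype = &builtin_methods,
};

/*
 * Entry point: fall back to plain ASCII when no locale is available yet
 * (for example, early in startup); otherwise dispatch to the provider.
 */
static unsigned char
sketch_char_tolower(unsigned char ch, const sketch_locale *locale)
{
	if (locale == NULL || locale->ctype == NULL)
		return ascii_tolower(ch);

	assert(locale->ctype->char_tolower != NULL);
	return locale->ctype->char_tolower(ch, locale);
}
```

The caller never consults the global LC_CTYPE; any locale sensitivity lives
behind the method pointer, which is the property the series is after.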

In an earlier approach:

/messages/by-id/5f95b81af1e81b28b8a9ac5929f199b2f4091fdf.camel@j-davis.com

I added a strfold_ident() method. That's easier to understand for
downcase_identifier(), but it didn't solve the problems for other
call sites that depend on tolower(), so I'd need to add more methods
for those places, and it started to look unpleasant.

And earlier in this thread, I had tried the approach of using a global
variable to hold a locale representing datctype. That felt a bit weird,
though, because it mostly only matters when datlocprovider='c', and in
that case, there's already a locale_t initialized along with the
default collation. So why not find a way to go through the default
collation?

I still favor the approach used in the current patch series to remove
the dependency on the global LC_CTYPE, but I'm open to suggestions.
Whatever we do will probably require some additional hacking later
anyway.

I tried to improve the comments in pgstrcasecmp.c, and I rebased.

Regards,
Jeff Davis

Attachments:

v7-0009-Avoid-global-LC_CTYPE-dependency-in-strcasecmp.c-.patch (text/x-patch; charset=UTF-8)

From 9ae6c6f9a0994fb694041d587acb81df45156984 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 27 Oct 2025 16:08:54 -0700
Subject: [PATCH v7 9/9] Avoid global LC_CTYPE dependency in strcasecmp.c for
 server.

For the server (but not the frontend), change to use
char_tolower()/char_toupper() instead of tolower()/toupper().
---
 src/port/pgstrcasecmp.c | 73 ++++++++++++++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 20 deletions(-)

diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..2184f132f3a 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -3,15 +3,31 @@
  * pgstrcasecmp.c
  *	   Portable SQL-like case-independent comparisons and conversions.
  *
- * SQL99 specifies Unicode-aware case normalization, which we don't yet
- * have the infrastructure for.  Instead we use tolower() to provide a
- * locale-aware translation.  However, there are some locales where this
- * is not right either (eg, Turkish may do strange things with 'i' and
- * 'I').  Our current compromise is to use tolower() for characters with
- * the high bit set, and use an ASCII-only downcasing for 7-bit
- * characters.
+ * These functions are for case-insensitive identifier matching and related
+ * functionality, and may be called either from the client or from the
+ * server. These functions are not intended for text data stored in the
+ * database; see pg_locale.h.
  *
- * NB: this code should match downcase_truncate_identifier() in scansup.c.
+ * In the server, the casing behavior is determined by the database default
+ * collation, which may be different depending on the provider and locale.
+ * In the client, casing behavior is determined by libc's tolower() and
+ * toupper(), which depends on the locale settings on the client (and
+ * therefore may not match the server's semantics).  In any case, the ASCII
+ * range is guaranteed to use plain ASCII casing semantics.
+ *
+ * SQL99 specifies Unicode-aware case normalization, but for historical
+ * compatibility reasons, we don't do so.  Instead we do char-at-a-time
+ * lowercasing to provide a locale-aware translation for single-byte
+ * encodings.  However, there are some locales where this is not right either
+ * (eg, Turkish may do strange things with 'i' and 'I').  Our current
+ * compromise is to use tolower()/char_tolower() for characters with the high
+ * bit set, and use an ASCII-only downcasing for 7-bit characters.
+ *
+ * NB: these functions are not multibyte-aware. For UTF8, the behavior
+ * degenerates to plain ASCII casing semantics.
+ *
+ * NB: this code should match downcase_truncate_identifier() in scansup.c,
+ * except that we don't check for multibyte encodings.
  *
  * We also provide strict ASCII-only case conversion functions, which can
  * be used to implement C/POSIX case folding semantics no matter what the
@@ -28,6 +44,23 @@
 
 #include <ctype.h>
 
+/*
+ * In the server, use char_tolower()/char_toupper() with the database default
+ * locale; in the client, use tolower()/toupper().
+ */
+#ifndef FRONTEND
+
+#include "utils/pg_locale.h"
+/* char_tolower()/char_toupper() don't need isupper()/islower() test */
+#define TOLOWER(x) char_tolower(x, NULL)
+#define TOUPPER(x) char_toupper(x, NULL)
+
+#else
+
+#define TOLOWER(x) (isupper(x) ? tolower(x) : x)
+#define TOUPPER(x) (islower(x) ? toupper(x) : x)
+
+#endif
 
 /*
  * Case-independent comparison of two null-terminated strings.
@@ -44,13 +77,13 @@ pg_strcasecmp(const char *s1, const char *s2)
 		{
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+			else if (IS_HIGHBIT_SET(ch1))
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+			else if (IS_HIGHBIT_SET(ch2))
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -77,13 +110,13 @@ pg_strncasecmp(const char *s1, const char *s2, size_t n)
 		{
 			if (ch1 >= 'A' && ch1 <= 'Z')
 				ch1 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch1) && isupper(ch1))
-				ch1 = tolower(ch1);
+			else if (IS_HIGHBIT_SET(ch1))
+				ch1 = TOLOWER(ch1);
 
 			if (ch2 >= 'A' && ch2 <= 'Z')
 				ch2 += 'a' - 'A';
-			else if (IS_HIGHBIT_SET(ch2) && isupper(ch2))
-				ch2 = tolower(ch2);
+			else if (IS_HIGHBIT_SET(ch2))
+				ch2 = TOLOWER(ch2);
 
 			if (ch1 != ch2)
 				return (int) ch1 - (int) ch2;
@@ -106,8 +139,8 @@ pg_toupper(unsigned char ch)
 {
 	if (ch >= 'a' && ch <= 'z')
 		ch += 'A' - 'a';
-	else if (IS_HIGHBIT_SET(ch) && islower(ch))
-		ch = toupper(ch);
+	else if (IS_HIGHBIT_SET(ch))
+		ch = TOUPPER(ch);
 	return ch;
 }
 
@@ -123,8 +156,8 @@ pg_tolower(unsigned char ch)
 {
 	if (ch >= 'A' && ch <= 'Z')
 		ch += 'a' - 'A';
-	else if (IS_HIGHBIT_SET(ch) && isupper(ch))
-		ch = tolower(ch);
+	else if (IS_HIGHBIT_SET(ch))
+		ch = TOLOWER(ch);
 	return ch;
 }
 
-- 
2.43.0

v7-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch (text/x-patch; charset=UTF-8)

From 1800d9e253a5fe29e4310c1615820ea29cb51988 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:58:02 -0700
Subject: [PATCH v7 1/9] Avoid global LC_CTYPE dependency in pg_locale_libc.c.

Call tolower_l() directly instead of through pg_tolower(), because the
latter depends on the global LC_CTYPE.
---
 src/backend/utils/adt/pg_locale_libc.c | 28 ++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 9c7fcd1fc7a..716f005066a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -450,7 +450,12 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (locale->is_default)
-				*p = pg_tolower((unsigned char) *p);
+			{
+				if (*p >= 'A' && *p <= 'Z')
+					*p += 'a' - 'A';
+				else if (IS_HIGHBIT_SET(*p) && isupper_l(*p, loc))
+					*p = tolower_l((unsigned char) *p, loc);
+			}
 			else
 				*p = tolower_l((unsigned char) *p, loc);
 		}
@@ -535,9 +540,19 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 			{
 				if (wasalnum)
-					*p = pg_tolower((unsigned char) *p);
+				{
+					if (*p >= 'A' && *p <= 'Z')
+						*p += 'a' - 'A';
+					else if (IS_HIGHBIT_SET(*p) && isupper_l(*p, loc))
+						*p = tolower_l((unsigned char) *p, loc);
+				}
 				else
-					*p = pg_toupper((unsigned char) *p);
+				{
+					if (*p >= 'a' && *p <= 'z')
+						*p -= 'a' - 'A';
+					else if (IS_HIGHBIT_SET(*p) && islower_l(*p, loc))
+						*p = toupper_l((unsigned char) *p, loc);
+				}
 			}
 			else
 			{
@@ -633,7 +648,12 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (locale->is_default)
-				*p = pg_toupper((unsigned char) *p);
+			{
+				if (*p >= 'a' && *p <= 'z')
+					*p -= 'a' - 'A';
+				else if (IS_HIGHBIT_SET(*p) && islower_l(*p, loc))
+					*p = toupper_l((unsigned char) *p, loc);
+			}
 			else
 				*p = toupper_l((unsigned char) *p, loc);
 		}
-- 
2.43.0

v7-0002-Define-char_tolower-char_toupper-for-all-locale-p.patch (text/x-patch; charset=UTF-8)

From c7405ecc07e552fa9bdf4cf535b4757c2de7e9e4 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:51:47 -0700
Subject: [PATCH v7 2/9] Define char_tolower()/char_toupper() for all locale
 providers.

The behavior is defined for each locale provider rather than
unconditionally depending on the global LC_CTYPE setting. Needed as an
alternative for tolower()/toupper() for some callers.
---
 src/backend/utils/adt/like.c              |  4 +--
 src/backend/utils/adt/pg_locale.c         | 32 ++++++++++++++++-------
 src/backend/utils/adt/pg_locale_builtin.c | 18 +++++++++++++
 src/backend/utils/adt/pg_locale_icu.c     | 23 ++++++++++++++++
 src/backend/utils/adt/pg_locale_libc.c    | 21 +++++++++++++--
 src/include/utils/pg_locale.h             | 10 +++----
 6 files changed, 89 insertions(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..37c1c86aee8 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -209,9 +209,7 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 	 * way.
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c || locale->ctype->pattern_casefold_char)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b14c7837938..9631d274611 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1564,25 +1564,39 @@ char_is_cased(char ch, pg_locale_t locale)
 }
 
 /*
- * char_tolower_enabled()
+ * char_tolower()
  *
- * Does the provider support char_tolower()?
+ * Convert single-byte char to lowercase. Not correct for multibyte encodings,
+ * but needed for historical compatibility purposes.
  */
-bool
-char_tolower_enabled(pg_locale_t locale)
+char
+char_tolower(unsigned char ch, pg_locale_t locale)
 {
-	return (locale->ctype->char_tolower != NULL);
+	if (locale->ctype == NULL)
+	{
+		if (ch >= 'A' && ch <= 'Z')
+			return ch + ('a' - 'A');
+		return ch;
+	}
+	return locale->ctype->char_tolower(ch, locale);
 }
 
 /*
- * char_tolower()
+ * char_toupper()
  *
- * Convert char (single-byte encoding) to lowercase.
+ * Convert single-byte char to uppercase. Not correct for multibyte encodings,
+ * but needed for historical compatibility purposes.
  */
 char
-char_tolower(unsigned char ch, pg_locale_t locale)
+char_toupper(unsigned char ch, pg_locale_t locale)
 {
-	return locale->ctype->char_tolower(ch, locale);
+	if (locale->ctype == NULL)
+	{
+		if (ch >= 'a' && ch <= 'z')
+			return ch - ('a' - 'A');
+		return ch;
+	}
+	return locale->ctype->char_toupper(ch, locale);
 }
 
 /*
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 1021e0d129b..5059b2bb59a 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -185,6 +185,22 @@ wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 	return pg_u_isxdigit(to_char32(wc), !locale->builtin.casemap_full);
 }
 
+static char
+char_tolower_builtin(unsigned char ch, pg_locale_t locale)
+{
+	if (ch >= 'A' && ch <= 'Z')
+		return ch + ('a' - 'A');
+	return ch;
+}
+
+static char
+char_toupper_builtin(unsigned char ch, pg_locale_t locale)
+{
+	if (ch >= 'a' && ch <= 'z')
+		return ch - ('a' - 'A');
+	return ch;
+}
+
 static bool
 char_is_cased_builtin(char ch, pg_locale_t locale)
 {
@@ -219,6 +235,8 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
+	.char_tolower = char_tolower_builtin,
+	.char_toupper = char_toupper_builtin,
 	.char_is_cased = char_is_cased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f5a0cc8fe41..449e3bbb7a6 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,6 +121,27 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
+/*
+ * ICU still depends on libc for compatibility with certain historical
+ * behavior for single-byte encodings.  XXX: consider fixing by decoding the
+ * single byte into a code point, and using u_tolower().
+ */
+static char
+char_tolower_icu(unsigned char ch, pg_locale_t locale)
+{
+	if (isupper(ch))
+		return tolower(ch);
+	return ch;
+}
+
+static char
+char_toupper_icu(unsigned char ch, pg_locale_t locale)
+{
+	if (islower(ch))
+		return toupper(ch);
+	return ch;
+}
+
 static bool
 char_is_cased_icu(char ch, pg_locale_t locale)
 {
@@ -238,6 +259,8 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
+	.char_tolower = char_tolower_icu,
+	.char_toupper = char_toupper_icu,
 	.char_is_cased = char_is_cased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 716f005066a..b0428ad288e 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -251,8 +251,21 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 static char
 char_tolower_libc(unsigned char ch, pg_locale_t locale)
 {
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
+	locale_t	loc = locale->lt;
+
+	if (isupper_l(ch, loc))
+		return tolower_l(ch, loc);
+	return ch;
+}
+
+static char
+char_toupper_libc(unsigned char ch, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+
+	if (islower_l(ch, loc))
+		return toupper_l(ch, loc);
+	return ch;
 }
 
 static bool
@@ -338,9 +351,11 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
 	.char_tolower = char_tolower_libc,
+	.char_toupper = char_toupper_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 	.max_chr = UCHAR_MAX,
+	.pattern_casefold_char = true,
 };
 
 /*
@@ -363,6 +378,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
 	.char_tolower = char_tolower_libc,
+	.char_toupper = char_toupper_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 	.max_chr = UCHAR_MAX,
@@ -384,6 +400,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
 	.char_tolower = char_tolower_libc,
+	.char_toupper = char_toupper_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 683e1a0eef8..790db566e91 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -113,13 +113,13 @@ struct ctype_methods
 
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
+	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+	char		(*char_toupper) (unsigned char ch, pg_locale_t locale);
 
 	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
+	 * Use byte-at-a-time case folding for case-insensitive patterns.
 	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
+	bool		pattern_casefold_char;
 
 	/*
 	 * For regex and pattern matching efficiency, the maximum char value
@@ -177,8 +177,8 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
 extern char char_tolower(unsigned char ch, pg_locale_t locale);
+extern char char_toupper(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0

v7-0003-Avoid-global-LC_CTYPE-dependency-in-like.c.patch (text/x-patch; charset=UTF-8)

From 103cfd580a098fbdd113384e721ecbda68a966ea Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:59:40 -0700
Subject: [PATCH v7 3/9] Avoid global LC_CTYPE dependency in like.c.

Call char_tolower() instead of pg_tolower().
---
 src/backend/utils/adt/like.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 37c1c86aee8..364c39cf4fb 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -96,7 +96,14 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 	if (locale->ctype_is_c)
 		return pg_ascii_tolower(c);
 	else if (locale->is_default)
-		return pg_tolower(c);
+	{
+		if (c >= 'A' && c <= 'Z')
+			return c + ('a' - 'A');
+		else if (IS_HIGHBIT_SET(c))
+			return char_tolower(c, locale);
+		else
+			return c;
+	}
 	else
 		return char_tolower(c, locale);
 }
-- 
2.43.0

v7-0004-Avoid-global-LC_CTYPE-dependency-in-scansup.c.patch (text/x-patch; charset=UTF-8)

From ddc5eb689d4d5801778e1f8f5b5a1056a44a001b Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 15:04:58 -0700
Subject: [PATCH v7 4/9] Avoid global LC_CTYPE dependency in scansup.c.

Call char_tolower() instead of tolower() in downcase_identifier().

The function downcase_identifier() may be called before locale support
is initialized -- e.g. during GUC processing in the postmaster -- so
if the locale is unavailable, char_tolower() uses plain ASCII
semantics.

That can result in a difference in behavior during that early stage of
processing, but previously it would have depended on the postmaster
environment variable LC_CTYPE, which would have been fragile anyway.
---
 src/backend/parser/scansup.c      |  5 +++--
 src/backend/utils/adt/pg_locale.c | 16 ++++++++++++++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..872075ba220 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -67,8 +68,8 @@ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch))
+			ch = char_tolower(ch, NULL);
 		result[i] = (char) ch;
 	}
 	result[i] = '\0';
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9631d274611..7e13c601643 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1568,11 +1568,17 @@ char_is_cased(char ch, pg_locale_t locale)
  *
  * Convert single-byte char to lowercase. Not correct for multibyte encodings,
  * but needed for historical compatibility purposes.
+ *
+ * If locale is NULL, use the default database locale. This function may be
+ * called before the database locale is initialized, in which case it uses
+ * plain ASCII semantics.
  */
 char
 char_tolower(unsigned char ch, pg_locale_t locale)
 {
-	if (locale->ctype == NULL)
+	if (locale == NULL)
+		locale = default_locale;
+	if (locale == NULL || locale->ctype == NULL)
 	{
 		if (ch >= 'A' && ch <= 'Z')
 			return ch + ('a' - 'A');
@@ -1586,11 +1592,17 @@ char_tolower(unsigned char ch, pg_locale_t locale)
  *
  * Convert single-byte char to uppercase. Not correct for multibyte encodings,
  * but needed for historical compatibility purposes.
+ *
+ * If locale is NULL, use the default database locale. This function may be
+ * called before the database locale is initialized, in which case it uses
+ * plain ASCII semantics.
  */
 char
 char_toupper(unsigned char ch, pg_locale_t locale)
 {
-	if (locale->ctype == NULL)
+	if (locale == NULL)
+		locale = default_locale;
+	if (locale == NULL || locale->ctype == NULL)
 	{
 		if (ch >= 'a' && ch <= 'z')
 			return ch - ('a' - 'A');
-- 
2.43.0

v7-0005-Avoid-global-LC_CTYPE-dependency-in-pg_locale_icu.patch (text/x-patch; charset=UTF-8)

From 0d040045fd9959e01bc3bb3b54e13eb3a8bddc4a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 15:12:38 -0700
Subject: [PATCH v7 5/9] Avoid global LC_CTYPE dependency in pg_locale_icu.c.

ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object in the pg_locale_t object, so that at least
it does not depend on the global LC_CTYPE setting.
---
 src/backend/utils/adt/pg_locale_icu.c | 66 ++++++++++++++++++++++-----
 src/include/utils/pg_locale.h         |  1 +
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 449e3bbb7a6..da250a23630 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,25 +121,34 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-/*
- * ICU still depends on libc for compatibility with certain historical
- * behavior for single-byte encodings.  XXX: consider fixing by decoding the
- * single byte into a code point, and using u_tolower().
- */
 static char
 char_tolower_icu(unsigned char ch, pg_locale_t locale)
 {
-	if (isupper(ch))
-		return tolower(ch);
-	return ch;
+	locale_t	loc = locale->icu.lt;
+
+	if (loc)
+	{
+		if (isupper_l(ch, loc))
+			return tolower_l(ch, loc);
+		return ch;
+	}
+	else
+		return pg_ascii_tolower(ch);
 }
 
 static char
 char_toupper_icu(unsigned char ch, pg_locale_t locale)
 {
-	if (islower(ch))
-		return toupper(ch);
-	return ch;
+	locale_t	loc = locale->icu.lt;
+
+	if (loc)
+	{
+		if (islower_l(ch, loc))
+			return toupper_l(ch, loc);
+		return ch;
+	}
+	else
+		return pg_ascii_toupper(ch);
 }
 
 static bool
@@ -265,6 +274,29 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
+
+/*
+ * ICU still depends on libc for compatibility with certain historical
+ * behavior for single-byte encodings.  See char_tolower_libc().
+ *
+ * XXX: consider fixing by decoding the single byte into a code point, and
+ * using u_tolower().
+ */
+static locale_t
+make_libc_ctype_locale(const char *ctype)
+{
+	locale_t	loc;
+
+#ifndef WIN32
+	loc = newlocale(LC_CTYPE_MASK, ctype, NULL);
+#else
+	loc = _create_locale(LC_ALL, ctype);
+#endif
+	if (!loc)
+		report_newlocale_failure(ctype);
+
+	return loc;
+}
 #endif
 
 pg_locale_t
@@ -275,11 +307,13 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	const char *iculocstr;
 	const char *icurules = NULL;
 	UCollator  *collator;
+	locale_t	loc = (locale_t) 0;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
 	{
 		HeapTuple	tp;
+		const char *ctype;
 		Datum		datum;
 		bool		isnull;
 
@@ -297,6 +331,15 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		if (!isnull)
 			icurules = TextDatumGetCString(datum);
 
+		if (pg_database_encoding_max_length() == 1)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+										   Anum_pg_database_datctype);
+			ctype = TextDatumGetCString(datum);
+
+			loc = make_libc_ctype_locale(ctype);
+		}
+
 		ReleaseSysCache(tp);
 	}
 	else
@@ -327,6 +370,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->icu.ucol = collator;
+	result->icu.lt = loc;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 790db566e91..c5978d903cc 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -165,6 +165,7 @@ struct pg_locale_struct
 		{
 			const char *locale;
 			UCollator  *ucol;
+			locale_t	lt;
 		}			icu;
 #endif
 	};
-- 
2.43.0
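
The idea in the patch above (hold an explicit locale_t and call the *_l()
functions, rather than consulting the global LC_CTYPE) can be sketched in
isolation as follows. This is only an illustration, not code from the patch:
the helper name sketch_tolower_in() is hypothetical, it assumes a
POSIX.1-2008 platform that provides newlocale(), isupper_l(), tolower_l()
and freelocale(), and it creates and frees the locale on every call for
brevity, whereas the patch creates it once and stores it in the pg_locale_t.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* assumed: exposes newlocale(), tolower_l() */
#endif

#include <ctype.h>
#include <locale.h>

/*
 * Lowercase one byte using an explicit locale object instead of the
 * process-wide LC_CTYPE. Returns the byte unchanged if the named locale
 * cannot be created. (Hypothetical helper, for illustration only.)
 */
static unsigned char
sketch_tolower_in(const char *ctype_name, unsigned char ch)
{
	locale_t	loc = newlocale(LC_CTYPE_MASK, ctype_name, (locale_t) 0);
	unsigned char result = ch;

	if (loc == (locale_t) 0)
		return ch;				/* locale unavailable: leave byte alone */

	/* Same shape as char_tolower_icu() in the patch: test, then convert. */
	if (isupper_l(ch, loc))
		result = (unsigned char) tolower_l(ch, loc);

	freelocale(loc);
	return result;
}
```

Because the locale is passed explicitly, the result does not depend on
whatever setlocale() was last called with in the process.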

v7-0006-Avoid-global-LC_CTYPE-dependency-in-ltree-crc32.c.patch (text/x-patch; charset=UTF-8)

From 6814389449d1c1ad794d7658dff3055dd0ab7b10 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 27 Oct 2025 16:23:14 -0700
Subject: [PATCH v7 6/9] Avoid global LC_CTYPE dependency in ltree/crc32.c.

Use char_tolower() instead of tolower().
---
 contrib/ltree/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..5969f75c158 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -12,7 +12,7 @@
 
 #ifdef LOWER_NODE
 #include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
+#define TOLOWER(x)	char_tolower((unsigned char) (x), NULL)
 #else
 #define TOLOWER(x)	(x)
 #endif
-- 
2.43.0

v7-0007-Avoid-global-LC_CTYPE-dependency-in-fuzzystrmatch.patch (text/x-patch; charset=UTF-8)

From d52d9fa37c7d48b7728d16144b96b49b4085b3f0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 27 Oct 2025 16:24:18 -0700
Subject: [PATCH v7 7/9] Avoid global LC_CTYPE dependency in fuzzystrmatch.

Use char_toupper() instead of toupper().
---
 contrib/fuzzystrmatch/dmetaphone.c    |  5 ++++-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 19 +++++++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..152eb4b2ddf 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -116,6 +117,8 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include <assert.h>
 #include <ctype.h>
 
+#define TOUPPER(x) char_toupper(x, NULL)
+
 /* prototype for the main function we got from the perl module */
 static void DoubleMetaphone(char *str, char **codes);
 
@@ -284,7 +287,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = TOUPPER((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..03530fb73ab 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -41,6 +41,7 @@
 #include <ctype.h>
 
 #include "utils/builtins.h"
+#include "utils/pg_locale.h"
 #include "utils/varlena.h"
 #include "varatt.h"
 
@@ -49,6 +50,8 @@ PG_MODULE_MAGIC_EXT(
 					.version = PG_VERSION
 );
 
+#define TOUPPER(x) char_toupper(x, NULL)
+
 /*
  * Soundex
  */
@@ -62,7 +65,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = TOUPPER((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +127,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = TOUPPER((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +304,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (TOUPPER((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (TOUPPER((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? TOUPPER((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? TOUPPER((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) TOUPPER((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +745,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) TOUPPER((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v7-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch (text/x-patch; charset=UTF-8)

From c5b40bcaae32d03db47c4a217611747daa4a8dac Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 9 Oct 2024 10:00:58 -0700
Subject: [PATCH v7 8/9] Don't include ICU headers in pg_locale.h.

Needed in order to include pg_locale.h in strcasecmp.c.
---
 src/backend/commands/collationcmds.c  |  4 ++++
 src/backend/utils/adt/formatting.c    |  4 ----
 src/backend/utils/adt/pg_locale.c     |  4 ++++
 src/backend/utils/adt/pg_locale_icu.c |  1 +
 src/backend/utils/adt/varlena.c       |  4 ++++
 src/include/utils/pg_locale.h         | 14 +++++---------
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8acbfbbeda0..a57fe93c387 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -14,6 +14,10 @@
  */
 #include "postgres.h"
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "access/table.h"
 #include "access/xact.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 5f7b3114da7..0205075850d 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -70,10 +70,6 @@
 #include <limits.h>
 #include <wctype.h>
 
-#ifdef USE_ICU
-#include <unicode/ustring.h>
-#endif
-
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/int.h"
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7e13c601643..619689d6570 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -33,6 +33,10 @@
 
 #include <time.h>
 
+#ifdef USE_ICU
+#include <unicode/ucol.h>
+#endif
+
 #include "access/htup_details.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_database.h"
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index da250a23630..0fd8171c1da 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -13,6 +13,7 @@
 
 #ifdef USE_ICU
 #include <unicode/ucnv.h>
+#include <unicode/ucol.h>
 #include <unicode/ustring.h>
 
 /*
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 3894457ab40..3992d9af32b 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -17,6 +17,10 @@
 #include <ctype.h>
 #include <limits.h>
 
+#ifdef USE_ICU
+#include <unicode/uchar.h>
+#endif
+
 #include "access/detoast.h"
 #include "access/toast_compression.h"
 #include "catalog/pg_collation.h"
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index c5978d903cc..b668f77e1ca 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -14,15 +14,6 @@
 
 #include "mb/pg_wchar.h"
 
-#ifdef USE_ICU
-/* only include the C APIs, to avoid errors in cpluspluscheck */
-#undef U_SHOW_CPLUSPLUS_API
-#define U_SHOW_CPLUSPLUS_API 0
-#undef U_SHOW_CPLUSPLUS_HEADER_API
-#define U_SHOW_CPLUSPLUS_HEADER_API 0
-#include <unicode/ucol.h>
-#endif
-
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
@@ -54,6 +45,11 @@ extern void cache_locale_time(void);
 struct pg_locale_struct;
 typedef struct pg_locale_struct *pg_locale_t;
 
+#ifdef USE_ICU
+struct UCollator;
+typedef struct UCollator UCollator;
+#endif
+
 /* methods that define collation behavior */
 struct collate_methods
 {
-- 
2.43.0

#71

Jeff Davis

pgsql@j-davis.com

about 2 months ago

In reply to: Peter Eisentraut (#68)

7 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-11-12 at 19:59 +0100, Peter Eisentraut wrote:

Many of these issues are pre-existing, but I just figured it has reached
a point where we need to do something about it.

I tried to simplify things in this patch series, assuming that we have
some tolerance for small behavior changes.

0001: No behavior change here, same patch as before. Uncontroversial
simplification, so I plan to commit this soon.

0002: change fuzzystrmatch to use ASCII semantics. As far as I can
tell, this only affects the results of soundex(). Before the patch, in
en_US.iso885915, soundex('réd') was 'RÉ30', after the patch it's
'Ré30'. I'm not sure whether the current behavior is intentional or
not. Other functions (daitch_mokotoff, levenshtein, and metaphone) are
unaffected as far as I can tell.

0003+0005: change ltree to use case folding instead of tolower(). I
believe this is a bug fix, because the current code is inconsistent
between ltree_strncasecmp() and ltree_crc32_sz().
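
The reason the inconsistency matters: a case-insensitive hash and a
case-insensitive comparison must fold bytes the same way, or two labels
that compare equal can hash differently. A minimal sketch of that
invariant follows. The names are hypothetical, folding is plain ASCII
rather than the default collation's case folding, and the hash is FNV-1a
rather than the CRC32 that ltree uses; only the shared-fold structure is
the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The single folding rule shared by both the comparison and the hash. */
static unsigned char
sketch_fold(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch = (unsigned char) (ch + ('a' - 'A'));
	return ch;
}

/* Case-insensitive equality over len bytes. */
static bool
sketch_ci_equal(const char *a, const char *b, size_t len)
{
	assert(a != NULL && b != NULL);
	for (size_t i = 0; i < len; i++)
	{
		if (sketch_fold((unsigned char) a[i]) !=
			sketch_fold((unsigned char) b[i]))
			return false;
	}
	return true;
}

/* 32-bit FNV-1a over the folded bytes (stand-in for a CRC32). */
static uint32_t
sketch_ci_hash(const char *s, size_t len)
{
	uint32_t	h = 2166136261u;

	for (size_t i = 0; i < len; i++)
	{
		h ^= sketch_fold((unsigned char) s[i]);
		h *= 16777619u;
	}
	return h;
}
```

Since both functions go through sketch_fold(), any two inputs for which
sketch_ci_equal() returns true also produce the same sketch_ci_hash().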

0006-0007: Remove char_tolower() API. This also removes the
optimization for single-byte encodings with the libc provider and a
non-C locale, but simplifies the code (the optimization is retained for
the C locale). It's possible to make the lazy-folding optimization work
for all locales without the char_tolower() API by doing something
similar to what 0004 does for ltree. But to make this work efficiently
for Generic_Text_IC_like() would be a bit more complex: we'd need to
adjust MatchText() to be able to fold the arguments lazily, and perhaps
introduce some kind of casemapping iterator. That's already a pretty
complex function, so I'm hesitant to do that work unless the
optimization is important.
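
For readers unfamiliar with the lazy-folding optimization: the idea is to
fold each byte only when the matcher actually looks at it, instead of
lowercasing both arguments up front. A much-simplified sketch is below. It
is not MatchText(): the function names are hypothetical, folding is ASCII
only (so it corresponds to the C-locale case that is retained), and
escape characters and multibyte encodings are not handled.

```c
#include <stdbool.h>

/* Fold one byte on demand; ASCII only. */
static unsigned char
sketch_fold_byte(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch = (unsigned char) (ch + ('a' - 'A'));
	return ch;
}

/*
 * Case-insensitive LIKE-style match supporting '%' (any run of bytes) and
 * '_' (exactly one byte). Bytes are folded lazily, only when compared.
 */
static bool
sketch_ilike(const char *text, const char *pat)
{
	while (*pat)
	{
		if (*pat == '%')
		{
			/* Collapse consecutive '%' */
			while (*pat == '%')
				pat++;
			if (*pat == '\0')
				return true;	/* trailing '%' matches the rest */

			/* Try every possible starting point for the remainder. */
			for (; *text; text++)
			{
				if (sketch_ilike(text, pat))
					return true;
			}
			return false;
		}

		if (*text == '\0')
			return false;		/* pattern needs a byte, text has none */

		if (*pat != '_' &&
			sketch_fold_byte((unsigned char) *text) !=
			sketch_fold_byte((unsigned char) *pat))
			return false;

		text++;
		pat++;
	}
	return *text == '\0';
}
```

A mismatch on the first byte returns immediately without folding anything
else, which is the saving that eager lowercasing of both strings gives up.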

These patches don't get us quite to the point of eliminating the
LC_CTYPE dependency (there's still downcase_identifier() and
pg_strcasecmp() to worry about, and some assorted isxyz() calls to
examine), but they simplify things enough that the path forward will be
easier.

Regards,
Jeff Davis

Attachments:

v8-0001-Avoid-global-LC_CTYPE-dependency-in-pg_locale_lib.patch (text/x-patch; charset=UTF-8)

From 82ef752da7f25d0d718f98ef74748a3b3555d1df Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 14:58:02 -0700
Subject: [PATCH v8 1/7] Avoid global LC_CTYPE dependency in pg_locale_libc.c.

Call tolower_l() directly instead of through pg_tolower(), because the
latter depends on the global LC_CTYPE.
---
 src/backend/utils/adt/pg_locale_libc.c | 28 ++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 9c7fcd1fc7a..716f005066a 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -450,7 +450,12 @@ strlower_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (locale->is_default)
-				*p = pg_tolower((unsigned char) *p);
+			{
+				if (*p >= 'A' && *p <= 'Z')
+					*p += 'a' - 'A';
+				else if (IS_HIGHBIT_SET(*p) && isupper_l(*p, loc))
+					*p = tolower_l((unsigned char) *p, loc);
+			}
 			else
 				*p = tolower_l((unsigned char) *p, loc);
 		}
@@ -535,9 +540,19 @@ strtitle_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 			if (locale->is_default)
 			{
 				if (wasalnum)
-					*p = pg_tolower((unsigned char) *p);
+				{
+					if (*p >= 'A' && *p <= 'Z')
+						*p += 'a' - 'A';
+					else if (IS_HIGHBIT_SET(*p) && isupper_l(*p, loc))
+						*p = tolower_l((unsigned char) *p, loc);
+				}
 				else
-					*p = pg_toupper((unsigned char) *p);
+				{
+					if (*p >= 'a' && *p <= 'z')
+						*p -= 'a' - 'A';
+					else if (IS_HIGHBIT_SET(*p) && islower_l(*p, loc))
+						*p = toupper_l((unsigned char) *p, loc);
+				}
 			}
 			else
 			{
@@ -633,7 +648,12 @@ strupper_libc_sb(char *dest, size_t destsize, const char *src, ssize_t srclen,
 		for (p = dest; *p; p++)
 		{
 			if (locale->is_default)
-				*p = pg_toupper((unsigned char) *p);
+			{
+				if (*p >= 'a' && *p <= 'z')
+					*p -= 'a' - 'A';
+				else if (IS_HIGHBIT_SET(*p) && islower_l(*p, loc))
+					*p = toupper_l((unsigned char) *p, loc);
+			}
 			else
 				*p = toupper_l((unsigned char) *p, loc);
 		}
-- 
2.43.0

v8-0002-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 09b3be1438da3561562042b86985439f7a206bf1 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v8 2/7] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.
---
 contrib/fuzzystrmatch/dmetaphone.c    |  2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..bb5d3e90756 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -284,7 +284,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = pg_ascii_toupper((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..7f07efc2c35 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +124,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +301,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +742,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v8-0003-Add-define-for-UNICODE_CASEMAP_BUFSZ.patch (text/x-patch; charset=UTF-8)

From 7190291ec2acfab55f90504cc3a9c13bafc87364 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:11:52 -0800
Subject: [PATCH v8 3/7] Add #define for UNICODE_CASEMAP_BUFSZ.

Useful for mapping a single codepoint at a time into a
statically-allocated buffer.
---
 src/include/utils/pg_locale.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 683e1a0eef8..49fd22bf8eb 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -26,6 +26,17 @@
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
+/*
+ * Maximum number of bytes needed to map a single codepoint. Useful for
+ * mapping and processing a single input codepoint at a time with a
+ * statically-allocated buffer.
+ *
+ * With full case mapping, an input codepoint may be mapped to as many as
+ * three output codepoints. See Unicode 5.18.2, "Change in Length".
+ */
+#define UNICODE_CASEMAP_LEN		3
+#define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
+
 /* GUC settings */
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
-- 
2.43.0
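
As a rough illustration of why a three-codepoint buffer is enough for
mapping one codepoint at a time, here is a toy full-uppercase mapper. It
is not from the patch: the SKETCH_ names and sketch_upper_full() are
hypothetical, uint32_t stands in for char32_t, and only two special cases
plus ASCII are handled. U+00DF (ß) uppercases to two codepoints, and
U+0390 uppercases to three (U+0399 U+0308 U+0301), which is the maximum.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CASEMAP_LEN		3
#define SKETCH_CASEMAP_BUFSZ	(SKETCH_CASEMAP_LEN * sizeof(uint32_t))

/*
 * Map a single codepoint to uppercase with full case mapping, writing at
 * most SKETCH_CASEMAP_LEN codepoints to out. Returns the number written.
 * Toy table: only two special cases and ASCII letters are covered.
 */
static size_t
sketch_upper_full(uint32_t cp, uint32_t out[SKETCH_CASEMAP_LEN])
{
	switch (cp)
	{
		case 0x00DF:			/* LATIN SMALL LETTER SHARP S -> "SS" */
			out[0] = 0x0053;
			out[1] = 0x0053;
			return 2;
		case 0x0390:			/* GREEK SMALL IOTA WITH DIALYTIKA AND TONOS */
			out[0] = 0x0399;
			out[1] = 0x0308;
			out[2] = 0x0301;
			return 3;
		default:
			if (cp >= 'a' && cp <= 'z')
				cp -= 'a' - 'A';
			out[0] = cp;
			return 1;
	}
}
```

A caller can declare uint32_t buf[SKETCH_CASEMAP_LEN] on the stack (12
bytes with a 4-byte codepoint type) and process input one codepoint at a
time without any allocation.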

v8-0004-Allow-pg_locale_t-APIs-to-work-when-ctype_is_c.patch (text/x-patch; charset=UTF-8)

From 735ee6342c2365f879c47c3aa0867c58174402aa Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 7 Nov 2025 12:11:34 -0800
Subject: [PATCH v8 4/7] Allow pg_locale_t APIs to work when ctype_is_c.

Previously, the caller needed to check ctype_is_c first for some
routines and not others. Now, the APIs consistently work, and the
caller can just check ctype_is_c for optimization purposes.
---
 src/backend/utils/adt/like_support.c   | 34 ++++----------
 src/backend/utils/adt/pg_locale.c      | 63 ++++++++++++++++++++++++--
 src/backend/utils/adt/pg_locale_libc.c |  3 ++
 3 files changed, 72 insertions(+), 28 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 999f23f86d5..0debccfa67b 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -99,8 +99,6 @@ static Selectivity like_selectivity(const char *patt, int pattlen,
 static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
-static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -995,7 +993,6 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 	Oid			typeid = patt_const->consttype;
 	int			pos,
 				match_pos;
-	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1055,9 +1052,16 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				break;
 		}
 
-		/* Stop if case-varying character (it's sort of a wildcard) */
-		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale))
+		/*
+		 * Stop if case-varying character (it's sort of a wildcard).
+		 *
+		 * In multibyte character sets or with non-libc providers, we can't
+		 * use isalpha, and it does not seem worth trying to convert to
+		 * wchar_t or char32_t.  Instead, just pass the single byte to the
+		 * provider, which will assume any non-ASCII char is potentially
+		 * case-varying.
+		 */
+		if (case_insensitive && char_is_cased(patt[pos], locale))
 			break;
 
 		match[match_pos++] = patt[pos];
@@ -1481,24 +1485,6 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
 	return sel;
 }
 
-/*
- * Check whether char is a letter (and, hence, subject to case-folding)
- *
- * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
- * Instead, just assume any non-ASCII char is potentially case-varying, and
- * hard-wire knowledge of which ASCII chars are letters.
- */
-static int
-pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else
-		return char_is_cased(c, locale);
-}
-
 
 /*
  * For bytea, the increment function need only increment the current byte
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b14c7837938..9319fb633b6 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1261,6 +1261,17 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_tolower(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
 	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
@@ -1268,6 +1279,29 @@ size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+	{
+		bool		wasalnum = false;
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < Min(srclen, dstsize); i++)
+		{
+			char		c = src[i];
+
+			if (wasalnum)
+				dst[i] = pg_ascii_tolower(c);
+			else
+				dst[i] = pg_ascii_toupper(c);
+
+			wasalnum = ((c >= '0' && c <= '9') ||
+						(c >= 'A' && c <= 'Z') ||
+						(c >= 'a' && c <= 'z'));
+		}
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
 	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
@@ -1275,6 +1309,17 @@ size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_toupper(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
 	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
@@ -1282,10 +1327,18 @@ size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->ctype->strfold)
-		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
-	else
-		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_tolower(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
+	return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1560,6 +1613,8 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 bool
 char_is_cased(char ch, pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 716f005066a..942454de4ed 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -326,6 +326,7 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
+	.strfold = strlower_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -351,6 +352,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -372,6 +374,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
-- 
2.43.0
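
Note on the patch above: the C-locale fallbacks all follow the same buffer contract. They write at most dstsize bytes, NUL-terminate only if there is room, and return the full length needed so the caller can detect truncation. A minimal standalone sketch of that contract follows. It is not the PostgreSQL code: ascii_tolower() and ascii_strlower() are hypothetical stand-ins for pg_ascii_tolower() and the ctype == NULL branch of pg_strlower(), and ptrdiff_t stands in for ssize_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_ascii_tolower(): only A-Z are changed. */
static unsigned char
ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Lowercase src into dst, ASCII only.  A negative srclen means "src is
 * NUL-terminated".  Writes at most dstsize bytes, NUL-terminates only if
 * there is room, and returns the number of bytes the full result needs
 * (excluding the terminator).
 */
size_t
ascii_strlower(char *dst, size_t dstsize, const char *src, ptrdiff_t srclen)
{
	size_t		len = (srclen >= 0) ? (size_t) srclen : strlen(src);
	size_t		i;

	assert(dst != NULL || dstsize == 0);

	for (i = 0; i < len && i < dstsize; i++)
		dst[i] = (char) ascii_tolower((unsigned char) src[i]);
	if (i < dstsize)
		dst[i] = '\0';
	return len;
}
```

A caller compares the return value plus one against the buffer size: if it is larger, the output was truncated and is not NUL-terminated. Bytes with the high bit set pass through unchanged, which is what makes the result independent of the C library's locale.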

v8-0005-Fix-inconsistency-between-ltree_strncasecmp-and-l.patch (text/x-patch; charset=UTF-8)

From 9cc6025640a2fdb5bee4a84598a3fdb352d81954 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:20:36 -0800
Subject: [PATCH v8 5/7] Fix inconsistency between ltree_strncasecmp() and
 ltree_crc32_sz().

Previously, ltree_strncasecmp() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.

Change both to use casefolding with the default collation.
---
 contrib/ltree/crc32.c     | 46 ++++++++++++++++++++++++++++++++-------
 contrib/ltree/lquery_op.c | 31 ++++++++++++++++++++++++--
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..3918d4a0ec2 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -10,31 +10,61 @@
 #include "postgres.h"
 #include "ltree.h"
 
+#include "crc32.h"
+#include "utils/pg_crc.h"
 #ifdef LOWER_NODE
-#include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
-#else
-#define TOLOWER(x)	(x)
+#include "utils/pg_locale.h"
 #endif
 
-#include "crc32.h"
-#include "utils/pg_crc.h"
+#ifdef LOWER_NODE
 
 unsigned int
 ltree_crc32_sz(const char *buf, int size)
 {
 	pg_crc32	crc;
 	const char *p = buf;
+	static pg_locale_t locale = NULL;
+
+	if (!locale)
+		locale = pg_database_locale();
 
 	INIT_TRADITIONAL_CRC32(crc);
 	while (size > 0)
 	{
-		char		c = (char) TOLOWER(*p);
+		char		foldstr[UNICODE_CASEMAP_BUFSZ];
+		int			srclen = pg_mblen(p);
+		size_t		foldlen;
+
+		/* fold one codepoint at a time */
+		foldlen = pg_strfold(foldstr, UNICODE_CASEMAP_BUFSZ, p, srclen,
+							 locale);
+
+		COMP_TRADITIONAL_CRC32(crc, foldstr, foldlen);
+
+		size -= srclen;
+		p += srclen;
+	}
+	FIN_TRADITIONAL_CRC32(crc);
+	return (unsigned int) crc;
+}
+
+#else
 
-		COMP_TRADITIONAL_CRC32(crc, &c, 1);
+unsigned int
+ltree_crc32_sz(const char *buf, int size)
+{
+	pg_crc32	crc;
+	const char *p = buf;
+
+	INIT_TRADITIONAL_CRC32(crc);
+	while (size > 0)
+	{
+		COMP_TRADITIONAL_CRC32(crc, p, 1);
 		size--;
 		p++;
 	}
 	FIN_TRADITIONAL_CRC32(crc);
 	return (unsigned int) crc;
 }
+
+#endif							/* !LOWER_NODE */
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index a6466f575fd..d6754eb613f 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -77,10 +77,37 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 int
 ltree_strncasecmp(const char *a, const char *b, size_t s)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = s + 1;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = s + 1;
+	char	   *bl = palloc(bl_sz);
+	size_t		needed;
 	int			res;
 
+	if (!locale)
+		locale = pg_database_locale();
+
+	needed = pg_strfold(al, al_sz, a, s, locale);
+	if (needed + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = needed + 1;
+		al = repalloc(al, al_sz);
+		needed = pg_strfold(al, al_sz, a, s, locale);
+		Assert(needed + 1 <= al_sz);
+	}
+
+	needed = pg_strfold(bl, bl_sz, b, s, locale);
+	if (needed + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = needed + 1;
+		bl = repalloc(bl, bl_sz);
+		needed = pg_strfold(bl, bl_sz, b, s, locale);
+		Assert(needed + 1 <= bl_sz);
+	}
+
 	res = strncmp(al, bl, s);
 
 	pfree(al);
-- 
2.43.0
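
Note on the patch above: the grow-and-retry logic in ltree_strncasecmp() depends on pg_strfold() returning the number of bytes the result needs, even when the buffer was too small. The sketch below shows that pattern in isolation, under loud assumptions: malloc()/realloc() replace palloc()/repalloc(), and toy_strfold() is a hypothetical fold that knows only ASCII plus one full case folding, U+0130 to U+0069 U+0307, which grows from 2 to 3 bytes in UTF-8 and so exercises the retry path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy fold (hypothetical, not pg_strfold()): lowercases ASCII and maps
 * U+0130 (UTF-8 C4 B0) to U+0069 U+0307 (UTF-8 69 CC 87).  Writes only what
 * fits in dstsize, NUL-terminates if there is room, and returns the bytes
 * needed, excluding the terminator.
 */
static size_t
toy_strfold(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		out = 0;
	size_t		i = 0;

	while (i < srclen)
	{
		unsigned char c = (unsigned char) src[i];
		char		tmp[3];
		size_t		n;
		size_t		j;

		if (c == 0xC4 && i + 1 < srclen && (unsigned char) src[i + 1] == 0xB0)
		{
			tmp[0] = (char) 0x69;
			tmp[1] = (char) 0xCC;
			tmp[2] = (char) 0x87;
			n = 3;
			i += 2;
		}
		else
		{
			tmp[0] = (c >= 'A' && c <= 'Z') ? (char) (c + ('a' - 'A')) : (char) c;
			n = 1;
			i += 1;
		}
		for (j = 0; j < n; j++, out++)
		{
			if (out < dstsize)
				dst[out] = tmp[j];
		}
	}
	if (out < dstsize)
		dst[out] = '\0';
	return out;
}

/*
 * Fold src into a freshly allocated, NUL-terminated buffer.  Start with
 * srclen + 1 bytes; if the fold reports that more is needed, grow once and
 * retry.  Returns NULL on allocation failure; the caller frees the result.
 */
char *
fold_alloc(const char *src, size_t srclen, size_t *foldlen)
{
	size_t		bufsz = srclen + 1;
	char	   *buf = malloc(bufsz);
	size_t		needed;

	assert(foldlen != NULL);
	if (buf == NULL)
		return NULL;

	needed = toy_strfold(buf, bufsz, src, srclen);
	if (needed + 1 > bufsz)
	{
		char	   *tmp;

		bufsz = needed + 1;
		tmp = realloc(buf, bufsz);
		if (tmp == NULL)
		{
			free(buf);
			return NULL;
		}
		buf = tmp;
		needed = toy_strfold(buf, bufsz, src, srclen);
	}
	*foldlen = needed;
	return buf;
}
```

The sketch returns the folded length because folding can change the byte length of a string, so the input length is not a reliable bound on the output.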

v8-0006-Inline-pg_ascii_tolower-and-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 709c38c8a3b992e5ddf2c6d93a838d7ef588c0f9 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 20 Nov 2025 13:09:26 -0800
Subject: [PATCH v8 6/7] Inline pg_ascii_tolower() and pg_ascii_toupper().

---
 src/include/port.h      | 25 +++++++++++++++++++++++--
 src/port/pgstrcasecmp.c | 26 --------------------------
 2 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/src/include/port.h b/src/include/port.h
index 3964d3b1293..159c2bcd7e3 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -169,8 +169,29 @@ extern int	pg_strcasecmp(const char *s1, const char *s2);
 extern int	pg_strncasecmp(const char *s1, const char *s2, size_t n);
 extern unsigned char pg_toupper(unsigned char ch);
 extern unsigned char pg_tolower(unsigned char ch);
-extern unsigned char pg_ascii_toupper(unsigned char ch);
-extern unsigned char pg_ascii_tolower(unsigned char ch);
+
+/*
+ * Fold a character to upper case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_toupper(unsigned char ch)
+{
+	if (ch >= 'a' && ch <= 'z')
+		ch += 'A' - 'a';
+	return ch;
+}
+
+/*
+ * Fold a character to lower case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_tolower(unsigned char ch)
+{
+	if (ch >= 'A' && ch <= 'Z')
+		ch += 'a' - 'A';
+	return ch;
+}
+
 
 /*
  * Beginning in v12, we always replace snprintf() and friends with our own
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..17e93180381 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -13,10 +13,6 @@
  *
  * NB: this code should match downcase_truncate_identifier() in scansup.c.
  *
- * We also provide strict ASCII-only case conversion functions, which can
- * be used to implement C/POSIX case folding semantics no matter what the
- * C library thinks the locale is.
- *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  *
@@ -127,25 +123,3 @@ pg_tolower(unsigned char ch)
 		ch = tolower(ch);
 	return ch;
 }
-
-/*
- * Fold a character to upper case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_toupper(unsigned char ch)
-{
-	if (ch >= 'a' && ch <= 'z')
-		ch += 'A' - 'a';
-	return ch;
-}
-
-/*
- * Fold a character to lower case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_tolower(unsigned char ch)
-{
-	if (ch >= 'A' && ch <= 'Z')
-		ch += 'a' - 'A';
-	return ch;
-}
-- 
2.43.0

v8-0007-Remove-char_tolower-API.patch (text/x-patch; charset=UTF-8)

From 81948ecc1e3c4f1b4bd79ecd96ac151e2332f3df Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 18:16:41 -0800
Subject: [PATCH v8 7/7] Remove char_tolower() API.

It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.
---
 src/backend/utils/adt/like.c           | 42 +++++++++-----------------
 src/backend/utils/adt/like_match.c     | 18 ++++++-----
 src/backend/utils/adt/pg_locale.c      | 22 --------------
 src/backend/utils/adt/pg_locale_libc.c | 10 ------
 src/include/utils/pg_locale.h          |  9 ------
 5 files changed, 25 insertions(+), 76 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..4a7fc583c71 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -43,8 +43,8 @@ static text *MB_do_like_escape(text *pat, text *esc);
 static int	UTF8_MatchText(const char *t, int tlen, const char *p, int plen,
 						   pg_locale_t locale);
 
-static int	SB_IMatchText(const char *t, int tlen, const char *p, int plen,
-						  pg_locale_t locale);
+static int	C_IMatchText(const char *t, int tlen, const char *p, int plen,
+						 pg_locale_t locale);
 
 static int	GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation);
 static int	Generic_Text_IC_like(text *str, text *pat, Oid collation);
@@ -84,22 +84,10 @@ wchareq(const char *p1, const char *p2)
  * of getting a single character transformed to the system's wchar_t format.
  * So now, we just downcase the strings using lower() and apply regular LIKE
  * comparison.  This should be revisited when we install better locale support.
- */
-
-/*
- * We do handle case-insensitive matching for single-byte encodings using
+ *
+ * We do handle case-insensitive matching for the C locale using
  * fold-on-the-fly processing, however.
  */
-static char
-SB_lower_char(unsigned char c, pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return pg_ascii_tolower(c);
-	else if (locale->is_default)
-		return pg_tolower(c);
-	else
-		return char_tolower(c, locale);
-}
 
 
 #define NextByte(p, plen)	((p)++, (plen)--)
@@ -131,9 +119,9 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 #include "like_match.c"
 
 /* setup to compile like_match.c for single byte case insensitive matches */
-#define MATCH_LOWER(t, locale) SB_lower_char((unsigned char) (t), locale)
+#define MATCH_LOWER
 #define NextChar(p, plen) NextByte((p), (plen))
-#define MatchText SB_IMatchText
+#define MatchText C_IMatchText
 
 #include "like_match.c"
 
@@ -202,22 +190,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
 
 	/*
-	 * For efficiency reasons, in the single byte case we don't call lower()
-	 * on the pattern and text, but instead call SB_lower_char on each
-	 * character.  In the multi-byte case we don't have much choice :-(. Also,
-	 * ICU does not support single-character case folding, so we go the long
-	 * way.
+	 * For efficiency reasons, in the C locale we don't call lower() on the
+	 * pattern and text, but instead fold each byte with pg_ascii_tolower().
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
+		return C_IMatchText(s, slen, p, plen, locale);
 	}
 	else
 	{
@@ -229,10 +212,13 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 													 PointerGetDatum(str)));
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
+
 		if (GetDatabaseEncoding() == PG_UTF8)
 			return UTF8_MatchText(s, slen, p, plen, 0);
-		else
+		else if (pg_database_encoding_max_length() > 1)
 			return MB_MatchText(s, slen, p, plen, 0);
+		else
+			return SB_MatchText(s, slen, p, plen, 0);
 	}
 }
 
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index 892f8a745ea..54846c9541d 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -70,10 +70,14 @@
  *--------------------
  */
 
+/*
+ * MATCH_LOWER is defined for ILIKE in the C locale as an optimization. Other
+ * locales must casefold the inputs before matching.
+ */
 #ifdef MATCH_LOWER
-#define GETCHAR(t, locale) MATCH_LOWER(t, locale)
+#define GETCHAR(t) pg_ascii_tolower(t)
 #else
-#define GETCHAR(t, locale) (t)
+#define GETCHAR(t) (t)
 #endif
 
 static int
@@ -105,7 +109,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 						 errmsg("LIKE pattern must not end with escape character")));
-			if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+			if (GETCHAR(*p) != GETCHAR(*t))
 				return LIKE_FALSE;
 		}
 		else if (*p == '%')
@@ -167,14 +171,14 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 							 errmsg("LIKE pattern must not end with escape character")));
-				firstpat = GETCHAR(p[1], locale);
+				firstpat = GETCHAR(p[1]);
 			}
 			else
-				firstpat = GETCHAR(*p, locale);
+				firstpat = GETCHAR(*p);
 
 			while (tlen > 0)
 			{
-				if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
+				if (GETCHAR(*t) == firstpat || (locale && !locale->deterministic))
 				{
 					int			matched = MatchText(t, tlen, p, plen, locale);
 
@@ -342,7 +346,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					NextChar(t1, t1len);
 			}
 		}
-		else if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+		else if (GETCHAR(*p) != GETCHAR(*t))
 		{
 			/* non-wildcard pattern char fails to match text char */
 			return LIKE_FALSE;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9319fb633b6..b3afa6cad6c 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1618,28 +1618,6 @@ char_is_cased(char ch, pg_locale_t locale)
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
-/*
- * char_tolower_enabled()
- *
- * Does the provider support char_tolower()?
- */
-bool
-char_tolower_enabled(pg_locale_t locale)
-{
-	return (locale->ctype->char_tolower != NULL);
-}
-
-/*
- * char_tolower()
- *
- * Convert char (single-byte encoding) to lowercase.
- */
-char
-char_tolower(unsigned char ch, pg_locale_t locale)
-{
-	return locale->ctype->char_tolower(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 942454de4ed..3407e15712b 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -248,13 +248,6 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
-static char
-char_tolower_libc(unsigned char ch, pg_locale_t locale)
-{
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
-}
-
 static bool
 char_is_cased_libc(char ch, pg_locale_t locale)
 {
@@ -338,7 +331,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 	.max_chr = UCHAR_MAX,
@@ -364,7 +356,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 	.max_chr = UCHAR_MAX,
@@ -386,7 +377,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 49fd22bf8eb..5e21b517e96 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -125,13 +125,6 @@ struct ctype_methods
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
 
-	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
-	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
-
 	/*
 	 * For regex and pattern matching efficiency, the maximum char value
 	 * supported by the above methods. If zero, limit is set by regex code.
@@ -188,8 +181,6 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
-extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0
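
Note on the patch above: after this change only the C locale keeps the fold-on-the-fly ILIKE path, where each byte is folded at comparison time instead of lowercasing both strings first. The sketch below illustrates that idea with a deliberately reduced matcher. It is an assumption-laden toy, not like_match.c: c_ilike() and ascii_tolower() are hypothetical names, only '%' and '_' are supported, and there is no escape character or multibyte handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_ascii_tolower(): only A-Z are changed. */
static unsigned char
ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Return 1 if text t (tlen bytes) matches pattern p (plen bytes), else 0.
 * '%' matches any run of bytes, '_' matches exactly one byte, and every
 * other byte is compared after ASCII-only folding.
 */
int
c_ilike(const char *t, size_t tlen, const char *p, size_t plen)
{
	assert(t != NULL && p != NULL);

	while (plen > 0)
	{
		if (*p == '%')
		{
			size_t		skip;

			/* collapse consecutive '%' */
			while (plen > 0 && *p == '%')
			{
				p++;
				plen--;
			}
			if (plen == 0)
				return 1;		/* trailing '%' matches the rest */

			/* try every start position for the rest of the pattern */
			for (skip = 0; skip <= tlen; skip++)
			{
				if (c_ilike(t + skip, tlen - skip, p, plen))
					return 1;
			}
			return 0;
		}

		if (tlen == 0)
			return 0;			/* pattern wants a byte, text is exhausted */

		if (*p != '_' &&
			ascii_tolower((unsigned char) *p) != ascii_tolower((unsigned char) *t))
			return 0;

		t++;
		tlen--;
		p++;
		plen--;
	}
	return tlen == 0;
}
```

Because the folding is ASCII-only, bytes such as 0xC4 and 0xE4 (upper and lower case A with diaeresis in LATIN1) do not match each other here, which is consistent with C-locale semantics.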

#72

Jeff Davis

pgsql@j-davis.com

about 2 months ago

In reply to: Jeff Davis (#71)

11 attachment(s)

Re: Remaining dependency on setlocale()

On Thu, 2025-11-20 at 16:58 -0800, Jeff Davis wrote:

On Wed, 2025-11-12 at 19:59 +0100, Peter Eisentraut wrote:

Many of these issues are pre-existing, but I just figured it has
reached a point where we need to do something about it.

I tried to simplify things in this patch series, assuming that we
have some tolerance for small behavior changes.

0001: No behavior change here, same patch as before. Uncontroversial
simplification, so I plan to commit this soon.

Committed.

New series attached, which I tried to put in an order that would be
reasonable for commit.

0001-0004: Pure refactoring patches. I intend to commit a couple of
these soon.

0005: No behavioral change, and not much change at all. Computes the
"max_chr" for regexes (a performance optimization for low codepoints)
more consistently and simply based on the encoding.

0006: Fixes a longstanding ltree bug caused by an inconsistency between
the database locale and the global LC_CTYPE setting when using a
non-libc provider. The end result is also cleaner: use the database
locale consistently, like tsearch. I don't intend to backport this
unless someone thinks it should be, but it should come with a release
note advising users to reindex ltree indexes if using a non-libc
provider.

0007: Removes the char_tolower() API completely. We'd lose a pattern
matching optimization for single-byte encodings with libc and a non-C
locale, but it's a significant simplification. We could go even further
and change this to use casefolding rather than lower(), but that seems
like a separate change.

0008: Multibyte-aware extraction of pattern prefixes. The previous code
gave up on any byte that it didn't understand, which made prefixes
unnecessarily short. This patch is also cleaner.

0009: Changes fuzzystrmatch to use pg_ascii_toupper(). Most functions
in the extension are unaffected, but soundex() can be affected, and I'm
not sure what exactly it's supposed to do with non-ASCII.

0010: For downcase_identifier(), use a new provider-specific
pg_strfold_ident() method. The ICU version of this method is a work-in-
progress, because right now it depends on libc. I suppose it should
decode to UTF-32, then go through u_tolower(), then re-encode -- but
can the re-encoding fail? In any case, it would be a behavior change
for identifier casefolding with ICU and a single-byte encoding, which
is probably OK but the risk is non-zero.

0011: POC patch to introduce lc_collate GUC. It would only affect
extensions, PLs, libraries, or other non-core code that happens to call
strcoll() or strxfrm(). This would address Daniel's complaint, but it's
more flexible. And by being a GUC, it's clear that we shouldn't depend
on it for any stored data. We can do something similar for LC_CTYPE
after we eliminate dependencies in core code.

Regards,
Jeff Davis

Attachments:

v9-0001-Inline-pg_ascii_tolower-and-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 6e434d1f13f50654a89d19528b8f498c6cd10cca Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 20 Nov 2025 13:09:26 -0800
Subject: [PATCH v9 01/11] Inline pg_ascii_tolower() and pg_ascii_toupper().

---
 src/include/port.h      | 25 +++++++++++++++++++++++--
 src/port/pgstrcasecmp.c | 26 --------------------------
 2 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/src/include/port.h b/src/include/port.h
index 3964d3b1293..159c2bcd7e3 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -169,8 +169,29 @@ extern int	pg_strcasecmp(const char *s1, const char *s2);
 extern int	pg_strncasecmp(const char *s1, const char *s2, size_t n);
 extern unsigned char pg_toupper(unsigned char ch);
 extern unsigned char pg_tolower(unsigned char ch);
-extern unsigned char pg_ascii_toupper(unsigned char ch);
-extern unsigned char pg_ascii_tolower(unsigned char ch);
+
+/*
+ * Fold a character to upper case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_toupper(unsigned char ch)
+{
+	if (ch >= 'a' && ch <= 'z')
+		ch += 'A' - 'a';
+	return ch;
+}
+
+/*
+ * Fold a character to lower case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_tolower(unsigned char ch)
+{
+	if (ch >= 'A' && ch <= 'Z')
+		ch += 'a' - 'A';
+	return ch;
+}
+
 
 /*
  * Beginning in v12, we always replace snprintf() and friends with our own
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..17e93180381 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -13,10 +13,6 @@
  *
  * NB: this code should match downcase_truncate_identifier() in scansup.c.
  *
- * We also provide strict ASCII-only case conversion functions, which can
- * be used to implement C/POSIX case folding semantics no matter what the
- * C library thinks the locale is.
- *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  *
@@ -127,25 +123,3 @@ pg_tolower(unsigned char ch)
 		ch = tolower(ch);
 	return ch;
 }
-
-/*
- * Fold a character to upper case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_toupper(unsigned char ch)
-{
-	if (ch >= 'a' && ch <= 'z')
-		ch += 'A' - 'a';
-	return ch;
-}
-
-/*
- * Fold a character to lower case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_tolower(unsigned char ch)
-{
-	if (ch >= 'A' && ch <= 'Z')
-		ch += 'a' - 'A';
-	return ch;
-}
-- 
2.43.0

v9-0002-Add-define-for-UNICODE_CASEMAP_BUFSZ.patch (text/x-patch; charset=UTF-8)

From 5538939cae6210a4ee702253b0287e44993b98b4 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:11:52 -0800
Subject: [PATCH v9 02/11] Add #define for UNICODE_CASEMAP_BUFSZ.

Useful for mapping a single codepoint at a time into a
statically-allocated buffer.
---
 src/include/utils/pg_locale.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 683e1a0eef8..49fd22bf8eb 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -26,6 +26,17 @@
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
+/*
+ * Maximum number of bytes needed to map a single codepoint. Useful for
+ * mapping and processing a single input codepoint at a time with a
+ * statically-allocated buffer.
+ *
+ * With full case mapping, an input codepoint may be mapped to as many as
+ * three output codepoints. See Unicode 5.18.2, "Change in Length".
+ */
+#define UNICODE_CASEMAP_LEN		3
+#define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
+
 /* GUC settings */
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
-- 
2.43.0
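
Note on the patch above: the sizing works out to 3 codepoints times 4 bytes, which is 12 bytes where char32_t is 4 bytes wide. That also covers the worst case in UTF-8, where one code point takes at most 4 bytes. The sketch below checks the arithmetic. The names are hypothetical (CASEMAP_LEN, CASEMAP_BUFSZ, utf8_encode(), encode_mapping()), and uint32_t stands in for char32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of UNICODE_CASEMAP_LEN / UNICODE_CASEMAP_BUFSZ. */
#define CASEMAP_LEN		3
#define CASEMAP_BUFSZ	(CASEMAP_LEN * sizeof(uint32_t))

/*
 * Encode one code point as UTF-8 and return the number of bytes written
 * (1 to 4).  Assumes cp is a valid Unicode scalar value.
 */
size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
	if (cp < 0x80)
	{
		out[0] = (unsigned char) cp;
		return 1;
	}
	if (cp < 0x800)
	{
		out[0] = (unsigned char) (0xC0 | (cp >> 6));
		out[1] = (unsigned char) (0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000)
	{
		out[0] = (unsigned char) (0xE0 | (cp >> 12));
		out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char) (0x80 | (cp & 0x3F));
		return 3;
	}
	out[0] = (unsigned char) (0xF0 | (cp >> 18));
	out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
	out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
	out[3] = (unsigned char) (0x80 | (cp & 0x3F));
	return 4;
}

/*
 * Encode the n code points of one case mapping (n <= CASEMAP_LEN) into
 * buf, which must hold CASEMAP_BUFSZ bytes.  Returns the bytes used.
 */
size_t
encode_mapping(const uint32_t *cps, size_t n, unsigned char *buf)
{
	size_t		used = 0;
	size_t		i;

	assert(n <= CASEMAP_LEN);
	for (i = 0; i < n; i++)
		used += utf8_encode(cps[i], buf + used);
	return used;
}
```

For example, the full uppercase mapping of U+0390 is the three code points U+0399 U+0308 U+0301, which need 6 bytes in UTF-8. Three code points from the supplementary planes would need exactly 12.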

v9-0003-Change-some-callers-to-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 1bd58b5aaf092257fa09a456fdb859328548afeb Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 09:09:06 -0800
Subject: [PATCH v9 03/11] Change some callers to use pg_ascii_toupper().

The input is ASCII anyway, so it's better to be clear that it's not
locale-dependent.
---
 src/backend/access/transam/xlogfuncs.c | 2 +-
 src/backend/utils/adt/cash.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e45fce43ed..a50345f9bf7 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -479,7 +479,7 @@ pg_split_walfile_name(PG_FUNCTION_ARGS)
 
 	/* Capitalize WAL file name. */
 	for (p = fname_upper; *p; p++)
-		*p = pg_toupper((unsigned char) *p);
+		*p = pg_ascii_toupper((unsigned char) *p);
 
 	if (!IsXLogFileName(fname_upper))
 		ereport(ERROR,
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 611d23f3cb0..623f6eec056 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -1035,7 +1035,7 @@ cash_words(PG_FUNCTION_ARGS)
 	appendStringInfoString(&buf, m0 == 1 ? " cent" : " cents");
 
 	/* capitalize output */
-	buf.data[0] = pg_toupper((unsigned char) buf.data[0]);
+	buf.data[0] = pg_ascii_toupper((unsigned char) buf.data[0]);
 
 	/* return as text datum */
 	res = cstring_to_text_with_len(buf.data, buf.len);
-- 
2.43.0
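
Note on the patch above: for input that is known to be ASCII, such as the hex digits of a WAL file name, an ASCII-only uppercase gives the same result in every locale. A small sketch of that use follows. All names are hypothetical, and looks_like_wal_name() is a simplified check (exactly 24 uppercase hex digits) assumed for illustration, not a copy of IsXLogFileName().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_ascii_toupper(): only a-z are changed. */
static unsigned char
ascii_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch += 'A' - 'a';
	return ch;
}

/* Uppercase a NUL-terminated string in place, ASCII letters only. */
void
ascii_upcase_inplace(char *s)
{
	assert(s != NULL);
	for (; *s; s++)
		*s = (char) ascii_toupper((unsigned char) *s);
}

/* Simplified check: exactly 24 characters, all uppercase hex digits. */
int
looks_like_wal_name(const char *s)
{
	assert(s != NULL);
	return strlen(s) == 24 &&
		strspn(s, "0123456789ABCDEF") == 24;
}
```

A lowercase name fails the check until it has been passed through ascii_upcase_inplace(), and the outcome does not depend on LC_CTYPE.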

v9-0004-Allow-pg_locale_t-APIs-to-work-when-ctype_is_c.patch (text/x-patch; charset=UTF-8)

From 22419241e495c163d40e893d818741db7b1f3c78 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 7 Nov 2025 12:11:34 -0800
Subject: [PATCH v9 04/11] Allow pg_locale_t APIs to work when ctype_is_c.

Previously, the caller needed to check ctype_is_c first for some
routines and not others. Now, the APIs consistently work, and the
caller can just check ctype_is_c for optimization purposes.
---
 src/backend/utils/adt/like_support.c   | 34 ++++----------
 src/backend/utils/adt/pg_locale.c      | 63 ++++++++++++++++++++++++--
 src/backend/utils/adt/pg_locale_libc.c |  3 ++
 3 files changed, 72 insertions(+), 28 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 999f23f86d5..0debccfa67b 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -99,8 +99,6 @@ static Selectivity like_selectivity(const char *patt, int pattlen,
 static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
-static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -995,7 +993,6 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 	Oid			typeid = patt_const->consttype;
 	int			pos,
 				match_pos;
-	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1055,9 +1052,16 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				break;
 		}
 
-		/* Stop if case-varying character (it's sort of a wildcard) */
-		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale))
+		/*
+		 * Stop if case-varying character (it's sort of a wildcard).
+		 *
+		 * In multibyte character sets or with non-libc providers, we can't
+		 * use isalpha, and it does not seem worth trying to convert to
+		 * wchar_t or char32_t.  Instead, just pass the single byte to the
+		 * provider, which will assume any non-ASCII char is potentially
+		 * case-varying.
+		 */
+		if (case_insensitive && char_is_cased(patt[pos], locale))
 			break;
 
 		match[match_pos++] = patt[pos];
@@ -1481,24 +1485,6 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
 	return sel;
 }
 
-/*
- * Check whether char is a letter (and, hence, subject to case-folding)
- *
- * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
- * Instead, just assume any non-ASCII char is potentially case-varying, and
- * hard-wire knowledge of which ASCII chars are letters.
- */
-static int
-pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else
-		return char_is_cased(c, locale);
-}
-
 
 /*
  * For bytea, the increment function need only increment the current byte
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b14c7837938..9319fb633b6 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1261,6 +1261,17 @@ size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_tolower(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
 	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
@@ -1268,6 +1279,29 @@ size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+	{
+		bool		wasalnum = false;
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < Min(srclen, dstsize); i++)
+		{
+			char		c = src[i];
+
+			if (wasalnum)
+				dst[i] = pg_ascii_tolower(c);
+			else
+				dst[i] = pg_ascii_toupper(c);
+
+			wasalnum = ((c >= '0' && c <= '9') ||
+						(c >= 'A' && c <= 'Z') ||
+						(c >= 'a' && c <= 'z'));
+		}
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
 	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
@@ -1275,6 +1309,17 @@ size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_toupper(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
 	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
@@ -1282,10 +1327,18 @@ size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->ctype->strfold)
-		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
-	else
-		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_tolower(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
+	return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1560,6 +1613,8 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 bool
 char_is_cased(char ch, pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 716f005066a..942454de4ed 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -326,6 +326,7 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
+	.strfold = strlower_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -351,6 +352,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -372,6 +374,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
-- 
2.43.0
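
For readers skimming the patch, here is a minimal standalone sketch of the ASCII-only fallback that pg_strupper() and pg_strfold() take when locale->ctype is NULL. The names sketch_ascii_toupper() and sketch_strupper_ascii() are made up for illustration and are not part of the patch; the sketch only assumes the snprintf-like contract visible in the hunks above (return the source length, NUL-terminate only if there is room).

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>          /* ssize_t */

/* Hypothetical stand-in for pg_ascii_toupper(): only a-z are changed. */
static unsigned char
sketch_ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch = (unsigned char) (ch - 'a' + 'A');
    return ch;
}

/*
 * Uppercase src into dst using ASCII rules only.  A negative srclen means
 * "src is NUL-terminated".  Returns the number of bytes the full result
 * needs (excluding the terminator), even if dst was too small to hold it.
 */
size_t
sketch_strupper_ascii(char *dst, size_t dstsize, const char *src,
                      ssize_t srclen)
{
    size_t      len = (srclen >= 0) ? (size_t) srclen : strlen(src);
    size_t      i;

    for (i = 0; i < len && i < dstsize; i++)
        dst[i] = (char) sketch_ascii_toupper((unsigned char) src[i]);

    /* terminate only if there is room left */
    if (i < dstsize)
        dst[i] = '\0';

    return len;
}
```

A caller can compare the return value against its buffer size to detect truncation, which is the same convention the ltree patch later in this series relies on.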

v9-0005-Make-regex-max_chr-depend-on-encoding-not-provide.patch (text/x-patch; charset=UTF-8)

From b1add0b2b4c9785b56e4dc222a89ec8f43b9c586 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Nov 2025 12:41:47 -0800
Subject: [PATCH v9 05/11] Make regex "max_chr" depend on encoding, not
 provider.

The previous per-provider "max_chr" field was there as a hack to
preserve the exact prior behavior, which depended on the
provider. Change to depend on the encoding, which makes more sense,
and remove the per-provider logic.

The only difference is for ICU: previously it always used
MAX_SIMPLE_CHR (0x7FF) regardless of the encoding; whereas now it will
match libc and use MAX_SIMPLE_CHR for UTF-8, and UCHAR_MAX for other
encodings. That's possibly a loss for non-UTF8 multibyte encodings,
but a win for single-byte encodings. Regardless, this distinction was
not worth the complexity.
---
 src/backend/regex/regc_pg_locale.c     | 18 ++++++++++--------
 src/backend/utils/adt/pg_locale_libc.c |  2 --
 src/include/utils/pg_locale.h          |  6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 4698f110a0c..bb0e3f1d139 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -320,16 +320,18 @@ regc_ctype_get_cache(regc_wc_probefunc probefunc, int cclasscode)
 		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
 	}
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+	}
 	else
 	{
-		if (pg_regex_locale->ctype->max_chr != 0 &&
-			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
-		{
-			max_chr = pg_regex_locale->ctype->max_chr;
-			pcc->cv.cclasscode = -1;
-		}
-		else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#if MAX_SIMPLE_CHR >= UCHAR_MAX
+		max_chr = (pg_wchar) UCHAR_MAX;
+		pcc->cv.cclasscode = -1;
+#else
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#endif
 	}
 
 	/*
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 942454de4ed..a55167b0697 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -341,7 +341,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 /*
@@ -367,7 +366,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 static const struct ctype_methods ctype_methods_libc_utf8 = {
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 49fd22bf8eb..40e58cc52b8 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -131,12 +131,6 @@ struct ctype_methods
 	 * pg_strlower().
 	 */
 	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
-
-	/*
-	 * For regex and pattern matching efficiency, the maximum char value
-	 * supported by the above methods. If zero, limit is set by regex code.
-	 */
-	pg_wchar	max_chr;
 };
 
 /*
-- 
2.43.0
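
As a rough illustration of the encoding-based rule described in the 0005 commit message, the following sketch picks the regex cache limit from the encoding alone. Everything here is hypothetical (the enum, the function name, and the SKETCH_MAX_SIMPLE_CHR constant, which just mirrors the 0x7FF value quoted in the commit message); the C-locale branch that precedes this logic in regc_ctype_get_cache() is deliberately left out.

```c
#include <limits.h>

/* Mirrors the 0x7FF value mentioned in the commit message (assumption). */
#define SKETCH_MAX_SIMPLE_CHR 0x7FF

/* Hypothetical two-way classification of the database encoding. */
typedef enum
{
    SKETCH_ENC_UTF8,
    SKETCH_ENC_OTHER
} sketch_encoding;

/*
 * Choose the highest character code to probe eagerly, based only on the
 * encoding and not on the locale provider.
 */
unsigned int
sketch_regex_max_chr(sketch_encoding enc)
{
    if (enc == SKETCH_ENC_UTF8)
        return SKETCH_MAX_SIMPLE_CHR;

#if SKETCH_MAX_SIMPLE_CHR >= UCHAR_MAX
    /* other encodings: never probe beyond one byte's worth of values */
    return UCHAR_MAX;
#else
    return SKETCH_MAX_SIMPLE_CHR;
#endif
}
```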

v9-0006-Fix-inconsistency-between-ltree_strncasecmp-and-l.patch (text/x-patch; charset=UTF-8)

From ce19c7193a3b94a8afa0890f99338d5f4bc1aebe Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:20:36 -0800
Subject: [PATCH v9 06/11] Fix inconsistency between ltree_strncasecmp() and
 ltree_crc32_sz().

Previously, ltree_strncasecmp() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.

Change both to use casefolding with the default collation.
---
 contrib/ltree/crc32.c     | 46 ++++++++++++++++++++++++++++++++-------
 contrib/ltree/lquery_op.c | 31 ++++++++++++++++++++++++--
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..3918d4a0ec2 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -10,31 +10,61 @@
 #include "postgres.h"
 #include "ltree.h"
 
+#include "crc32.h"
+#include "utils/pg_crc.h"
 #ifdef LOWER_NODE
-#include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
-#else
-#define TOLOWER(x)	(x)
+#include "utils/pg_locale.h"
 #endif
 
-#include "crc32.h"
-#include "utils/pg_crc.h"
+#ifdef LOWER_NODE
 
 unsigned int
 ltree_crc32_sz(const char *buf, int size)
 {
 	pg_crc32	crc;
 	const char *p = buf;
+	static pg_locale_t locale = NULL;
+
+	if (!locale)
+		locale = pg_database_locale();
 
 	INIT_TRADITIONAL_CRC32(crc);
 	while (size > 0)
 	{
-		char		c = (char) TOLOWER(*p);
+		char		foldstr[UNICODE_CASEMAP_BUFSZ];
+		int			srclen = pg_mblen(p);
+		size_t		foldlen;
+
+		/* fold one codepoint at a time */
+		foldlen = pg_strfold(foldstr, UNICODE_CASEMAP_BUFSZ, p, srclen,
+							 locale);
+
+		COMP_TRADITIONAL_CRC32(crc, foldstr, foldlen);
+
+		size -= srclen;
+		p += srclen;
+	}
+	FIN_TRADITIONAL_CRC32(crc);
+	return (unsigned int) crc;
+}
+
+#else
 
-		COMP_TRADITIONAL_CRC32(crc, &c, 1);
+unsigned int
+ltree_crc32_sz(const char *buf, int size)
+{
+	pg_crc32	crc;
+	const char *p = buf;
+
+	INIT_TRADITIONAL_CRC32(crc);
+	while (size > 0)
+	{
+		COMP_TRADITIONAL_CRC32(crc, p, 1);
 		size--;
 		p++;
 	}
 	FIN_TRADITIONAL_CRC32(crc);
 	return (unsigned int) crc;
 }
+
+#endif							/* !LOWER_NODE */
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index a6466f575fd..d6754eb613f 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -77,10 +77,37 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 int
 ltree_strncasecmp(const char *a, const char *b, size_t s)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = s + 1;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = s + 1;
+	char	   *bl = palloc(bl_sz);
+	size_t		needed;
 	int			res;
 
+	if (!locale)
+		locale = pg_database_locale();
+
+	needed = pg_strfold(al, al_sz, a, s, locale);
+	if (needed + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = needed + 1;
+		al = repalloc(al, al_sz);
+		needed = pg_strfold(al, al_sz, a, s, locale);
+		Assert(needed + 1 <= al_sz);
+	}
+
+	needed = pg_strfold(bl, bl_sz, b, s, locale);
+	if (needed + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = needed + 1;
+		bl = repalloc(bl, bl_sz);
+		needed = pg_strfold(bl, bl_sz, b, s, locale);
+		Assert(needed + 1 <= bl_sz);
+	}
+
 	res = strncmp(al, bl, s);
 
 	pfree(al);
-- 
2.43.0
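
The grow-and-retry pattern used in ltree_strncasecmp() may be easier to see outside the backend. The sketch below is a standalone approximation with malloc()/realloc() instead of palloc()/repalloc(), and with a toy fold function (sketch_fold(), hypothetical) that lowercases ASCII and expands the Latin-1 byte 0xDF (sharp s) to "ss", so that the result can be longer than the input. It is not the real pg_strfold(); it only assumes the same "return the needed length" contract.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Toy fold: ASCII lowercase, plus 0xDF -> "ss".  Writes at most dstsize
 * bytes, NUL-terminates if there is room, and returns the length the full
 * result needs (excluding the terminator).
 */
size_t
sketch_fold(char *dst, size_t dstsize, const char *src, size_t srclen)
{
    size_t      out = 0;
    size_t      i;

    for (i = 0; i < srclen; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch == 0xDF)
        {
            if (out < dstsize)
                dst[out] = 's';
            if (out + 1 < dstsize)
                dst[out + 1] = 's';
            out += 2;
        }
        else
        {
            if (ch >= 'A' && ch <= 'Z')
                ch = (unsigned char) (ch - 'A' + 'a');
            if (out < dstsize)
                dst[out] = (char) ch;
            out++;
        }
    }
    if (out < dstsize)
        dst[out] = '\0';
    return out;
}

/* Fold into a freshly allocated buffer; caller frees.  NULL on OOM. */
char *
sketch_fold_alloc(const char *src, size_t srclen)
{
    size_t      bufsz = srclen + 1;     /* optimistic first guess */
    char       *buf = malloc(bufsz);
    size_t      needed;

    if (buf == NULL)
        return NULL;

    needed = sketch_fold(buf, bufsz, src, srclen);
    if (needed + 1 > bufsz)
    {
        /* result did not fit: grow to the reported size and retry once */
        char       *tmp;

        bufsz = needed + 1;
        tmp = realloc(buf, bufsz);
        if (tmp == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = tmp;
        needed = sketch_fold(buf, bufsz, src, srclen);
    }
    return buf;
}
```

The first call doubles as a size probe, so the common case (folded length no larger than the input) costs a single pass.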

v9-0007-Remove-char_tolower-API.patch (text/x-patch; charset=UTF-8)

From 4403ac9fdaa0a66ca905d8376313db9c7250d98a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 18:16:41 -0800
Subject: [PATCH v9 07/11] Remove char_tolower() API.

It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.
---
 src/backend/utils/adt/like.c           | 42 +++++++++-----------------
 src/backend/utils/adt/like_match.c     | 18 ++++++-----
 src/backend/utils/adt/pg_locale.c      | 22 --------------
 src/backend/utils/adt/pg_locale_libc.c | 10 ------
 src/include/utils/pg_locale.h          |  9 ------
 5 files changed, 25 insertions(+), 76 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..4a7fc583c71 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -43,8 +43,8 @@ static text *MB_do_like_escape(text *pat, text *esc);
 static int	UTF8_MatchText(const char *t, int tlen, const char *p, int plen,
 						   pg_locale_t locale);
 
-static int	SB_IMatchText(const char *t, int tlen, const char *p, int plen,
-						  pg_locale_t locale);
+static int	C_IMatchText(const char *t, int tlen, const char *p, int plen,
+						 pg_locale_t locale);
 
 static int	GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation);
 static int	Generic_Text_IC_like(text *str, text *pat, Oid collation);
@@ -84,22 +84,10 @@ wchareq(const char *p1, const char *p2)
  * of getting a single character transformed to the system's wchar_t format.
  * So now, we just downcase the strings using lower() and apply regular LIKE
  * comparison.  This should be revisited when we install better locale support.
- */
-
-/*
- * We do handle case-insensitive matching for single-byte encodings using
+ *
+ * We do handle case-insensitive matching for the C locale using
  * fold-on-the-fly processing, however.
  */
-static char
-SB_lower_char(unsigned char c, pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return pg_ascii_tolower(c);
-	else if (locale->is_default)
-		return pg_tolower(c);
-	else
-		return char_tolower(c, locale);
-}
 
 
 #define NextByte(p, plen)	((p)++, (plen)--)
@@ -131,9 +119,9 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 #include "like_match.c"
 
 /* setup to compile like_match.c for single byte case insensitive matches */
-#define MATCH_LOWER(t, locale) SB_lower_char((unsigned char) (t), locale)
+#define MATCH_LOWER
 #define NextChar(p, plen) NextByte((p), (plen))
-#define MatchText SB_IMatchText
+#define MatchText C_IMatchText
 
 #include "like_match.c"
 
@@ -202,22 +190,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
 
 	/*
-	 * For efficiency reasons, in the single byte case we don't call lower()
-	 * on the pattern and text, but instead call SB_lower_char on each
-	 * character.  In the multi-byte case we don't have much choice :-(. Also,
-	 * ICU does not support single-character case folding, so we go the long
-	 * way.
+	 * For efficiency reasons, in the C locale we don't call lower() on the
+	 * pattern and text, but instead call pg_ascii_tolower() on each character.
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
+		return C_IMatchText(s, slen, p, plen, locale);
 	}
 	else
 	{
@@ -229,10 +212,13 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 													 PointerGetDatum(str)));
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
+
 		if (GetDatabaseEncoding() == PG_UTF8)
 			return UTF8_MatchText(s, slen, p, plen, 0);
-		else
+		else if (pg_database_encoding_max_length() > 1)
 			return MB_MatchText(s, slen, p, plen, 0);
+		else
+			return SB_MatchText(s, slen, p, plen, 0);
 	}
 }
 
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index 892f8a745ea..54846c9541d 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -70,10 +70,14 @@
  *--------------------
  */
 
+/*
+ * MATCH_LOWER is defined for ILIKE in the C locale as an optimization. Other
+ * locales must casefold the inputs before matching.
+ */
 #ifdef MATCH_LOWER
-#define GETCHAR(t, locale) MATCH_LOWER(t, locale)
+#define GETCHAR(t) pg_ascii_tolower(t)
 #else
-#define GETCHAR(t, locale) (t)
+#define GETCHAR(t) (t)
 #endif
 
 static int
@@ -105,7 +109,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 						 errmsg("LIKE pattern must not end with escape character")));
-			if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+			if (GETCHAR(*p) != GETCHAR(*t))
 				return LIKE_FALSE;
 		}
 		else if (*p == '%')
@@ -167,14 +171,14 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 							 errmsg("LIKE pattern must not end with escape character")));
-				firstpat = GETCHAR(p[1], locale);
+				firstpat = GETCHAR(p[1]);
 			}
 			else
-				firstpat = GETCHAR(*p, locale);
+				firstpat = GETCHAR(*p);
 
 			while (tlen > 0)
 			{
-				if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
+				if (GETCHAR(*t) == firstpat || (locale && !locale->deterministic))
 				{
 					int			matched = MatchText(t, tlen, p, plen, locale);
 
@@ -342,7 +346,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					NextChar(t1, t1len);
 			}
 		}
-		else if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+		else if (GETCHAR(*p) != GETCHAR(*t))
 		{
 			/* non-wildcard pattern char fails to match text char */
 			return LIKE_FALSE;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9319fb633b6..b3afa6cad6c 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1618,28 +1618,6 @@ char_is_cased(char ch, pg_locale_t locale)
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
-/*
- * char_tolower_enabled()
- *
- * Does the provider support char_tolower()?
- */
-bool
-char_tolower_enabled(pg_locale_t locale)
-{
-	return (locale->ctype->char_tolower != NULL);
-}
-
-/*
- * char_tolower()
- *
- * Convert char (single-byte encoding) to lowercase.
- */
-char
-char_tolower(unsigned char ch, pg_locale_t locale)
-{
-	return locale->ctype->char_tolower(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index a55167b0697..feb63bbdad1 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -248,13 +248,6 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
-static char
-char_tolower_libc(unsigned char ch, pg_locale_t locale)
-{
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
-}
-
 static bool
 char_is_cased_libc(char ch, pg_locale_t locale)
 {
@@ -338,7 +331,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -363,7 +355,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -384,7 +375,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 40e58cc52b8..e5aaf6422e8 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -124,13 +124,6 @@ struct ctype_methods
 
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
-
-	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
-	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
 };
 
 /*
@@ -182,8 +175,6 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
-extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0
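
To illustrate what "fold-on-the-fly" matching in the C locale amounts to after this patch, here is a deliberately simplified, standalone matcher. It is not like_match.c: the names are hypothetical, it has none of the multibyte or nondeterministic-collation handling, and a trailing backslash is treated as a literal rather than raising an error. It only shows each byte being lowercased with ASCII rules at comparison time, instead of lowercasing both strings up front.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_ascii_tolower(). */
static unsigned char
sketch_ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch - 'A' + 'a');
    return ch;
}

#define SKETCH_GETCHAR(c) sketch_ascii_tolower((unsigned char) (c))

/* Byte-at-a-time ILIKE for ASCII case rules; '%' and '_' are wildcards. */
bool
sketch_c_ilike(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (plen > 0)
    {
        if (*p == '%')
        {
            p++;
            plen--;
            /* try to match the rest of the pattern at every text offset */
            for (;;)
            {
                if (sketch_c_ilike(t, tlen, p, plen))
                    return true;
                if (tlen == 0)
                    return false;
                t++;
                tlen--;
            }
        }

        if (tlen == 0)
            return false;

        if (*p == '\\' && plen > 1)
        {
            /* escaped character: must match literally (modulo case) */
            p++;
            plen--;
            if (SKETCH_GETCHAR(*p) != SKETCH_GETCHAR(*t))
                return false;
        }
        else if (*p != '_' && SKETCH_GETCHAR(*p) != SKETCH_GETCHAR(*t))
            return false;

        p++;
        plen--;
        t++;
        tlen--;
    }
    return tlen == 0;
}
```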

v9-0008-Use-multibyte-aware-extraction-of-pattern-prefixe.patch (text/x-patch; charset=UTF-8)

From f8cf19f4764de42851f7b98ce652e8e2ece6af40 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Nov 2025 12:14:21 -0800
Subject: [PATCH v9 08/11] Use multibyte-aware extraction of pattern prefixes.

Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.

Now, use a pg_wchar-at-a-time loop for text types, along with proper
detection of cased characters, and preserve the char-at-a-time logic
for bytea.

Removes the pg_locale_t char_is_cased() single-byte method and
replaces it with a proper multibyte pg_iswcased() method.
---
 src/backend/utils/adt/like_support.c      | 111 +++++++++++++---------
 src/backend/utils/adt/pg_locale.c         |  26 +++--
 src/backend/utils/adt/pg_locale_builtin.c |   7 +-
 src/backend/utils/adt/pg_locale_icu.c     |  15 ++-
 src/backend/utils/adt/pg_locale_libc.c    |  23 +++--
 src/include/utils/pg_locale.h             |   5 +-
 6 files changed, 103 insertions(+), 84 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 0debccfa67b..e7255fa652a 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -987,12 +987,11 @@ static Pattern_Prefix_Status
 like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				  Const **prefix_const, Selectivity *rest_selec)
 {
-	char	   *match;
 	char	   *patt;
 	int			pattlen;
 	Oid			typeid = patt_const->consttype;
-	int			pos,
-				match_pos;
+	int			pos;
+	int			match_pos = 0;
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1020,67 +1019,91 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 		locale = pg_newlocale_from_collation(collation);
 	}
 
+	/* for text types, use pg_wchar; for BYTEA, use char */
 	if (typeid != BYTEAOID)
 	{
-		patt = TextDatumGetCString(patt_const->constvalue);
-		pattlen = strlen(patt);
+		text	   *val = DatumGetTextPP(patt_const->constvalue);
+		pg_wchar   *wpatt;
+		pg_wchar   *wmatch;
+		char	   *match;
+
+		patt = VARDATA_ANY(val);
+		pattlen = VARSIZE_ANY_EXHDR(val);
+		wpatt = palloc((pattlen + 1) * sizeof(pg_wchar));
+		wmatch = palloc((pattlen + 1) * sizeof(pg_wchar));
+		pg_mb2wchar_with_len(patt, wpatt, pattlen);
+
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
+		{
+			/* % and _ are wildcard characters in LIKE */
+			if (wpatt[pos] == '%' ||
+				wpatt[pos] == '_')
+				break;
+
+			/* Backslash escapes the next character */
+			if (wpatt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
+
+			/*
+			 * For ILIKE, stop if it's a case-varying character (it's sort of
+			 * a wildcard).
+			 */
+			if (case_insensitive && pg_iswcased(wpatt[pos], locale))
+				break;
+
+			wmatch[match_pos++] = wpatt[pos];
+		}
+
+		wmatch[match_pos] = '\0';
+
+		pg_wchar2mb_with_len(wmatch, match, pattlen);
+
+		pfree(wpatt);
+		pfree(wmatch);
+
+		*prefix_const = string_to_const(match, typeid);
 	}
 	else
 	{
 		bytea	   *bstr = DatumGetByteaPP(patt_const->constvalue);
+		char	   *match;
 
+		patt = VARDATA_ANY(bstr);
 		pattlen = VARSIZE_ANY_EXHDR(bstr);
-		patt = (char *) palloc(pattlen);
-		memcpy(patt, VARDATA_ANY(bstr), pattlen);
-		Assert((Pointer) bstr == DatumGetPointer(patt_const->constvalue));
-	}
 
-	match = palloc(pattlen + 1);
-	match_pos = 0;
-	for (pos = 0; pos < pattlen; pos++)
-	{
-		/* % and _ are wildcard characters in LIKE */
-		if (patt[pos] == '%' ||
-			patt[pos] == '_')
-			break;
-
-		/* Backslash escapes the next character */
-		if (patt[pos] == '\\')
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
 		{
-			pos++;
-			if (pos >= pattlen)
+			/* % and _ are wildcard characters in LIKE */
+			if (patt[pos] == '%' ||
+				patt[pos] == '_')
 				break;
-		}
 
-		/*
-		 * Stop if case-varying character (it's sort of a wildcard).
-		 *
-		 * In multibyte character sets or with non-libc providers, we can't
-		 * use isalpha, and it does not seem worth trying to convert to
-		 * wchar_t or char32_t.  Instead, just pass the single byte to the
-		 * provider, which will assume any non-ASCII char is potentially
-		 * case-varying.
-		 */
-		if (case_insensitive && char_is_cased(patt[pos], locale))
-			break;
-
-		match[match_pos++] = patt[pos];
-	}
+			/* Backslash escapes the next character */
+			if (patt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
 
-	match[match_pos] = '\0';
+			match[match_pos++] = patt[pos];
+		}
 
-	if (typeid != BYTEAOID)
-		*prefix_const = string_to_const(match, typeid);
-	else
 		*prefix_const = string_to_bytea_const(match, match_pos);
 
+		pfree(match);
+	}
+
 	if (rest_selec != NULL)
 		*rest_selec = like_selectivity(&patt[pos], pattlen - pos,
 									   case_insensitive);
 
-	pfree(patt);
-	pfree(match);
-
 	/* in LIKE, an empty pattern is an exact match! */
 	if (pos == pattlen)
 		return Pattern_Prefix_Exact;	/* reached end of pattern, so exact */
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b3afa6cad6c..6ec7a48f4c3 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1577,6 +1577,17 @@ pg_iswxdigit(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_isxdigit(wc, locale);
 }
 
+bool
+pg_iswcased(pg_wchar wc, pg_locale_t locale)
+{
+	/* for the C locale, Cased and Alpha are equivalent */
+	if (locale->ctype == NULL)
+		return (wc <= (pg_wchar) 127 &&
+				(pg_char_properties[wc] & PG_ISALPHA));
+	else
+		return locale->ctype->wc_iscased(wc, locale);
+}
+
 pg_wchar
 pg_towupper(pg_wchar wc, pg_locale_t locale)
 {
@@ -1603,21 +1614,6 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_tolower(wc, locale);
 }
 
-/*
- * char_is_cased()
- *
- * Fuzzy test of whether the given char is case-varying or not. The argument
- * is a single byte, so in a multibyte encoding, just assume any non-ASCII
- * char is case-varying.
- */
-bool
-char_is_cased(char ch, pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-	return locale->ctype->char_is_cased(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 1021e0d129b..0c2920112bb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -186,10 +186,9 @@ wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 }
 
 static bool
-char_is_cased_builtin(char ch, pg_locale_t locale)
+wc_iscased_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+	return pg_u_prop_cased(to_char32(wc));
 }
 
 static pg_wchar
@@ -219,7 +218,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
-	.char_is_cased = char_is_cased_builtin,
+	.wc_iscased = wc_iscased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
 };
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f5a0cc8fe41..18d026deda8 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,13 +121,6 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-static bool
-char_is_cased_icu(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
  * UChar32, which is correct for the UTF-8 encoding, but not in general.
@@ -223,6 +216,12 @@ wc_isxdigit_icu(pg_wchar wc, pg_locale_t locale)
 	return u_isxdigit(wc);
 }
 
+static bool
+wc_iscased_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_hasBinaryProperty(wc, UCHAR_CASED);
+}
+
 static const struct ctype_methods ctype_methods_icu = {
 	.strlower = strlower_icu,
 	.strtitle = strtitle_icu,
@@ -238,7 +237,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
-	.char_is_cased = char_is_cased_icu,
+	.wc_iscased = wc_iscased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index feb63bbdad1..4c20797ad5c 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -184,6 +184,13 @@ wc_isxdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
+static bool
+wc_iscased_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->lt) ||
+		islower_l((unsigned char) wc, locale->lt);
+}
+
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
@@ -249,14 +256,10 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 }
 
 static bool
-char_is_cased_libc(char ch, pg_locale_t locale)
+wc_iscased_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	bool		is_multibyte = pg_database_encoding_max_length() > 1;
-
-	if (is_multibyte && IS_HIGHBIT_SET(ch))
-		return true;
-	else
-		return isalpha_l((unsigned char) ch, locale->lt);
+	return iswupper_l((wint_t) wc, locale->lt) ||
+		iswlower_l((wint_t) wc, locale->lt);
 }
 
 static pg_wchar
@@ -330,7 +333,7 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -354,7 +357,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -374,7 +377,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_ispunct = wc_ispunct_libc_mb,
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_mb,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e5aaf6422e8..6dda56d1c3c 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -119,11 +119,9 @@ struct ctype_methods
 	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isxdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_iscased) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
-
-	/* required */
-	bool		(*char_is_cased) (char ch, pg_locale_t locale);
 };
 
 /*
@@ -211,6 +209,7 @@ extern bool pg_iswprint(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswpunct(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswspace(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswxdigit(pg_wchar wc, pg_locale_t locale);
+extern bool pg_iswcased(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towupper(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towlower(pg_wchar wc, pg_locale_t locale);
 
-- 
2.43.0
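
The effect of the 0008 change can be sketched on plain code point arrays. The code below is an assumption-laden toy: sketch_is_cased() is a hypothetical predicate covering only ASCII and Latin-1 letters (the patch asks the locale provider instead), and pattlen here counts code points rather than bytes. It shows why a character-aware test lets uncased non-ASCII characters stay in the fixed prefix for ILIKE, where the old byte-wise test had to stop at any high-bit byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy "is this code point case-varying?" test; NOT the Unicode property. */
static bool
sketch_is_cased(uint32_t cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return true;
    /* Latin-1 letters, skipping the multiplication and division signs */
    return cp >= 0xC0 && cp <= 0xFF && cp != 0xD7 && cp != 0xF7;
}

/*
 * Copy the fixed (wildcard-free) prefix of a LIKE pattern into prefix[],
 * which must have room for pattlen entries.  Returns the prefix length in
 * code points.
 */
size_t
sketch_like_fixed_prefix(const uint32_t *patt, size_t pattlen,
                         bool case_insensitive, uint32_t *prefix)
{
    size_t      pos;
    size_t      n = 0;

    for (pos = 0; pos < pattlen; pos++)
    {
        /* % and _ are wildcards */
        if (patt[pos] == '%' || patt[pos] == '_')
            break;

        /* backslash escapes the next code point */
        if (patt[pos] == '\\')
        {
            pos++;
            if (pos >= pattlen)
                break;
        }

        /* for ILIKE, a cased character behaves like a small wildcard */
        if (case_insensitive && sketch_is_cased(patt[pos]))
            break;

        prefix[n++] = patt[pos];
    }
    return n;
}
```

With the pattern U+6771 U+4EAC '1' 'a' '%', the case-insensitive prefix keeps the two CJK characters and the digit and stops at 'a'; the case-sensitive prefix also keeps 'a'.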

v9-0009-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 6fe24276682edd24c4c9a20fb11747c095bd7744 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v9 09/11] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.
---
 contrib/fuzzystrmatch/dmetaphone.c    |  2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..bb5d3e90756 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -284,7 +284,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = pg_ascii_toupper((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..7f07efc2c35 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +124,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +301,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +742,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v9-0010-downcase_identifier-use-method-table-from-locale-.patch (text/x-patch; charset=UTF-8)

From 3ab850c32c4bfef9102117385ede98080d8cf4b6 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 20 Oct 2025 16:32:18 -0700
Subject: [PATCH v9 10/11] downcase_identifier(): use method table from locale
 provider.

Previously, libc's tolower() was always used for identifier case
folding, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
casefolding.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to fold the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)
---
 src/backend/parser/scansup.c              | 39 +++++++---------
 src/backend/utils/adt/pg_locale.c         | 32 +++++++++++++
 src/backend/utils/adt/pg_locale_builtin.c | 24 ++++++++++
 src/backend/utils/adt/pg_locale_icu.c     | 36 ++++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 55 +++++++++++++++++++++++
 src/include/utils/pg_locale.h             |  5 +++
 6 files changed, 166 insertions(+), 25 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..0bd049643d1 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -46,35 +47,25 @@ char *
 downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 {
 	char	   *result;
-	int			i;
-	bool		enc_is_single_byte;
-
-	result = palloc(len + 1);
-	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	size_t		dstsize;
+	size_t		needed pg_attribute_unused();
 
 	/*
-	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
-	 * have the infrastructure for.  Instead we use tolower() to provide a
-	 * locale-aware translation.  However, there are some locales where this
-	 * is not right either (eg, Turkish may do strange things with 'i' and
-	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * Preserves string length.
+	 *
+	 * NB: if we decide to support Unicode-aware identifier case folding, then
+	 * we need to account for a change in string length.
 	 */
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	dstsize = len + 1;
+	result = palloc(dstsize);
 
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_strfold_ident(result, dstsize, ident, len);
+	Assert(needed + 1 == dstsize);
+	Assert(needed == len);
+	Assert(result[len] == '\0');
 
-	if (i >= NAMEDATALEN && truncate)
-		truncate_identifier(result, i, warn);
+	if (len >= NAMEDATALEN && truncate)
+		truncate_identifier(result, len, warn);
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 6ec7a48f4c3..68227367339 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1341,6 +1341,38 @@ pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 	return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+	if (default_locale == NULL || default_locale->ctype == NULL)
+	{
+		int			i;
+
+		for (i = 0; i < srclen && i < destsize; i++)
+		{
+			unsigned char ch = (unsigned char) src[i];
+
+			if (ch >= 'A' && ch <= 'Z')
+				ch += 'a' - 'A';
+			dest[i] = (char) ch;
+		}
+
+		if (i < destsize)
+			dest[i] = '\0';
+
+		return srclen;
+	}
+	return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+												default_locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0c2920112bb..659e588d513 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -125,6 +125,29 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 						   locale->builtin.casemap_full);
 }
 
+static size_t
+strfold_ident_builtin(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
@@ -208,6 +231,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.strtitle = strtitle_builtin,
 	.strupper = strupper_builtin,
 	.strfold = strfold_builtin,
+	.strfold_ident = strfold_ident_builtin,
 	.wc_isdigit = wc_isdigit_builtin,
 	.wc_isalpha = wc_isalpha_builtin,
 	.wc_isalnum = wc_isalnum_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 18d026deda8..39b153a4262 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,8 @@ static size_t strupper_icu(char *dest, size_t destsize, const char *src,
 						   ssize_t srclen, pg_locale_t locale);
 static size_t strfold_icu(char *dest, size_t destsize, const char *src,
 						  ssize_t srclen, pg_locale_t locale);
+static size_t strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+								ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -123,7 +125,7 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
- * UChar32, which is correct for the UTF-8 encoding, but not in general.
+ * UChar32, which is correct for UTF-8 and LATIN1, but not in general.
  */
 
 static pg_wchar
@@ -227,6 +229,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 	.strfold = strfold_icu,
+	.strfold_ident = strfold_ident_icu,
 	.wc_isdigit = wc_isdigit_icu,
 	.wc_isalpha = wc_isalpha_icu,
 	.wc_isalnum = wc_isalnum_icu,
@@ -564,6 +567,37 @@ strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
+/*
+ * For historical compatibility, behavior is not multibyte-aware.
+ *
+ * NB: uses libc tolower() for single-byte encodings (also for historical
+ * compatibility), and therefore relies on the global LC_CTYPE setting.
+ */
+static size_t
+strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+				  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
+			ch = tolower(ch);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 /*
  * strncoll_icu_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 4c20797ad5c..10d332888df 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -318,11 +318,64 @@ tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 		return wc;
 }
 
+/*
+ * Characters A..Z always fold to a..z, even in the Turkish locale. Characters
+ * beyond 127 use tolower().
+ */
+static size_t
+strfold_ident_libc_sb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (IS_HIGHBIT_SET(ch) && isupper_l(ch, loc))
+			ch = tolower_l(ch, loc);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
+/*
+ * For historical reasons, not multibyte-aware; uses plain ASCII semantics.
+ */
+static size_t
+strfold_ident_libc_mb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
 	.strfold = strlower_libc_sb,
+	.strfold_ident = strfold_ident_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -347,6 +400,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -367,6 +421,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 6dda56d1c3c..b5251d175a9 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -107,6 +107,9 @@ struct ctype_methods
 	size_t		(*strfold) (char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+	size_t		(*strfold_ident) (char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
 
 	/* required */
 	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
@@ -185,6 +188,8 @@ extern size_t pg_strupper(char *dst, size_t dstsize,
 extern size_t pg_strfold(char *dst, size_t dstsize,
 						 const char *src, ssize_t srclen,
 						 pg_locale_t locale);
+extern size_t pg_strfold_ident(char *dst, size_t dstsize,
+							   const char *src, ssize_t srclen);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.43.0

v9-0011-Control-LC_COLLATE-with-GUC.patch (text/x-patch; charset=UTF-8)

From c994f82e10a8910712683dd2d21679002d3697ab Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 14:00:52 -0800
Subject: [PATCH v9 11/11] Control LC_COLLATE with GUC.

Now that the global LC_COLLATE setting is not used for any in-core
purpose at all (see commit 5e6e42e44f), allow it to be set with a
GUC. This may be useful for extensions or procedural languages that
still depend on the global LC_COLLATE setting.
---
 src/backend/utils/adt/pg_locale.c             | 59 +++++++++++++++++++
 src/backend/utils/init/postinit.c             |  2 +
 src/backend/utils/misc/guc_parameters.dat     |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/initdb/initdb.c                       |  3 +
 src/include/utils/guc_hooks.h                 |  2 +
 src/include/utils/pg_locale.h                 |  1 +
 7 files changed, 78 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 68227367339..143202abbad 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,7 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
+char	   *locale_collate;
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
@@ -369,6 +370,64 @@ assign_locale_time(const char *newval, void *extra)
 	CurrentLCTimeValid = false;
 }
 
+/*
+ * We allow LC_COLLATE to actually be set globally.
+ *
+ * Note: we normally disallow value = "" because it wouldn't have consistent
+ * semantics (it'd effectively just use the previous value).  However, this
+ * is the value passed for PGC_S_DEFAULT, so don't complain in that case,
+ * not even if the attempted setting fails due to invalid environment value.
+ * The idea there is just to accept the environment setting *if possible*
+ * during startup, until we can read the proper value from postgresql.conf.
+ */
+bool
+check_locale_collate(char **newval, void **extra, GucSource source)
+{
+	int			locale_enc;
+	int			db_enc;
+
+	if (**newval == '\0')
+	{
+		if (source == PGC_S_DEFAULT)
+			return true;
+		else
+			return false;
+	}
+
+	locale_enc = pg_get_encoding_from_locale(*newval, true);
+	db_enc = GetDatabaseEncoding();
+
+	if (!(locale_enc == db_enc ||
+		  locale_enc == PG_SQL_ASCII ||
+		  db_enc == PG_SQL_ASCII ||
+		  locale_enc == -1))
+	{
+		if (source == PGC_S_FILE)
+		{
+			guc_free(*newval);
+			*newval = guc_strdup(LOG, "C");
+			if (!*newval)
+				return false;
+		}
+		else if (source != PGC_S_TEST)
+		{
+			ereport(WARNING,
+					(errmsg("encoding mismatch"),
+					 errdetail("Locale \"%s\" uses encoding \"%s\", which does not match database encoding \"%s\".",
+							   *newval, pg_encoding_to_char(locale_enc), pg_encoding_to_char(db_enc))));
+			return false;
+		}
+	}
+
+	return check_locale(LC_COLLATE, *newval, NULL);
+}
+
+void
+assign_locale_collate(const char *newval, void *extra)
+{
+	(void) pg_perm_setlocale(LC_COLLATE, newval);
+}
+
 /*
  * We allow LC_MESSAGES to actually be set globally.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..c99d57eba48 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -404,6 +404,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * the pg_database tuple.
 	 */
 	SetDatabaseEncoding(dbform->encoding);
+	/* Reset lc_collate to check encoding, and fall back to C if necessary */
+	SetConfigOption("lc_collate", locale_collate, PGC_POSTMASTER, PGC_S_FILE);
 	/* Record it as a GUC internal option, too */
 	SetConfigOption("server_encoding", GetDatabaseEncodingName(),
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 1128167c025..a3da16eadb1 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1450,6 +1450,15 @@
   boot_val => 'PG_KRB_SRVTAB',
 },
 
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
+  long_desc => 'An empty string means use the operating system setting.',
+  variable => 'locale_collate',
+  boot_val => '""',
+  check_hook => 'check_locale_collate',
+  assign_hook => 'assign_locale_collate',
+},
+
 { name => 'lc_messages', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
   short_desc => 'Sets the language in which messages are displayed.',
   long_desc => 'An empty string means use the operating system setting.',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..19332e39e82 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -798,6 +798,8 @@
                                         # encoding
 
 # These settings are initialized by initdb, but they can be changed.
+#lc_collate = ''                        # locale for text ordering (only affects
+                                        # extensions)
 #lc_messages = ''                       # locale for system error message
                                         # strings
 #lc_monetary = 'C'                      # locale for monetary formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..8b2e7bfab6f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1312,6 +1312,9 @@ setup_config(void)
 	conflines = replace_guc_value(conflines, "shared_buffers",
 								  repltok, false);
 
+	conflines = replace_guc_value(conflines, "lc_collate",
+								  lc_collate, false);
+
 	conflines = replace_guc_value(conflines, "lc_messages",
 								  lc_messages, false);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 82ac8646a8d..8a20f76eec8 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -65,6 +65,8 @@ extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
 extern void assign_io_method(int newval, void *extra);
 extern bool check_io_max_concurrency(int *newval, void **extra, GucSource source);
 extern const char *show_in_hot_standby(void);
+extern bool check_locale_collate(char **newval, void **extra, GucSource source);
+extern void assign_locale_collate(const char *newval, void *extra);
 extern bool check_locale_messages(char **newval, void **extra, GucSource source);
 extern void assign_locale_messages(const char *newval, void *extra);
 extern bool check_locale_monetary(char **newval, void **extra, GucSource source);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index b5251d175a9..9c7371476c1 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -38,6 +38,7 @@
 #define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
 
 /* GUC settings */
+extern PGDLLIMPORT char *locale_collate;
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
-- 
2.43.0

#73

Chao Li

li.evan.chao@gmail.com

about 2 months ago

In reply to: Jeff Davis (#72)

Re: Remaining dependency on setlocale()

Hi Jeff,

I have reviewed 0001-0004 and have a few comments:

On Nov 25, 2025, at 07:57, Jeff Davis <pgsql@j-davis.com> wrote:

0001-0004: Pure refactoring patches. I intend to commit a couple of
these soon.

1 - 0001
```
+/*
+ * Fold a character to upper case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_toupper(unsigned char ch)
```

I was curious why “inline” is needed, and figured it out when I tried to build: without “inline”, the compiler raises “unused function” warnings. So I guess it’s better to explain why “inline” is used in the function comment; otherwise other readers may be similarly confused.

2 - 0002
```
+ * three output codepoints. See Unicode 5.18.2, "Change in Length".
```

From “Change in Length”, I confirmed that “Unicode 5.18.2” means Section 5.18.2, “Complications for Case Mapping”, of the Unicode Standard. Why don’t we just give the URL in the comment? https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-5/#G29675

3 - 0004
```
@@ -1282,10 +1327,18 @@ size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->ctype->strfold)
-		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
-	else
-		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+	{
+		int			i;
+
+		srclen = (srclen >= 0) ? srclen : strlen(src);
+		for (i = 0; i < srclen && i < dstsize; i++)
+			dst[i] = pg_ascii_tolower(src[i]);
+		if (i < dstsize)
+			dst[i] = '\0';
+		return srclen;
+	}
+	return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
```

I don’t understand this change. In the old code, pg_strfold calls strfold or strlower depending on whether locale->ctype->strfold is set. In this patch, it only calls strfold. Why? If that’s intentional, it may be better to add a comment explaining it.

4 - 0004

In pg_strfold, the ctype==NULL fallback code is exactly the same as in pg_strlower. I guess you intentionally avoid calling pg_strlower here for performance reasons; is that true?

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#74

Jeff Davis

pgsql@j-davis.com

about 2 months ago

In reply to: Chao Li (#73)

11 attachment(s)

Re: Remaining dependency on setlocale()

On Tue, 2025-11-25 at 09:44 +0800, Chao Li wrote:

I was curious why “inline” is needed, and figured it out when I tried
to build: without “inline”, the compiler raises “unused function”
warnings. So I guess it’s better to explain why “inline” is used in
the function comment; otherwise other readers may be similarly
confused.

That's a typical pattern: to make it "inline", move it to a .h file and
declare it as "static inline". For common patterns like that, I don't
think we should explain them in comments, because it would mean we
would start adding comments in zillions of places.
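
For readers who have not seen the pattern, here is a minimal sketch. The header name and the function name are hypothetical, chosen only for illustration; the real functions are the ones added to src/include/port.h by 0001.

```c
/* my_ascii.h: hypothetical header, for illustration only */
#ifndef MY_ASCII_H
#define MY_ASCII_H

/*
 * "static" gives each including .c file its own private copy, so the
 * definition can live in a header without duplicate-symbol link errors.
 * "inline" is a hint to the compiler, and GCC and Clang typically do not
 * emit -Wunused-function for a static inline function that a file never
 * calls.
 */
static inline unsigned char
my_ascii_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch += 'A' - 'a';
    return ch;
}

#endif                          /* MY_ASCII_H */
```

Bytes outside a..z, including anything above 127, pass through unchanged, which is what makes the result independent of the C library's locale.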

From “Change in Length”, I confirmed that “Unicode 5.18.2” means
Section 5.18.2, “Complications for Case Mapping”, of the Unicode
Standard. Why don’t we just give the URL in the comment?
https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-5/#G29675

Done, thank you. (Though since we haven't actually moved to 17 yet, I
linked to 16 instead.)
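
To make the "as many as three output codepoints" bound concrete, here is a rough sketch of a toy mapper. Everything in it is an assumption for illustration: the name toy_full_upper, the CASEMAP_LEN macro (standing in for UNICODE_CASEMAP_LEN), and the use of uint32_t where the patch uses char32_t. It knows only two special cases, both taken from the Unicode full case mappings, and is not how the real case-mapping tables work.

```c
#include <stddef.h>
#include <stdint.h>

#define CASEMAP_LEN 3           /* stands in for UNICODE_CASEMAP_LEN */

/*
 * Hypothetical toy mapper: uppercases one code point into a caller-supplied
 * buffer of CASEMAP_LEN code points and returns how many were written.
 * Only ASCII and two full-case-mapping special cases are handled.
 */
static size_t
toy_full_upper(uint32_t cp, uint32_t out[CASEMAP_LEN])
{
    switch (cp)
    {
        case 0x00DF:            /* sharp s maps to "SS": two code points */
            out[0] = 0x0053;
            out[1] = 0x0053;
            return 2;
        case 0x0390:            /* iota with dialytika and tonos: three */
            out[0] = 0x0399;    /* GREEK CAPITAL LETTER IOTA */
            out[1] = 0x0308;    /* COMBINING DIAERESIS */
            out[2] = 0x0301;    /* COMBINING ACUTE ACCENT */
            return 3;
        default:
            if (cp >= 'a' && cp <= 'z')
                cp -= 'a' - 'A';
            out[0] = cp;
            return 1;
    }
}
```

A caller can declare `uint32_t buf[CASEMAP_LEN]` on the stack and map one input code point at a time without any allocation, which is the use the #define is meant for.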

I don’t get this change. In old code, depending on
locale->ctype->strfold, it calls strfold or strlower. But in this
patch, it only calls strfold. Why? If that’s intentional, maybe better
to add a comment to explain that.

I thought it would be slightly cleaner to just define the strfold
method in the libc provider as the same as strlower. I agree it's worth
a comment, so I added some in pg_locale_libc.c.
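
Roughly, the idea looks like the sketch below. The toy_ names are hypothetical and the signature is simplified (no pg_locale_t argument, size_t instead of ssize_t); it only illustrates filling both slots of a method table with the same function so that the dispatcher needs no NULL check.

```c
#include <stddef.h>

typedef size_t (*toy_casemap_fn) (char *dst, size_t dstsize,
                                  const char *src, size_t srclen);

/* Hypothetical, cut-down stand-in for struct ctype_methods */
struct toy_ctype_methods
{
    toy_casemap_fn strlower;
    toy_casemap_fn strfold;
};

/* ASCII-only lowercasing; returns the full source length */
static size_t
toy_strlower(char *dst, size_t dstsize, const char *src, size_t srclen)
{
    size_t      i;

    for (i = 0; i < srclen && i < dstsize; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        dst[i] = (char) ch;
    }
    if (i < dstsize)
        dst[i] = '\0';
    return srclen;
}

/* This provider has no separate fold, so strfold reuses strlower */
static const struct toy_ctype_methods toy_methods_libc = {
    .strlower = toy_strlower,
    .strfold = toy_strlower,
};

/* The dispatcher can call strfold unconditionally */
static size_t
toy_fold(const struct toy_ctype_methods *methods, char *dst, size_t dstsize,
         const char *src, size_t srclen)
{
    return methods->strfold(dst, dstsize, src, srclen);
}
```

The trade-off is that every provider must now fill in strfold, but in exchange the "which method do I call" decision lives in one place, the table definition, rather than in each caller.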

In pg_strfold, the ctype==NULL fallback code is exactly the same as
pg_strlower. I guess you intentionally avoid calling pg_strlower here
for performance reasons; is that true?

I made some static functions to clean that up, and added some comments.
Thank you.
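
In outline, the cleanup has the shape sketched below. This is a simplified sketch, not the code from 0004: the toy_ names are hypothetical, and ptrdiff_t stands in for ssize_t to keep it portable. One static helper implements the C-locale behavior, and both public entry points fall back to it when the locale has no ctype methods.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*toy_casemap_fn) (char *dst, size_t dstsize,
                                  const char *src, ptrdiff_t srclen);

struct toy_ctype
{
    toy_casemap_fn strlower;
    toy_casemap_fn strfold;
};

struct toy_locale
{
    const struct toy_ctype *ctype;  /* NULL stands for the C locale here */
};

/*
 * Shared C-locale helper.  A negative srclen means src is NUL-terminated.
 * Returns the full source length even if the output was truncated, so the
 * caller can detect truncation by comparing the result with dstsize.
 */
static size_t
toy_strlower_c(char *dst, size_t dstsize, const char *src, ptrdiff_t srclen)
{
    size_t      len = (srclen >= 0) ? (size_t) srclen : strlen(src);
    size_t      i;

    for (i = 0; i < len && i < dstsize; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        dst[i] = (char) ch;
    }
    if (i < dstsize)
        dst[i] = '\0';
    return len;
}

static size_t
toy_strlower(char *dst, size_t dstsize, const char *src, ptrdiff_t srclen,
             const struct toy_locale *locale)
{
    if (locale->ctype == NULL)
        return toy_strlower_c(dst, dstsize, src, srclen);
    return locale->ctype->strlower(dst, dstsize, src, srclen);
}

/* In the C locale, case folding is the same operation as lowercasing */
static size_t
toy_strfold(char *dst, size_t dstsize, const char *src, ptrdiff_t srclen,
            const struct toy_locale *locale)
{
    if (locale->ctype == NULL)
        return toy_strlower_c(dst, dstsize, src, srclen);
    return locale->ctype->strfold(dst, dstsize, src, srclen);
}
```

Note that when the result is at least dstsize, the output is not NUL-terminated; a caller would be expected to retry with a buffer of at least result + 1 bytes.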

New series attached with only these changes and a rebase.

Regards,
Jeff Davis

Attachments:

v10-0001-Inline-pg_ascii_tolower-and-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From a92c2595a8781bb972a301fa962083cc43ba6739 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 20 Nov 2025 13:09:26 -0800
Subject: [PATCH v10 01/11] Inline pg_ascii_tolower() and pg_ascii_toupper().

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
 src/include/port.h      | 25 +++++++++++++++++++++++--
 src/port/pgstrcasecmp.c | 26 --------------------------
 2 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/src/include/port.h b/src/include/port.h
index 3964d3b1293..159c2bcd7e3 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -169,8 +169,29 @@ extern int	pg_strcasecmp(const char *s1, const char *s2);
 extern int	pg_strncasecmp(const char *s1, const char *s2, size_t n);
 extern unsigned char pg_toupper(unsigned char ch);
 extern unsigned char pg_tolower(unsigned char ch);
-extern unsigned char pg_ascii_toupper(unsigned char ch);
-extern unsigned char pg_ascii_tolower(unsigned char ch);
+
+/*
+ * Fold a character to upper case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_toupper(unsigned char ch)
+{
+	if (ch >= 'a' && ch <= 'z')
+		ch += 'A' - 'a';
+	return ch;
+}
+
+/*
+ * Fold a character to lower case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_tolower(unsigned char ch)
+{
+	if (ch >= 'A' && ch <= 'Z')
+		ch += 'a' - 'A';
+	return ch;
+}
+
 
 /*
  * Beginning in v12, we always replace snprintf() and friends with our own
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..17e93180381 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -13,10 +13,6 @@
  *
  * NB: this code should match downcase_truncate_identifier() in scansup.c.
  *
- * We also provide strict ASCII-only case conversion functions, which can
- * be used to implement C/POSIX case folding semantics no matter what the
- * C library thinks the locale is.
- *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  *
@@ -127,25 +123,3 @@ pg_tolower(unsigned char ch)
 		ch = tolower(ch);
 	return ch;
 }
-
-/*
- * Fold a character to upper case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_toupper(unsigned char ch)
-{
-	if (ch >= 'a' && ch <= 'z')
-		ch += 'A' - 'a';
-	return ch;
-}
-
-/*
- * Fold a character to lower case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_tolower(unsigned char ch)
-{
-	if (ch >= 'A' && ch <= 'Z')
-		ch += 'a' - 'A';
-	return ch;
-}
-- 
2.43.0

v10-0002-Add-define-for-UNICODE_CASEMAP_BUFSZ.patch (text/x-patch; charset=UTF-8)

From bccff163e0ce2c0b1f9cb0cfdeecb32ededb898d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:11:52 -0800
Subject: [PATCH v10 02/11] Add #define for UNICODE_CASEMAP_BUFSZ.

Useful for mapping a single codepoint at a time into a
statically-allocated buffer.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
 src/include/utils/pg_locale.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 683e1a0eef8..54193a17a90 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -26,6 +26,20 @@
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
+/*
+ * Maximum number of bytes needed to map a single codepoint. Useful for
+ * mapping and processing a single input codepoint at a time with a
+ * statically-allocated buffer.
+ *
+ * With full case mapping, an input codepoint may be mapped to as many as
+ * three output codepoints. See Unicode 16.0.0, section 5.18.2, "Change in
+ * Length":
+ *
+ * https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G29675
+ */
+#define UNICODE_CASEMAP_LEN		3
+#define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
+
 /* GUC settings */
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
-- 
2.43.0

v10-0003-Change-some-callers-to-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From bc9bd2209a7fb31993b04be9a79b82d0ed67c676 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 09:09:06 -0800
Subject: [PATCH v10 03/11] Change some callers to use pg_ascii_toupper().

The input is ASCII anyway, so it's better to be clear that it's not
locale-dependent.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/access/transam/xlogfuncs.c | 2 +-
 src/backend/utils/adt/cash.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e45fce43ed..a50345f9bf7 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -479,7 +479,7 @@ pg_split_walfile_name(PG_FUNCTION_ARGS)
 
 	/* Capitalize WAL file name. */
 	for (p = fname_upper; *p; p++)
-		*p = pg_toupper((unsigned char) *p);
+		*p = pg_ascii_toupper((unsigned char) *p);
 
 	if (!IsXLogFileName(fname_upper))
 		ereport(ERROR,
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 611d23f3cb0..623f6eec056 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -1035,7 +1035,7 @@ cash_words(PG_FUNCTION_ARGS)
 	appendStringInfoString(&buf, m0 == 1 ? " cent" : " cents");
 
 	/* capitalize output */
-	buf.data[0] = pg_toupper((unsigned char) buf.data[0]);
+	buf.data[0] = pg_ascii_toupper((unsigned char) buf.data[0]);
 
 	/* return as text datum */
 	res = cstring_to_text_with_len(buf.data, buf.len);
-- 
2.43.0

v10-0004-Allow-pg_locale_t-APIs-to-work-when-ctype_is_c.patch (text/x-patch; charset=UTF-8)

From 9d1c50e406099ebe82de8a8e03d9cb2f564d76eb Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 7 Nov 2025 12:11:34 -0800
Subject: [PATCH v10 04/11] Allow pg_locale_t APIs to work when ctype_is_c.

Previously, the caller needed to check ctype_is_c first for some
routines and not others. Now, the APIs consistently work, and the
caller can just check ctype_is_c for optimization purposes.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
 src/backend/utils/adt/like_support.c   | 34 ++++-------
 src/backend/utils/adt/pg_locale.c      | 78 ++++++++++++++++++++++++--
 src/backend/utils/adt/pg_locale_libc.c |  6 ++
 3 files changed, 88 insertions(+), 30 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 999f23f86d5..0debccfa67b 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -99,8 +99,6 @@ static Selectivity like_selectivity(const char *patt, int pattlen,
 static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
-static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -995,7 +993,6 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 	Oid			typeid = patt_const->consttype;
 	int			pos,
 				match_pos;
-	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1055,9 +1052,16 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				break;
 		}
 
-		/* Stop if case-varying character (it's sort of a wildcard) */
-		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale))
+		/*
+		 * Stop if case-varying character (it's sort of a wildcard).
+		 *
+		 * In multibyte character sets or with non-libc providers, we can't
+		 * use isalpha, and it does not seem worth trying to convert to
+		 * wchar_t or char32_t.  Instead, just pass the single byte to the
+		 * provider, which will assume any non-ASCII char is potentially
+		 * case-varying.
+		 */
+		if (case_insensitive && char_is_cased(patt[pos], locale))
 			break;
 
 		match[match_pos++] = patt[pos];
@@ -1481,24 +1485,6 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
 	return sel;
 }
 
-/*
- * Check whether char is a letter (and, hence, subject to case-folding)
- *
- * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
- * Instead, just assume any non-ASCII char is potentially case-varying, and
- * hard-wire knowledge of which ASCII chars are letters.
- */
-static int
-pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else
-		return char_is_cased(c, locale);
-}
-
 
 /*
  * For bytea, the increment function need only increment the current byte
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b14c7837938..48f9d44a5f7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1257,35 +1257,99 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+/* lowercasing/casefolding in C locale */
+static size_t
+strlower_c(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	int			i;
+
+	srclen = (srclen >= 0) ? srclen : strlen(src);
+	for (i = 0; i < srclen && i < dstsize; i++)
+		dst[i] = pg_ascii_tolower(src[i]);
+	if (i < dstsize)
+		dst[i] = '\0';
+	return srclen;
+}
+
+/* titlecasing in C locale */
+static size_t
+strtitle_c(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	bool		wasalnum = false;
+	int			i;
+
+	srclen = (srclen >= 0) ? srclen : strlen(src);
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		char		c = src[i];
+
+		if (wasalnum)
+			dst[i] = pg_ascii_tolower(c);
+		else
+			dst[i] = pg_ascii_toupper(c);
+
+		wasalnum = ((c >= '0' && c <= '9') ||
+					(c >= 'A' && c <= 'Z') ||
+					(c >= 'a' && c <= 'z'));
+	}
+	if (i < dstsize)
+		dst[i] = '\0';
+	return srclen;
+}
+
+/* uppercasing in C locale */
+static size_t
+strupper_c(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	int			i;
+
+	srclen = (srclen >= 0) ? srclen : strlen(src);
+	for (i = 0; i < srclen && i < dstsize; i++)
+		dst[i] = pg_ascii_toupper(src[i]);
+	if (i < dstsize)
+		dst[i] = '\0';
+	return srclen;
+}
+
 size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+		return strlower_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+		return strtitle_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+		return strupper_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->ctype->strfold)
-		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
+	/* in the C locale, casefolding is the same as lowercasing */
+	if (locale->ctype == NULL)
+		return strlower_c(dst, dstsize, src, srclen);
 	else
-		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1560,6 +1624,8 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 bool
 char_is_cased(char ch, pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 716f005066a..26ba1be73f1 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -326,6 +326,8 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
+	/* in libc, casefolding is the same as lowercasing */
+	.strfold = strlower_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -351,6 +353,8 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	/* in libc, casefolding is the same as lowercasing */
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -372,6 +376,8 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	/* in libc, casefolding is the same as lowercasing */
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
-- 
2.43.0
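
For readers following the 0004 patch: the C-locale helpers it adds share one contract. They write at most dstsize bytes, NUL-terminate only when there is room, and return the length the full result needs. Below is a minimal standalone sketch of that contract, not the patch's code. The names ascii_tolower_sketch and strlower_c_sketch are hypothetical, `long` stands in for ssize_t to stay within standard C, and the loop index is a size_t rather than the patch's int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_ascii_tolower(): only A-Z are mapped. */
static unsigned char
ascii_tolower_sketch(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Lowercase src into dst, writing at most dstsize bytes.  Returns the
 * length the full result needs (excluding the NUL), so a return value
 * >= dstsize means the output was truncated or left unterminated.
 * A negative srclen means "src is NUL-terminated".
 */
static size_t
strlower_c_sketch(char *dst, size_t dstsize, const char *src, long srclen)
{
	size_t		len;
	size_t		i;

	assert(dst != NULL || dstsize == 0);
	assert(src != NULL);

	len = (srclen >= 0) ? (size_t) srclen : strlen(src);
	for (i = 0; i < len && i < dstsize; i++)
		dst[i] = (char) ascii_tolower_sketch((unsigned char) src[i]);
	if (i < dstsize)
		dst[i] = '\0';
	return len;
}
```

A caller can compare the return value against its buffer size to decide whether to retry with a larger buffer.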

Attachment: v10-0005-Make-regex-max_chr-depend-on-encoding-not-provid.patch (text/x-patch; charset=UTF-8)

From 6cbcec89d329892a6db3b7b8efba2e0fda4a566e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Nov 2025 12:41:47 -0800
Subject: [PATCH v10 05/11] Make regex "max_chr" depend on encoding, not
 provider.

The previous per-provider "max_chr" field was there as a hack to
preserve the exact prior behavior, which depended on the
provider. Change to depend on the encoding, which makes more sense,
and remove the per-provider logic.

The only difference is for ICU: previously it always used
MAX_SIMPLE_CHR (0x7FF) regardless of the encoding; whereas now it will
match libc and use MAX_SIMPLE_CHR for UTF-8, and UCHAR_MAX for other
encodings. That's possibly a loss for non-UTF8 multibyte encodings,
but a win for single-byte encodings. Regardless, this distinction was
not worth the complexity.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/regex/regc_pg_locale.c     | 18 ++++++++++--------
 src/backend/utils/adt/pg_locale_libc.c |  2 --
 src/include/utils/pg_locale.h          |  6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 4698f110a0c..bb0e3f1d139 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -320,16 +320,18 @@ regc_ctype_get_cache(regc_wc_probefunc probefunc, int cclasscode)
 		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
 	}
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+	}
 	else
 	{
-		if (pg_regex_locale->ctype->max_chr != 0 &&
-			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
-		{
-			max_chr = pg_regex_locale->ctype->max_chr;
-			pcc->cv.cclasscode = -1;
-		}
-		else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#if MAX_SIMPLE_CHR >= UCHAR_MAX
+		max_chr = (pg_wchar) UCHAR_MAX;
+		pcc->cv.cclasscode = -1;
+#else
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#endif
 	}
 
 	/*
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 26ba1be73f1..69b336070e9 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -342,7 +342,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 /*
@@ -369,7 +368,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 static const struct ctype_methods ctype_methods_libc_utf8 = {
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 54193a17a90..42e21e7fb8a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -134,12 +134,6 @@ struct ctype_methods
 	 * pg_strlower().
 	 */
 	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
-
-	/*
-	 * For regex and pattern matching efficiency, the maximum char value
-	 * supported by the above methods. If zero, limit is set by regex code.
-	 */
-	pg_wchar	max_chr;
 };
 
 /*
-- 
2.43.0
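
The non-C branches that 0005 leaves in regc_ctype_get_cache() reduce to a choice based only on whether the database encoding is UTF-8. A minimal sketch of that decision follows. The names choose_max_chr_sketch, max_chr_choice_sketch and MAX_SIMPLE_CHR_SKETCH are hypothetical, the 0x7FF value is taken from the commit message, and the cclasscode_reset flag merely mirrors the patch's "pcc->cv.cclasscode = -1" assignment without modelling what the regex code does with it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_SIMPLE_CHR_SKETCH 0x7FF	/* value quoted in the commit message */

typedef struct
{
	unsigned int max_chr;		/* highest character code probed up front */
	bool		cclasscode_reset;	/* mirrors "pcc->cv.cclasscode = -1" */
} max_chr_choice_sketch;

/*
 * Pick the probe limit from the encoding alone: UTF-8 keeps the
 * MAX_SIMPLE_CHR limit, any other encoding is capped at UCHAR_MAX
 * (when that is the tighter bound).
 */
static max_chr_choice_sketch
choose_max_chr_sketch(bool is_utf8)
{
	max_chr_choice_sketch r = {MAX_SIMPLE_CHR_SKETCH, false};

	if (!is_utf8 && MAX_SIMPLE_CHR_SKETCH >= UCHAR_MAX)
	{
		r.max_chr = UCHAR_MAX;
		r.cclasscode_reset = true;
	}

	/* the limit never exceeds the simple-character range */
	assert(r.max_chr <= MAX_SIMPLE_CHR_SKETCH);
	return r;
}
```

The provider no longer appears as an input, which is the point of the change.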

Attachment: v10-0006-Fix-inconsistency-between-ltree_strncasecmp-and-.patch (text/x-patch; charset=UTF-8)

From dcd90b2751f4aaf11d6164ea9fec3309e0817125 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:20:36 -0800
Subject: [PATCH v10 06/11] Fix inconsistency between ltree_strncasecmp() and
 ltree_crc32_sz().

Previously, ltree_strncasecmp() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.

Change both to use casefolding with the default collation.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 contrib/ltree/crc32.c     | 46 ++++++++++++++++++++++++++++++++-------
 contrib/ltree/lquery_op.c | 31 ++++++++++++++++++++++++--
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..3918d4a0ec2 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -10,31 +10,61 @@
 #include "postgres.h"
 #include "ltree.h"
 
+#include "crc32.h"
+#include "utils/pg_crc.h"
 #ifdef LOWER_NODE
-#include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
-#else
-#define TOLOWER(x)	(x)
+#include "utils/pg_locale.h"
 #endif
 
-#include "crc32.h"
-#include "utils/pg_crc.h"
+#ifdef LOWER_NODE
 
 unsigned int
 ltree_crc32_sz(const char *buf, int size)
 {
 	pg_crc32	crc;
 	const char *p = buf;
+	static pg_locale_t locale = NULL;
+
+	if (!locale)
+		locale = pg_database_locale();
 
 	INIT_TRADITIONAL_CRC32(crc);
 	while (size > 0)
 	{
-		char		c = (char) TOLOWER(*p);
+		char		foldstr[UNICODE_CASEMAP_BUFSZ];
+		int			srclen = pg_mblen(p);
+		size_t		foldlen;
+
+		/* fold one codepoint at a time */
+		foldlen = pg_strfold(foldstr, UNICODE_CASEMAP_BUFSZ, p, srclen,
+							 locale);
+
+		COMP_TRADITIONAL_CRC32(crc, foldstr, foldlen);
+
+		size -= srclen;
+		p += srclen;
+	}
+	FIN_TRADITIONAL_CRC32(crc);
+	return (unsigned int) crc;
+}
+
+#else
 
-		COMP_TRADITIONAL_CRC32(crc, &c, 1);
+unsigned int
+ltree_crc32_sz(const char *buf, int size)
+{
+	pg_crc32	crc;
+	const char *p = buf;
+
+	INIT_TRADITIONAL_CRC32(crc);
+	while (size > 0)
+	{
+		COMP_TRADITIONAL_CRC32(crc, p, 1);
 		size--;
 		p++;
 	}
 	FIN_TRADITIONAL_CRC32(crc);
 	return (unsigned int) crc;
 }
+
+#endif							/* !LOWER_NODE */
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index a6466f575fd..d6754eb613f 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -77,10 +77,37 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 int
 ltree_strncasecmp(const char *a, const char *b, size_t s)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = s + 1;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = s + 1;
+	char	   *bl = palloc(bl_sz);
+	size_t		needed;
 	int			res;
 
+	if (!locale)
+		locale = pg_database_locale();
+
+	needed = pg_strfold(al, al_sz, a, s, locale);
+	if (needed + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = needed + 1;
+		al = repalloc(al, al_sz);
+		needed = pg_strfold(al, al_sz, a, s, locale);
+		Assert(needed + 1 <= al_sz);
+	}
+
+	needed = pg_strfold(bl, bl_sz, b, s, locale);
+	if (needed + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = needed + 1;
+		bl = repalloc(bl, bl_sz);
+		needed = pg_strfold(bl, bl_sz, b, s, locale);
+		Assert(needed + 1 <= bl_sz);
+	}
+
 	res = strncmp(al, bl, s);
 
 	pfree(al);
-- 
2.43.0
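
The ltree_strncasecmp() change in 0006 relies on a "guess, then grow and retry" pattern, because a casefolded string can be longer than its input. Here is a minimal standalone sketch of that pattern using malloc/realloc instead of palloc/repalloc. Both toy_fold and fold_alloc_sketch are hypothetical names. toy_fold is only a stand-in for pg_strfold(): it lowercases ASCII and expands the Latin-1 byte 0xDF (sharp s) to "ss", loosely modelled on Unicode full case folding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy stand-in for pg_strfold().  Writes at most dstsize bytes,
 * NUL-terminates when there is room, and returns the length the full
 * result needs.
 */
static size_t
toy_fold(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		needed = 0;
	size_t		i;

	for (i = 0; i < srclen; i++)
	{
		unsigned char c = (unsigned char) src[i];
		char		out[2];
		size_t		outlen = 1;
		size_t		j;

		if (c == 0xDF)
		{
			/* one input byte becomes two output bytes */
			out[0] = 's';
			out[1] = 's';
			outlen = 2;
		}
		else if (c >= 'A' && c <= 'Z')
			out[0] = (char) (c + ('a' - 'A'));
		else
			out[0] = (char) c;

		for (j = 0; j < outlen; j++)
		{
			if (needed < dstsize)
				dst[needed] = out[j];
			needed++;
		}
	}
	if (needed < dstsize)
		dst[needed] = '\0';
	return needed;
}

/* Fold src into a freshly allocated, NUL-terminated buffer; caller frees. */
static char *
fold_alloc_sketch(const char *src, size_t srclen)
{
	size_t		bufsz = srclen + 1; /* optimistic first guess */
	char	   *buf = malloc(bufsz);
	size_t		needed;

	if (buf == NULL)
		return NULL;

	needed = toy_fold(buf, bufsz, src, srclen);
	if (needed + 1 > bufsz)
	{
		char	   *bigger;

		/* grow to the reported size and fold again */
		bufsz = needed + 1;
		bigger = realloc(buf, bufsz);
		if (bigger == NULL)
		{
			free(buf);
			return NULL;
		}
		buf = bigger;
		needed = toy_fold(buf, bufsz, src, srclen);
		assert(needed + 1 <= bufsz);
	}
	return buf;
}
```

The second call cannot come up short because the first call already reported the exact size required.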

Attachment: v10-0007-Remove-char_tolower-API.patch (text/x-patch; charset=UTF-8)

From 04045352820f56b8403aca7a3e319532e71ca66d Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 18:16:41 -0800
Subject: [PATCH v10 07/11] Remove char_tolower() API.

It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/like.c           | 42 +++++++++-----------------
 src/backend/utils/adt/like_match.c     | 18 ++++++-----
 src/backend/utils/adt/pg_locale.c      | 22 --------------
 src/backend/utils/adt/pg_locale_libc.c | 10 ------
 src/include/utils/pg_locale.h          |  9 ------
 5 files changed, 25 insertions(+), 76 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..4a7fc583c71 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -43,8 +43,8 @@ static text *MB_do_like_escape(text *pat, text *esc);
 static int	UTF8_MatchText(const char *t, int tlen, const char *p, int plen,
 						   pg_locale_t locale);
 
-static int	SB_IMatchText(const char *t, int tlen, const char *p, int plen,
-						  pg_locale_t locale);
+static int	C_IMatchText(const char *t, int tlen, const char *p, int plen,
+						 pg_locale_t locale);
 
 static int	GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation);
 static int	Generic_Text_IC_like(text *str, text *pat, Oid collation);
@@ -84,22 +84,10 @@ wchareq(const char *p1, const char *p2)
  * of getting a single character transformed to the system's wchar_t format.
  * So now, we just downcase the strings using lower() and apply regular LIKE
  * comparison.  This should be revisited when we install better locale support.
- */
-
-/*
- * We do handle case-insensitive matching for single-byte encodings using
+ *
+ * We do handle case-insensitive matching for the C locale using
  * fold-on-the-fly processing, however.
  */
-static char
-SB_lower_char(unsigned char c, pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return pg_ascii_tolower(c);
-	else if (locale->is_default)
-		return pg_tolower(c);
-	else
-		return char_tolower(c, locale);
-}
 
 
 #define NextByte(p, plen)	((p)++, (plen)--)
@@ -131,9 +119,9 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 #include "like_match.c"
 
 /* setup to compile like_match.c for single byte case insensitive matches */
-#define MATCH_LOWER(t, locale) SB_lower_char((unsigned char) (t), locale)
+#define MATCH_LOWER
 #define NextChar(p, plen) NextByte((p), (plen))
-#define MatchText SB_IMatchText
+#define MatchText C_IMatchText
 
 #include "like_match.c"
 
@@ -202,22 +190,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
 
 	/*
-	 * For efficiency reasons, in the single byte case we don't call lower()
-	 * on the pattern and text, but instead call SB_lower_char on each
-	 * character.  In the multi-byte case we don't have much choice :-(. Also,
-	 * ICU does not support single-character case folding, so we go the long
-	 * way.
+	 * For efficiency reasons, in the C locale we don't call lower() on the
+	 * pattern and text, but instead call pg_ascii_tolower() on each character.
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
+		return C_IMatchText(s, slen, p, plen, locale);
 	}
 	else
 	{
@@ -229,10 +212,13 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 													 PointerGetDatum(str)));
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
+
 		if (GetDatabaseEncoding() == PG_UTF8)
 			return UTF8_MatchText(s, slen, p, plen, 0);
-		else
+		else if (pg_database_encoding_max_length() > 1)
 			return MB_MatchText(s, slen, p, plen, 0);
+		else
+			return SB_MatchText(s, slen, p, plen, 0);
 	}
 }
 
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index 892f8a745ea..54846c9541d 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -70,10 +70,14 @@
  *--------------------
  */
 
+/*
+ * MATCH_LOWER is defined for ILIKE in the C locale as an optimization. Other
+ * locales must casefold the inputs before matching.
+ */
 #ifdef MATCH_LOWER
-#define GETCHAR(t, locale) MATCH_LOWER(t, locale)
+#define GETCHAR(t) pg_ascii_tolower(t)
 #else
-#define GETCHAR(t, locale) (t)
+#define GETCHAR(t) (t)
 #endif
 
 static int
@@ -105,7 +109,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 						 errmsg("LIKE pattern must not end with escape character")));
-			if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+			if (GETCHAR(*p) != GETCHAR(*t))
 				return LIKE_FALSE;
 		}
 		else if (*p == '%')
@@ -167,14 +171,14 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 							 errmsg("LIKE pattern must not end with escape character")));
-				firstpat = GETCHAR(p[1], locale);
+				firstpat = GETCHAR(p[1]);
 			}
 			else
-				firstpat = GETCHAR(*p, locale);
+				firstpat = GETCHAR(*p);
 
 			while (tlen > 0)
 			{
-				if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
+				if (GETCHAR(*t) == firstpat || (locale && !locale->deterministic))
 				{
 					int			matched = MatchText(t, tlen, p, plen, locale);
 
@@ -342,7 +346,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					NextChar(t1, t1len);
 			}
 		}
-		else if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+		else if (GETCHAR(*p) != GETCHAR(*t))
 		{
 			/* non-wildcard pattern char fails to match text char */
 			return LIKE_FALSE;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 48f9d44a5f7..5aba277ba99 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1629,28 +1629,6 @@ char_is_cased(char ch, pg_locale_t locale)
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
-/*
- * char_tolower_enabled()
- *
- * Does the provider support char_tolower()?
- */
-bool
-char_tolower_enabled(pg_locale_t locale)
-{
-	return (locale->ctype->char_tolower != NULL);
-}
-
-/*
- * char_tolower()
- *
- * Convert char (single-byte encoding) to lowercase.
- */
-char
-char_tolower(unsigned char ch, pg_locale_t locale)
-{
-	return locale->ctype->char_tolower(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 69b336070e9..545ee9a3099 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -248,13 +248,6 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
-static char
-char_tolower_libc(unsigned char ch, pg_locale_t locale)
-{
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
-}
-
 static bool
 char_is_cased_libc(char ch, pg_locale_t locale)
 {
@@ -339,7 +332,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -365,7 +357,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -387,7 +378,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 42e21e7fb8a..50520e50127 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -127,13 +127,6 @@ struct ctype_methods
 
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
-
-	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
-	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
 };
 
 /*
@@ -185,8 +178,6 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
-extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0
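
After 0007, fold-on-the-fly matching survives only for the C locale, where folding a byte is a pure ASCII operation and needs no locale lookup. The following is a much-simplified sketch of that idea, not the like_match.c algorithm: it handles only '%' and '_', has no escape processing, and uses plain recursion. The names ascii_fold_sketch and ilike_c_sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only lowercase, in the spirit of pg_ascii_tolower(). */
static unsigned char
ascii_fold_sketch(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char) (c + ('a' - 'A')) : c;
}

/* Case-insensitive LIKE for single-byte text, folding each byte as compared. */
static bool
ilike_c_sketch(const char *t, size_t tlen, const char *p, size_t plen)
{
	assert(t != NULL && p != NULL);

	while (plen > 0)
	{
		if (*p == '%')
		{
			/* collapse a run of % into one */
			while (plen > 0 && *p == '%')
			{
				p++;
				plen--;
			}
			if (plen == 0)
				return true;	/* trailing % matches the rest */

			/* try the remaining pattern against every suffix of the text */
			for (;;)
			{
				if (ilike_c_sketch(t, tlen, p, plen))
					return true;
				if (tlen == 0)
					return false;
				t++;
				tlen--;
			}
		}

		if (tlen == 0)
			return false;
		if (*p != '_' &&
			ascii_fold_sketch((unsigned char) *p) !=
			ascii_fold_sketch((unsigned char) *t))
			return false;

		p++;
		plen--;
		t++;
		tlen--;
	}
	return tlen == 0;
}
```

For any other locale the patch instead lowercases both inputs up front and runs the ordinary case-sensitive matcher.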

Attachment: v10-0008-Use-multibyte-aware-extraction-of-pattern-prefix.patch (text/x-patch; charset=UTF-8)

From 94a0e519f13cfc8554d11cf46ed7bbef8aad2ed3 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Nov 2025 12:14:21 -0800
Subject: [PATCH v10 08/11] Use multibyte-aware extraction of pattern prefixes.

Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.

Now, use pg_wchar-at-a-time loop for text types, along with proper
detection of cased characters; and preserve the char-at-a-time logic
for bytea.

Removes the pg_locale_t char_is_cased() single-byte method and
replaces it with a proper multibyte pg_iswcased() method.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/like_support.c      | 111 +++++++++++++---------
 src/backend/utils/adt/pg_locale.c         |  26 +++--
 src/backend/utils/adt/pg_locale_builtin.c |   7 +-
 src/backend/utils/adt/pg_locale_icu.c     |  15 ++-
 src/backend/utils/adt/pg_locale_libc.c    |  23 +++--
 src/include/utils/pg_locale.h             |   5 +-
 6 files changed, 103 insertions(+), 84 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 0debccfa67b..e7255fa652a 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -987,12 +987,11 @@ static Pattern_Prefix_Status
 like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				  Const **prefix_const, Selectivity *rest_selec)
 {
-	char	   *match;
 	char	   *patt;
 	int			pattlen;
 	Oid			typeid = patt_const->consttype;
-	int			pos,
-				match_pos;
+	int			pos;
+	int			match_pos = 0;
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1020,67 +1019,91 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 		locale = pg_newlocale_from_collation(collation);
 	}
 
+	/* for text types, use pg_wchar; for BYTEA, use char */
 	if (typeid != BYTEAOID)
 	{
-		patt = TextDatumGetCString(patt_const->constvalue);
-		pattlen = strlen(patt);
+		text	   *val = DatumGetTextPP(patt_const->constvalue);
+		pg_wchar   *wpatt;
+		pg_wchar   *wmatch;
+		char	   *match;
+
+		patt = VARDATA_ANY(val);
+		pattlen = VARSIZE_ANY_EXHDR(val);
+		wpatt = palloc((pattlen + 1) * sizeof(pg_wchar));
+		wmatch = palloc((pattlen + 1) * sizeof(pg_wchar));
+		pg_mb2wchar_with_len(patt, wpatt, pattlen);
+
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
+		{
+			/* % and _ are wildcard characters in LIKE */
+			if (wpatt[pos] == '%' ||
+				wpatt[pos] == '_')
+				break;
+
+			/* Backslash escapes the next character */
+			if (wpatt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
+
+			/*
+			 * For ILIKE, stop if it's a case-varying character (it's sort of
+			 * a wildcard).
+			 */
+			if (case_insensitive && pg_iswcased(wpatt[pos], locale))
+				break;
+
+			wmatch[match_pos++] = wpatt[pos];
+		}
+
+		wmatch[match_pos] = '\0';
+
+		pg_wchar2mb_with_len(wmatch, match, pattlen);
+
+		pfree(wpatt);
+		pfree(wmatch);
+
+		*prefix_const = string_to_const(match, typeid);
 	}
 	else
 	{
 		bytea	   *bstr = DatumGetByteaPP(patt_const->constvalue);
+		char	   *match;
 
+		patt = VARDATA_ANY(bstr);
 		pattlen = VARSIZE_ANY_EXHDR(bstr);
-		patt = (char *) palloc(pattlen);
-		memcpy(patt, VARDATA_ANY(bstr), pattlen);
-		Assert((Pointer) bstr == DatumGetPointer(patt_const->constvalue));
-	}
 
-	match = palloc(pattlen + 1);
-	match_pos = 0;
-	for (pos = 0; pos < pattlen; pos++)
-	{
-		/* % and _ are wildcard characters in LIKE */
-		if (patt[pos] == '%' ||
-			patt[pos] == '_')
-			break;
-
-		/* Backslash escapes the next character */
-		if (patt[pos] == '\\')
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
 		{
-			pos++;
-			if (pos >= pattlen)
+			/* % and _ are wildcard characters in LIKE */
+			if (patt[pos] == '%' ||
+				patt[pos] == '_')
 				break;
-		}
 
-		/*
-		 * Stop if case-varying character (it's sort of a wildcard).
-		 *
-		 * In multibyte character sets or with non-libc providers, we can't
-		 * use isalpha, and it does not seem worth trying to convert to
-		 * wchar_t or char32_t.  Instead, just pass the single byte to the
-		 * provider, which will assume any non-ASCII char is potentially
-		 * case-varying.
-		 */
-		if (case_insensitive && char_is_cased(patt[pos], locale))
-			break;
-
-		match[match_pos++] = patt[pos];
-	}
+			/* Backslash escapes the next character */
+			if (patt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
 
-	match[match_pos] = '\0';
+			match[match_pos++] = patt[pos];
+		}
 
-	if (typeid != BYTEAOID)
-		*prefix_const = string_to_const(match, typeid);
-	else
 		*prefix_const = string_to_bytea_const(match, match_pos);
 
+		pfree(match);
+	}
+
 	if (rest_selec != NULL)
 		*rest_selec = like_selectivity(&patt[pos], pattlen - pos,
 									   case_insensitive);
 
-	pfree(patt);
-	pfree(match);
-
 	/* in LIKE, an empty pattern is an exact match! */
 	if (pos == pattlen)
 		return Pattern_Prefix_Exact;	/* reached end of pattern, so exact */
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5aba277ba99..c4e89502f85 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1588,6 +1588,17 @@ pg_iswxdigit(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_isxdigit(wc, locale);
 }
 
+bool
+pg_iswcased(pg_wchar wc, pg_locale_t locale)
+{
+	/* for the C locale, Cased and Alpha are equivalent */
+	if (locale->ctype == NULL)
+		return (wc <= (pg_wchar) 127 &&
+				(pg_char_properties[wc] & PG_ISALPHA));
+	else
+		return locale->ctype->wc_iscased(wc, locale);
+}
+
 pg_wchar
 pg_towupper(pg_wchar wc, pg_locale_t locale)
 {
@@ -1614,21 +1625,6 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_tolower(wc, locale);
 }
 
-/*
- * char_is_cased()
- *
- * Fuzzy test of whether the given char is case-varying or not. The argument
- * is a single byte, so in a multibyte encoding, just assume any non-ASCII
- * char is case-varying.
- */
-bool
-char_is_cased(char ch, pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-	return locale->ctype->char_is_cased(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 1021e0d129b..0c2920112bb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -186,10 +186,9 @@ wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 }
 
 static bool
-char_is_cased_builtin(char ch, pg_locale_t locale)
+wc_iscased_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+	return pg_u_prop_cased(to_char32(wc));
 }
 
 static pg_wchar
@@ -219,7 +218,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
-	.char_is_cased = char_is_cased_builtin,
+	.wc_iscased = wc_iscased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
 };
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f5a0cc8fe41..18d026deda8 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,13 +121,6 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-static bool
-char_is_cased_icu(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
  * UChar32, which is correct for the UTF-8 encoding, but not in general.
@@ -223,6 +216,12 @@ wc_isxdigit_icu(pg_wchar wc, pg_locale_t locale)
 	return u_isxdigit(wc);
 }
 
+static bool
+wc_iscased_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_hasBinaryProperty(wc, UCHAR_CASED);
+}
+
 static const struct ctype_methods ctype_methods_icu = {
 	.strlower = strlower_icu,
 	.strtitle = strtitle_icu,
@@ -238,7 +237,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
-	.char_is_cased = char_is_cased_icu,
+	.wc_iscased = wc_iscased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 545ee9a3099..fa419863fa7 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -184,6 +184,13 @@ wc_isxdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
+static bool
+wc_iscased_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->lt) ||
+		islower_l((unsigned char) wc, locale->lt);
+}
+
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
@@ -249,14 +256,10 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 }
 
 static bool
-char_is_cased_libc(char ch, pg_locale_t locale)
+wc_iscased_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	bool		is_multibyte = pg_database_encoding_max_length() > 1;
-
-	if (is_multibyte && IS_HIGHBIT_SET(ch))
-		return true;
-	else
-		return isalpha_l((unsigned char) ch, locale->lt);
+	return iswupper_l((wint_t) wc, locale->lt) ||
+		iswlower_l((wint_t) wc, locale->lt);
 }
 
 static pg_wchar
@@ -331,7 +334,7 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -356,7 +359,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -377,7 +380,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_ispunct = wc_ispunct_libc_mb,
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_mb,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 50520e50127..01f891def7a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -122,11 +122,9 @@ struct ctype_methods
 	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isxdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_iscased) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
-
-	/* required */
-	bool		(*char_is_cased) (char ch, pg_locale_t locale);
 };
 
 /*
@@ -214,6 +212,7 @@ extern bool pg_iswprint(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswpunct(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswspace(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswxdigit(pg_wchar wc, pg_locale_t locale);
+extern bool pg_iswcased(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towupper(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towlower(pg_wchar wc, pg_locale_t locale);
 
-- 
2.43.0

v10-0009-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From c32561a8447abda8fa03e45f3bfc857227b3d43a Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v10 09/11] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 contrib/fuzzystrmatch/dmetaphone.c    |  2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..bb5d3e90756 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -284,7 +284,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = pg_ascii_toupper((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..7f07efc2c35 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +124,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +301,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +742,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v10-0010-downcase_identifier-use-method-table-from-locale.patch (text/x-patch; charset=UTF-8)

From 66bcc8c8de81ba83e90fd43a48acdb214c85139e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 20 Oct 2025 16:32:18 -0700
Subject: [PATCH v10 10/11] downcase_identifier(): use method table from locale
 provider.

Previously, libc's tolower() was always used for identifier case
folding, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
casefolding.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to fold the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/parser/scansup.c              | 39 +++++++---------
 src/backend/utils/adt/pg_locale.c         | 32 +++++++++++++
 src/backend/utils/adt/pg_locale_builtin.c | 24 ++++++++++
 src/backend/utils/adt/pg_locale_icu.c     | 36 ++++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 55 +++++++++++++++++++++++
 src/include/utils/pg_locale.h             |  5 +++
 6 files changed, 166 insertions(+), 25 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..0bd049643d1 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -46,35 +47,25 @@ char *
 downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 {
 	char	   *result;
-	int			i;
-	bool		enc_is_single_byte;
-
-	result = palloc(len + 1);
-	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	size_t		dstsize;
+	size_t		needed pg_attribute_unused();
 
 	/*
-	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
-	 * have the infrastructure for.  Instead we use tolower() to provide a
-	 * locale-aware translation.  However, there are some locales where this
-	 * is not right either (eg, Turkish may do strange things with 'i' and
-	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * Preserves string length.
+	 *
+	 * NB: if we decide to support Unicode-aware identifier case folding, then
+	 * we need to account for a change in string length.
 	 */
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	dstsize = len + 1;
+	result = palloc(dstsize);
 
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_strfold_ident(result, dstsize, ident, len);
+	Assert(needed + 1 == dstsize);
+	Assert(needed == len);
+	Assert(result[len] == '\0');
 
-	if (i >= NAMEDATALEN && truncate)
-		truncate_identifier(result, i, warn);
+	if (len >= NAMEDATALEN && truncate)
+		truncate_identifier(result, len, warn);
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c4e89502f85..9167018c85b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1352,6 +1352,38 @@ pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+	if (default_locale == NULL || default_locale->ctype == NULL)
+	{
+		int			i;
+
+		for (i = 0; i < srclen && i < destsize; i++)
+		{
+			unsigned char ch = (unsigned char) src[i];
+
+			if (ch >= 'A' && ch <= 'Z')
+				ch += 'a' - 'A';
+			dest[i] = (char) ch;
+		}
+
+		if (i < destsize)
+			dest[i] = '\0';
+
+		return srclen;
+	}
+	return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+												default_locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0c2920112bb..659e588d513 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -125,6 +125,29 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 						   locale->builtin.casemap_full);
 }
 
+static size_t
+strfold_ident_builtin(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
@@ -208,6 +231,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.strtitle = strtitle_builtin,
 	.strupper = strupper_builtin,
 	.strfold = strfold_builtin,
+	.strfold_ident = strfold_ident_builtin,
 	.wc_isdigit = wc_isdigit_builtin,
 	.wc_isalpha = wc_isalpha_builtin,
 	.wc_isalnum = wc_isalnum_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 18d026deda8..39b153a4262 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,8 @@ static size_t strupper_icu(char *dest, size_t destsize, const char *src,
 						   ssize_t srclen, pg_locale_t locale);
 static size_t strfold_icu(char *dest, size_t destsize, const char *src,
 						  ssize_t srclen, pg_locale_t locale);
+static size_t strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+								ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -123,7 +125,7 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
- * UChar32, which is correct for the UTF-8 encoding, but not in general.
+ * UChar32, which is correct for UTF-8 and LATIN1, but not in general.
  */
 
 static pg_wchar
@@ -227,6 +229,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 	.strfold = strfold_icu,
+	.strfold_ident = strfold_ident_icu,
 	.wc_isdigit = wc_isdigit_icu,
 	.wc_isalpha = wc_isalpha_icu,
 	.wc_isalnum = wc_isalnum_icu,
@@ -564,6 +567,37 @@ strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
+/*
+ * For historical compatibility, behavior is not multibyte-aware.
+ *
+ * NB: uses libc tolower() for single-byte encodings (also for historical
+ * compatibility), and therefore relies on the global LC_CTYPE setting.
+ */
+static size_t
+strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+				  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
+			ch = tolower(ch);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 /*
  * strncoll_icu_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index fa419863fa7..cb6b573dd34 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -318,12 +318,65 @@ tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 		return wc;
 }
 
+/*
+ * Characters A..Z always fold to a..z, even in the Turkish locale. Characters
+ * beyond 127 use tolower().
+ */
+static size_t
+strfold_ident_libc_sb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (IS_HIGHBIT_SET(ch) && isupper_l(ch, loc))
+			ch = tolower_l(ch, loc);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
+/*
+ * For historical reasons, not multibyte-aware; uses plain ASCII semantics.
+ */
+static size_t
+strfold_ident_libc_mb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_sb,
+	.strfold_ident = strfold_ident_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -349,6 +402,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -370,6 +424,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 01f891def7a..53574d2ef85 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -110,6 +110,9 @@ struct ctype_methods
 	size_t		(*strfold) (char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+	size_t		(*strfold_ident) (char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
 
 	/* required */
 	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
@@ -188,6 +191,8 @@ extern size_t pg_strupper(char *dst, size_t dstsize,
 extern size_t pg_strfold(char *dst, size_t dstsize,
 						 const char *src, ssize_t srclen,
 						 pg_locale_t locale);
+extern size_t pg_strfold_ident(char *dst, size_t dstsize,
+							   const char *src, ssize_t srclen);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.43.0

v10-0011-Control-LC_COLLATE-with-GUC.patch (text/x-patch; charset=UTF-8)

From c9661114336dc16e81761b48a151c476cc4d4dea Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 14:00:52 -0800
Subject: [PATCH v10 11/11] Control LC_COLLATE with GUC.

Now that the global LC_COLLATE setting is not used for any in-core
purpose at all (see commit 5e6e42e44f), allow it to be set with a
GUC. This may be useful for extensions or procedural languages that
still depend on the global LC_COLLATE setting.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 +++++++++++++++++++
 src/backend/utils/init/postinit.c             |  2 +
 src/backend/utils/misc/guc_parameters.dat     |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/initdb/initdb.c                       |  3 +
 src/include/utils/guc_hooks.h                 |  2 +
 src/include/utils/pg_locale.h                 |  1 +
 7 files changed, 78 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9167018c85b..91e7eba2eac 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,7 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
+char	   *locale_collate;
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
@@ -369,6 +370,64 @@ assign_locale_time(const char *newval, void *extra)
 	CurrentLCTimeValid = false;
 }
 
+/*
+ * We allow LC_COLLATE to actually be set globally.
+ *
+ * Note: we normally disallow value = "" because it wouldn't have consistent
+ * semantics (it'd effectively just use the previous value).  However, this
+ * is the value passed for PGC_S_DEFAULT, so don't complain in that case,
+ * not even if the attempted setting fails due to invalid environment value.
+ * The idea there is just to accept the environment setting *if possible*
+ * during startup, until we can read the proper value from postgresql.conf.
+ */
+bool
+check_locale_collate(char **newval, void **extra, GucSource source)
+{
+	int			locale_enc;
+	int			db_enc;
+
+	if (**newval == '\0')
+	{
+		if (source == PGC_S_DEFAULT)
+			return true;
+		else
+			return false;
+	}
+
+	locale_enc = pg_get_encoding_from_locale(*newval, true);
+	db_enc = GetDatabaseEncoding();
+
+	if (!(locale_enc == db_enc ||
+		  locale_enc == PG_SQL_ASCII ||
+		  db_enc == PG_SQL_ASCII ||
+		  locale_enc == -1))
+	{
+		if (source == PGC_S_FILE)
+		{
+			guc_free(*newval);
+			*newval = guc_strdup(LOG, "C");
+			if (!*newval)
+				return false;
+		}
+		else if (source != PGC_S_TEST)
+		{
+			ereport(WARNING,
+					(errmsg("encoding mismatch"),
+					 errdetail("Locale \"%s\" uses encoding \"%s\", which does not match database encoding \"%s\".",
+							   *newval, pg_encoding_to_char(locale_enc), pg_encoding_to_char(db_enc))));
+			return false;
+		}
+	}
+
+	return check_locale(LC_COLLATE, *newval, NULL);
+}
+
+void
+assign_locale_collate(const char *newval, void *extra)
+{
+	(void) pg_perm_setlocale(LC_COLLATE, newval);
+}
+
 /*
  * We allow LC_MESSAGES to actually be set globally.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..c99d57eba48 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -404,6 +404,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * the pg_database tuple.
 	 */
 	SetDatabaseEncoding(dbform->encoding);
+	/* Reset lc_collate to check encoding, and fall back to C if necessary */
+	SetConfigOption("lc_collate", locale_collate, PGC_POSTMASTER, PGC_S_FILE);
 	/* Record it as a GUC internal option, too */
 	SetConfigOption("server_encoding", GetDatabaseEncodingName(),
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 1128167c025..a3da16eadb1 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1450,6 +1450,15 @@
   boot_val => 'PG_KRB_SRVTAB',
 },
 
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
+  long_desc => 'An empty string means use the operating system setting.',
+  variable => 'locale_collate',
+  boot_val => '""',
+  check_hook => 'check_locale_collate',
+  assign_hook => 'assign_locale_collate',
+},
+
 { name => 'lc_messages', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
   short_desc => 'Sets the language in which messages are displayed.',
   long_desc => 'An empty string means use the operating system setting.',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..19332e39e82 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -798,6 +798,8 @@
                                         # encoding
 
 # These settings are initialized by initdb, but they can be changed.
+#lc_collate = ''                        # locale for text ordering (only affects
+                                        # extensions)
 #lc_messages = ''                       # locale for system error message
                                         # strings
 #lc_monetary = 'C'                      # locale for monetary formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..8b2e7bfab6f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1312,6 +1312,9 @@ setup_config(void)
 	conflines = replace_guc_value(conflines, "shared_buffers",
 								  repltok, false);
 
+	conflines = replace_guc_value(conflines, "lc_collate",
+								  lc_collate, false);
+
 	conflines = replace_guc_value(conflines, "lc_messages",
 								  lc_messages, false);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 82ac8646a8d..8a20f76eec8 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -65,6 +65,8 @@ extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
 extern void assign_io_method(int newval, void *extra);
 extern bool check_io_max_concurrency(int *newval, void **extra, GucSource source);
 extern const char *show_in_hot_standby(void);
+extern bool check_locale_collate(char **newval, void **extra, GucSource source);
+extern void assign_locale_collate(const char *newval, void *extra);
 extern bool check_locale_messages(char **newval, void **extra, GucSource source);
 extern void assign_locale_messages(const char *newval, void *extra);
 extern bool check_locale_monetary(char **newval, void **extra, GucSource source);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 53574d2ef85..276be4c1fef 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -41,6 +41,7 @@
 #define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
 
 /* GUC settings */
+extern PGDLLIMPORT char *locale_collate;
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
-- 
2.43.0

#75

Chao Li

li.evan.chao@gmail.com

about 2 months ago

In reply to: Jeff Davis (#74)

Re: Remaining dependency on setlocale()

On Nov 26, 2025, at 01:20, Jeff Davis <pgsql@j-davis.com> wrote:

New series attached with only these changes and a rebase.

Regards,
Jeff Davis

<v10-0001-Inline-pg_ascii_tolower-and-pg_ascii_toupper.patch><v10-0002-Add-define-for-UNICODE_CASEMAP_BUFSZ.patch><v10-0003-Change-some-callers-to-use-pg_ascii_toupper.patch><v10-0004-Allow-pg_locale_t-APIs-to-work-when-ctype_is_c.patch><v10-0005-Make-regex-max_chr-depend-on-encoding-not-provid.patch><v10-0006-Fix-inconsistency-between-ltree_strncasecmp-and-.patch><v10-0007-Remove-char_tolower-API.patch><v10-0008-Use-multibyte-aware-extraction-of-pattern-prefix.patch><v10-0009-fuzzystrmatch-use-pg_ascii_toupper.patch><v10-0010-downcase_identifier-use-method-table-from-locale.patch><v10-0011-Control-LC_COLLATE-with-GUC.patch>

I continued reviewing 0005-0008.

0005 - no comment. The change looks correct to me. Before the patch, pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR should have been the more common case; with the patch, MAX_SIMPLE_CHR >= UCHAR_MAX should be the more common case, so there should be no behavior change.

5 - 0006
```
@@ -77,10 +77,37 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 int
 ltree_strncasecmp(const char *a, const char *b, size_t s)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = s + 1;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = s + 1;
+	char	   *bl = palloc(bl_sz);
+	size_t		needed;
 	int			res;

+	if (!locale)
+		locale = pg_database_locale();
+
+	needed = pg_strfold(al, al_sz, a, s, locale);
+	if (needed + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = needed + 1;
+		al = repalloc(al, al_sz);
+		needed = pg_strfold(al, al_sz, a, s, locale);
+		Assert(needed + 1 <= al_sz);
+	}
+
+	needed = pg_strfold(bl, bl_sz, b, s, locale);
+	if (needed + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = needed + 1;
+		bl = repalloc(bl, bl_sz);
+		needed = pg_strfold(bl, bl_sz, b, s, locale);
+		Assert(needed + 1 <= bl_sz);
+	}
+
 	res = strncmp(al, bl, s);

pfree(al);
```

I do think the new implementation has some problems.

* The retry logic implies that a single-byte char may become multiple bytes after folding; otherwise the retry is not needed, because you have allocated s+1 bytes for the dest buffers. From this perspective, we should use two needed variables, neededA and neededB: if neededA != neededB, then the two strings are different; if neededA == neededB, then we should perform strncmp, passing neededA (or neededB, as they are identical) as the length: strncmp(al, bl, neededA).

* Based on the logic you implemented in 0004, the first pg_strfold() call has already copied as many chars as possible to the dest buffer, so on retry we should ideally resume instead of starting over. However, if single-byte->multi-byte folding happens, we have no information to decide where to resume from. From this perspective, in 0004, do we really need the best-effort strategy for strlower_c()? If there are other use cases that require data to be placed in the dest buffer even when it doesn’t have enough space, then my patch [1] changing strlower_libc_sb() should be considered.
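
To make the first point concrete, here is a minimal standalone sketch of the grow-and-retry pattern followed by a length-aware comparison. It is not PostgreSQL code: toy_fold(), fold_alloc() and toy_fold_equal() are hypothetical names, toy_fold() only imitates a folding routine whose output can be longer than its input (it expands '#' to "ss"), and malloc/realloc stand in for palloc/repalloc. It also reports equality only, rather than an ordering.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a folding routine: lowercases ASCII and expands
 * '#' to "ss", so the result can be longer than the input.  Like snprintf(),
 * it returns the length the full result needs (excluding the NUL), even if
 * dst was too small; in that case dst holds a truncated, NUL-terminated copy.
 */
static size_t
toy_fold(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		out = 0;

	for (size_t i = 0; i < srclen; i++)
	{
		unsigned char ch = (unsigned char) src[i];

		if (ch == '#')
		{
			/* first byte of the two-byte expansion */
			if (out + 1 < dstsize)
				dst[out] = 's';
			out++;
			ch = 's';
		}
		else if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';

		if (out + 1 < dstsize)
			dst[out] = (char) ch;
		out++;
	}

	if (dstsize > 0)
		dst[out < dstsize ? out : dstsize - 1] = '\0';

	return out;
}

/* Fold src into a malloc'd buffer, retrying once if the first guess was short. */
static char *
fold_alloc(const char *src, size_t srclen, size_t *folded_len)
{
	size_t		bufsize = srclen + 1;
	char	   *buf = malloc(bufsize);
	size_t		needed;

	if (buf == NULL)
		return NULL;

	needed = toy_fold(buf, bufsize, src, srclen);
	if (needed + 1 > bufsize)
	{
		char	   *tmp;

		/* grow buffer and start over; there is no way to resume */
		bufsize = needed + 1;
		tmp = realloc(buf, bufsize);
		if (tmp == NULL)
		{
			free(buf);
			return NULL;
		}
		buf = tmp;
		needed = toy_fold(buf, bufsize, src, srclen);
		assert(needed + 1 <= bufsize);
	}

	*folded_len = needed;
	return buf;
}

/* Returns 1 if the first n bytes of a and b are equal after folding. */
static int
toy_fold_equal(const char *a, const char *b, size_t n)
{
	size_t		alen = 0;
	size_t		blen = 0;
	char	   *al = fold_alloc(a, n, &alen);
	char	   *bl = fold_alloc(b, n, &blen);
	int			equal;

	/* compare the folded lengths first, then exactly that many bytes */
	equal = (al != NULL && bl != NULL &&
			 alen == blen && memcmp(al, bl, alen) == 0);

	free(al);
	free(bl);
	return equal;
}
```

With this toy folding, toy_fold_equal("a#x", "ASS", 3) reports a mismatch because the folded lengths differ (4 vs. 3), whereas comparing only the first 3 folded bytes would treat the two inputs as equal.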

6 - 0007
```
 	/*
-	 * For efficiency reasons, in the single byte case we don't call lower()
-	 * on the pattern and text, but instead call SB_lower_char on each
-	 * character.  In the multi-byte case we don't have much choice :-(. Also,
-	 * ICU does not support single-character case folding, so we go the long
-	 * way.
+	 * For efficiency reasons, in the C locale we don't call lower() on the
+	 * pattern and text, but instead call SB_lower_char on each character.
 	 */
```

SB_lower_char should be changed to C_IMatchText.

7 - 0007
```
/* setup to compile like_match.c for single byte case insensitive matches */
-#define MATCH_LOWER(t, locale) SB_lower_char((unsigned char) (t), locale)
+#define MATCH_LOWER
 ```

I think the comment should be updated accordingly, like “for ILIKE in the C locale”.

8 - 0008
```
+	/* for text types, use pg_wchar; for BYTEA, use char */
 	if (typeid != BYTEAOID)
 	{
-		patt = TextDatumGetCString(patt_const->constvalue);
-		pattlen = strlen(patt);
+		text	   *val = DatumGetTextPP(patt_const->constvalue);
+		pg_wchar   *wpatt;
+		pg_wchar   *wmatch;
+		char	   *match;
+
+		patt = VARDATA_ANY(val);
+		pattlen = VARSIZE_ANY_EXHDR(val);
+		wpatt = palloc((pattlen + 1) * sizeof(pg_wchar));
+		wmatch = palloc((pattlen + 1) * sizeof(pg_wchar));
+		pg_mb2wchar_with_len(patt, wpatt, pattlen);
+
+		match = palloc(pattlen + 1);
```

* match is allocated with pattlen+1 bytes; is that long enough to hold pattlen multibyte chars?

* match is not freed, but looks like it should be:

*prefix_const = string_to_const(match, typeid);
-> string_to_datum(str, datatype);
-> CStringGetTextDatum(str);
-> cstring_to_text(s)
-> cstring_to_text_with_len(s, strlen(s));
-> *result = (text *) palloc(len + VARHDRSZ);

So match has been copied, and it should be freed.

9 - 0008
```
-	}
+			/* Backslash escapes the next character */
+			if (patt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}

-	match[match_pos] = '\0';
+			match[match_pos++] = pos;
+		}
```

Should “pos” be “patt[pos]” when assigning to match[match_pos++]?
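
For illustration, a minimal byte-oriented sketch of extracting the literal prefix of a LIKE pattern, storing the character rather than the index. This is not the code from 0008, which works on pg_wchar; like_literal_prefix() is a hypothetical helper, and the handling of '%' and '_' as the only wildcards is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the literal prefix of a LIKE pattern into match, which must have room
 * for pattlen + 1 bytes, and return the prefix length.  Stops at the first
 * unescaped '%' or '_'.  Hypothetical helper, byte-oriented for simplicity.
 */
static size_t
like_literal_prefix(const char *patt, size_t pattlen, char *match)
{
	size_t		pos;
	size_t		match_pos = 0;

	for (pos = 0; pos < pattlen; pos++)
	{
		/* % and _ are wildcard characters in LIKE */
		if (patt[pos] == '%' || patt[pos] == '_')
			break;

		/* Backslash escapes the next character */
		if (patt[pos] == '\\')
		{
			pos++;
			if (pos >= pattlen)
				break;
		}

		/* store the character at pos, not the index pos itself */
		match[match_pos++] = patt[pos];
	}

	match[match_pos] = '\0';
	return match_pos;
}
```

For the pattern ab\%c%d this yields the prefix ab%c: the escaped '%' is kept as a literal and the unescaped one ends the prefix.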

I will review the remaining 3 commits tomorrow.

[1]: /messages/by-id/CAEoWx2m9mUN397neL=p9x0vaVcj5EGiKD53F1MNTwTDXizxiaA@mail.gmail.com

--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#76

Chao Li

li.evan.chao@gmail.com

about 2 months ago

In reply to: Chao Li (#75)

Re: Remaining dependency on setlocale()

On Nov 26, 2025, at 09:50, Chao Li <li.evan.chao@gmail.com> wrote:

I will review the remaining 3 commits tomorrow.

10 - 0009
```
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
```

Just curious. As isalpha() and toupper() come from the same header file ctype.h, if we replace toupper with pg_ascii_toupper, does isalpha also need to be handled?
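
As a sketch of what handling isalpha() could look like, assuming the goal is to avoid any dependence on the global LC_CTYPE setting: the helpers below are hypothetical local functions (ascii_isalpha, ascii_toupper, letter_index), not existing PostgreSQL APIs, and whether the patch actually needs this is an open question, since getcode() already rejects anything outside A..Z after upcasing.

```c
#include <assert.h>

/* ASCII-only test: true for A..Z and a..z, regardless of locale. */
static inline int
ascii_isalpha(unsigned char ch)
{
	return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
}

/* ASCII-only upcasing: bytes outside a..z are returned unchanged. */
static inline unsigned char
ascii_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch += 'A' - 'a';
	return ch;
}

/*
 * Map a letter to an index 0..25 for a table lookup, or -1 if the byte is
 * not an ASCII letter.  No ctype.h call is involved, so the result does not
 * change with setlocale().
 */
static int
letter_index(char c)
{
	unsigned char ch = (unsigned char) c;

	if (!ascii_isalpha(ch))
		return -1;
	return ascii_toupper(ch) - 'A';
}
```

With both checks ASCII-only, a high-bit byte such as 0xE9 is rejected up front instead of depending on what isalpha() returns under the current locale.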

11 - 0010
```
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	dstsize = len + 1;
+	result = palloc(dstsize);

-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_strfold_ident(result, dstsize, ident, len);
+	Assert(needed + 1 == dstsize);
+	Assert(needed == len);
```

I think asserting against both dstsize and len is redundant, because dstsize = len + 1 and nothing changes either value in between.

12 - 0010
```
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+	if (default_locale == NULL || default_locale->ctype == NULL)
+	{
+		int			i;
+
+		for (i = 0; i < srclen && i < destsize; i++)
+		{
+			unsigned char ch = (unsigned char) src[i];
+
+			if (ch >= 'A' && ch <= 'Z')
+				ch += 'a' - 'A';
+			dest[i] = (char) ch;
+		}
+
+		if (i < destsize)
+			dest[i] = '\0';
+
+		return srclen;
+	}
+	return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+												default_locale);
+}
```

Given that default_locale can be NULL only at some specific moments, can we do something like:

locale = default_locale;
if (locale == NULL || locale->ctype == NULL)
    locale = libc or other fallback;
return locale->ctype->strfold_ident(dest, destsize, src, srclen, locale);

This would avoid the duplicated code.
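
A minimal standalone sketch of that fallback idea, under the assumption that a statically allocated ASCII-only method table is an acceptable stand-in when no default locale is set. All types and names here (sketch_locale, sketch_fallback_locale, sketch_strfold_ident) are invented for illustration and do not match the real pg_locale_t layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_locale sketch_locale;

typedef size_t (*sketch_fold_fn) (char *dst, size_t dstsize,
								  const char *src, size_t srclen,
								  const sketch_locale *loc);

/* Hypothetical, much-simplified stand-in for a locale with a method table. */
struct sketch_locale
{
	sketch_fold_fn strfold_ident;
};

/* ASCII-only folding; returns the length the full result requires. */
static size_t
sketch_fold_ascii(char *dst, size_t dstsize, const char *src, size_t srclen,
				  const sketch_locale *loc)
{
	size_t		i;

	(void) loc;					/* unused by the ASCII implementation */

	for (i = 0; i < srclen && i < dstsize; i++)
	{
		unsigned char ch = (unsigned char) src[i];

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';
		dst[i] = (char) ch;
	}
	if (i < dstsize)
		dst[i] = '\0';
	return srclen;
}

static const sketch_locale sketch_fallback_locale = {sketch_fold_ascii};

/* NULL until "initialized", mimicking the early-startup window. */
static const sketch_locale *sketch_default_locale = NULL;

static size_t
sketch_strfold_ident(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	const sketch_locale *loc = sketch_default_locale;

	assert(dst != NULL && src != NULL);

	/* Single dispatch path: substitute the fallback table when needed. */
	if (loc == NULL || loc->strfold_ident == NULL)
		loc = &sketch_fallback_locale;

	return loc->strfold_ident(dst, dstsize, src, srclen, loc);
}
```

The trade-off is that the ASCII loop still has to live somewhere; this arrangement only moves it behind the same function-pointer call as the other providers so the caller has one code path.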

13 - 0011
```
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
```

I just feel the GUC name is very misleading. Without carefully reading the doc, users may easily take lc_collate to be the system’s locale. If it only affects extensions, then let’s name it accordingly, for example, “extension_lc_collate” or “legacy_lc_collate”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#77

Jeff Davis

pgsql@j-davis.com

about 1 month ago

In reply to: Chao Li (#75)

9 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-11-26 at 09:50 +0800, Chao Li wrote:

* The retry logic implies that a single-byte char may become multiple
bytes after folding, otherwise retry is not needed because you have
allocated s+1 bytes for dest buffers. From this perspective, we
should use two needed variables: neededA and neededB, if neededA !=
neededB, then the two strings are different; if neededA == neededB,
then we should perform strncmp, but here we should pass neededA
(or neededB as they are identical) to strncmp(al, bl, neededA).

Thank you.

It's actually worse than that: having a single 's' argument is just
completely wrong. Consider:

a: U&'x\0394\0394\0394'
b: U&'\0394\0394\0394'

There is no value for byte length 's' for which both 'a' and 'b' are
properly-encoded strings. So, the current code passes invalid byte
sequences to LOWER(), which is another pre-existing bug.

ltree_strncasecmp() is only used for checking equality of the first s
bytes of the query, so let's make it a safer API that just checks
prefix equality. Attached.
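
The claim about 'a' and 'b' can be checked with a small standalone sketch, assuming both strings are UTF-8 encoded (U+0394 is the two bytes 0xCE 0x94). The helper names are hypothetical; the code just counts the byte lengths s at which both strings end on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * True if byte offset s is a character boundary in the UTF-8 string str
 * of len bytes, i.e. truncating at s does not split a character.
 */
static int
sketch_is_utf8_boundary(const char *str, size_t len, size_t s)
{
	assert(str != NULL);

	if (s > len)
		return 0;
	if (s == len)
		return 1;
	/* Continuation bytes look like 10xxxxxx */
	return (((unsigned char) str[s]) & 0xC0) != 0x80;
}

/*
 * Count the byte lengths s in 1..min(alen, blen) for which truncating
 * both strings to s bytes leaves two properly encoded strings.
 */
static size_t
sketch_count_common_boundaries(const char *a, size_t alen,
							   const char *b, size_t blen)
{
	size_t		minlen = alen < blen ? alen : blen;
	size_t		count = 0;

	for (size_t s = 1; s <= minlen; s++)
	{
		if (sketch_is_utf8_boundary(a, alen, s) &&
			sketch_is_utf8_boundary(b, blen, s))
			count++;
	}
	return count;
}
```

For 'a' (7 bytes) the boundaries fall at offsets 1, 3, 5 and 7; for 'b' (6 bytes) they fall at 2, 4 and 6. The two sets never coincide, so the count is zero, which is why a single shared byte length cannot work here.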

* Based on the logic you implemented in 0004, first pg_strfold() has
copied as many chars as possible to dest buffer, so when retry,
ideally we should resume instead of starting over. However, if
single-byte->multi-byte folding happens, we have no information to
decide from where to resume.

Right.

That suggests that we might want some kind of lazy or iterator-based
API for string folding. We'd need to find the right way to do that with
ICU. If we find that it's a performance problem somewhere, we can look
into that. Do you think we need that now?

From this perspective, in 0004, do we really need to take the try-
the-best strategy for strlower_c()? If there are some other use cases
that require data to be placed in dest buffer even if dest buffer
doesn’t have enough space, then my patch [1] of changing
strlower_libc_sb() should be considered.

I will look into that.

SB_lower_char should be changed to C_IMatchText.

Updated comment.

I think the comment should be updated accordingly, like “for ILIKE in
the C locale”.

Done, thank you.

* match is allocated with pattlen+1 bytes, is it long enough to hold
pattlen multiple-byte chars?

* match is not freed, but looks like it should be:

...

Should “pos” be “patt[pos]” when assigning to match[match_pos++]?

All fixed, thank you! (I apologize for posting a patch in that state to
begin with...)

I also reorganized slightly to separate out the pg_iswcased() API into
its own patch, and moved the like_support.c changes from the ctype_is_c
patch (already committed: 1476028225) into the pattern prefixes patch.

Regards,
Jeff Davis

Attachments:

v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch (text/x-patch; charset=UTF-8)

From bf07696da676482d874602bd0f267b979dea5b82 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 26 Nov 2025 10:28:45 -0800
Subject: [PATCH v11 6/9] Use multibyte-aware extraction of pattern prefixes.

Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.

Now, use pg_wchar-at-a-time loop for text types, along with proper
detection of cased characters; and preserve the char-at-a-time logic
for bytea.

Removes the pg_locale_t char_is_cased() single-byte method and
replaces it with a proper multibyte pg_iswcased() method.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/like.c              |   9 +-
 src/backend/utils/adt/like_support.c      | 128 ++++++++++++----------
 src/backend/utils/adt/pg_locale.c         |  15 ---
 src/backend/utils/adt/pg_locale_builtin.c |   8 --
 src/backend/utils/adt/pg_locale_icu.c     |   8 --
 src/backend/utils/adt/pg_locale_libc.c    |  14 ---
 src/include/utils/pg_locale.h             |   3 -
 7 files changed, 76 insertions(+), 109 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4a7fc583c71..772879f0a81 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -118,7 +118,7 @@ wchareq(const char *p1, const char *p2)
 
 #include "like_match.c"
 
-/* setup to compile like_match.c for single byte case insensitive matches */
+/* setup to compile like_match.c for case-insensitive matches in C locale */
 #define MATCH_LOWER
 #define NextChar(p, plen) NextByte((p), (plen))
 #define MatchText C_IMatchText
@@ -190,8 +190,11 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
 
 	/*
-	 * For efficiency reasons, in the C locale we don't call lower() on the
-	 * pattern and text, but instead call SB_lower_char on each character.
+	 * For efficiency reasons, in the C locale lowercase each character
+	 * lazily.  Otherwise, we lowercase the entire pattern and text strings
+	 * prior to matching.
+	 *
+	 * XXX: use casefolding instead?
 	 */
 
 	if (locale->ctype_is_c)
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 999f23f86d5..2db06bd1728 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -99,8 +99,6 @@ static Selectivity like_selectivity(const char *patt, int pattlen,
 static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
-static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -989,13 +987,11 @@ static Pattern_Prefix_Status
 like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				  Const **prefix_const, Selectivity *rest_selec)
 {
-	char	   *match;
 	char	   *patt;
 	int			pattlen;
 	Oid			typeid = patt_const->consttype;
-	int			pos,
-				match_pos;
-	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
+	int			pos;
+	int			match_pos = 0;
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1023,60 +1019,94 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 		locale = pg_newlocale_from_collation(collation);
 	}
 
+	/* for text types, use pg_wchar; for BYTEA, use char */
 	if (typeid != BYTEAOID)
 	{
-		patt = TextDatumGetCString(patt_const->constvalue);
-		pattlen = strlen(patt);
+		text	   *val = DatumGetTextPP(patt_const->constvalue);
+		pg_wchar   *wpatt;
+		pg_wchar   *wmatch;
+		char	   *match;
+		int			match_mblen;
+
+		patt = VARDATA_ANY(val);
+		pattlen = VARSIZE_ANY_EXHDR(val);
+		wpatt = palloc((pattlen + 1) * sizeof(pg_wchar));
+		pg_mb2wchar_with_len(patt, wpatt, pattlen);
+
+		wmatch = palloc((pattlen + 1) * sizeof(pg_wchar));
+		for (pos = 0; pos < pattlen; pos++)
+		{
+			/* % and _ are wildcard characters in LIKE */
+			if (wpatt[pos] == '%' ||
+				wpatt[pos] == '_')
+				break;
+
+			/* Backslash escapes the next character */
+			if (wpatt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
+
+			/*
+			 * For ILIKE, stop if it's a case-varying character (it's sort of
+			 * a wildcard).
+			 */
+			if (case_insensitive && pg_iswcased(wpatt[pos], locale))
+				break;
+
+			wmatch[match_pos++] = wpatt[pos];
+		}
+
+		wmatch[match_pos] = '\0';
+
+		match_mblen = pg_database_encoding_max_length() * match_pos + 1;
+		match = palloc(match_mblen);
+		pg_wchar2mb_with_len(wmatch, match, match_pos);
+
+		pfree(wpatt);
+		pfree(wmatch);
+
+		*prefix_const = string_to_const(match, typeid);
+		pfree(match);
 	}
 	else
 	{
 		bytea	   *bstr = DatumGetByteaPP(patt_const->constvalue);
+		char	   *match;
 
+		patt = VARDATA_ANY(bstr);
 		pattlen = VARSIZE_ANY_EXHDR(bstr);
-		patt = (char *) palloc(pattlen);
-		memcpy(patt, VARDATA_ANY(bstr), pattlen);
-		Assert((Pointer) bstr == DatumGetPointer(patt_const->constvalue));
-	}
-
-	match = palloc(pattlen + 1);
-	match_pos = 0;
-	for (pos = 0; pos < pattlen; pos++)
-	{
-		/* % and _ are wildcard characters in LIKE */
-		if (patt[pos] == '%' ||
-			patt[pos] == '_')
-			break;
 
-		/* Backslash escapes the next character */
-		if (patt[pos] == '\\')
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
 		{
-			pos++;
-			if (pos >= pattlen)
+			/* % and _ are wildcard characters in LIKE */
+			if (patt[pos] == '%' ||
+				patt[pos] == '_')
 				break;
-		}
 
-		/* Stop if case-varying character (it's sort of a wildcard) */
-		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale))
-			break;
-
-		match[match_pos++] = patt[pos];
-	}
+			/* Backslash escapes the next character */
+			if (patt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
 
-	match[match_pos] = '\0';
+			match[match_pos++] = patt[pos];
+		}
 
-	if (typeid != BYTEAOID)
-		*prefix_const = string_to_const(match, typeid);
-	else
 		*prefix_const = string_to_bytea_const(match, match_pos);
 
+		pfree(match);
+	}
+
 	if (rest_selec != NULL)
 		*rest_selec = like_selectivity(&patt[pos], pattlen - pos,
 									   case_insensitive);
 
-	pfree(patt);
-	pfree(match);
-
 	/* in LIKE, an empty pattern is an exact match! */
 	if (pos == pattlen)
 		return Pattern_Prefix_Exact;	/* reached end of pattern, so exact */
@@ -1481,24 +1511,6 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
 	return sel;
 }
 
-/*
- * Check whether char is a letter (and, hence, subject to case-folding)
- *
- * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
- * Instead, just assume any non-ASCII char is potentially case-varying, and
- * hard-wire knowledge of which ASCII chars are letters.
- */
-static int
-pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else
-		return char_is_cased(c, locale);
-}
-
 
 /*
  * For bytea, the increment function need only increment the current byte
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index e5e75ca2c2c..c4e89502f85 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1625,21 +1625,6 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_tolower(wc, locale);
 }
 
-/*
- * char_is_cased()
- *
- * Fuzzy test of whether the given char is case-varying or not. The argument
- * is a single byte, so in a multibyte encoding, just assume any non-ASCII
- * char is case-varying.
- */
-bool
-char_is_cased(char ch, pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-	return locale->ctype->char_is_cased(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0d4c754a267..0c2920112bb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -191,13 +191,6 @@ wc_iscased_builtin(pg_wchar wc, pg_locale_t locale)
 	return pg_u_prop_cased(to_char32(wc));
 }
 
-static bool
-char_is_cased_builtin(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 static pg_wchar
 wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
 {
@@ -225,7 +218,6 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
-	.char_is_cased = char_is_cased_builtin,
 	.wc_iscased = wc_iscased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e8820666b2d..18d026deda8 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,13 +121,6 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-static bool
-char_is_cased_icu(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
  * UChar32, which is correct for the UTF-8 encoding, but not in general.
@@ -244,7 +237,6 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
-	.char_is_cased = char_is_cased_icu,
 	.wc_iscased = wc_iscased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index cd54198f0c7..4cb3c64b4a6 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -262,17 +262,6 @@ wc_iscased_libc_mb(pg_wchar wc, pg_locale_t locale)
 		iswlower_l((wint_t) wc, locale->lt);
 }
 
-static bool
-char_is_cased_libc(char ch, pg_locale_t locale)
-{
-	bool		is_multibyte = pg_database_encoding_max_length() > 1;
-
-	if (is_multibyte && IS_HIGHBIT_SET(ch))
-		return true;
-	else
-		return isalpha_l((unsigned char) ch, locale->lt);
-}
-
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
@@ -345,7 +334,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
 	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
@@ -371,7 +359,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
 	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
@@ -393,7 +380,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_ispunct = wc_ispunct_libc_mb,
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
-	.char_is_cased = char_is_cased_libc,
 	.wc_iscased = wc_iscased_libc_mb,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 832007385d8..01f891def7a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -125,9 +125,6 @@ struct ctype_methods
 	bool		(*wc_iscased) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
-
-	/* required */
-	bool		(*char_is_cased) (char ch, pg_locale_t locale);
 };
 
 /*
-- 
2.43.0

v11-0007-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 9d99649a07b5eb165254ca43ada6ffd1d4e36555 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v11 7/9] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 contrib/fuzzystrmatch/dmetaphone.c    |  2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..bb5d3e90756 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -284,7 +284,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = pg_ascii_toupper((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..7f07efc2c35 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +124,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +301,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +742,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

v11-0008-downcase_identifier-use-method-table-from-locale.patch (text/x-patch; charset=UTF-8)

From de1d8c438c74cbb0b8bba70172f02e746db21a05 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 20 Oct 2025 16:32:18 -0700
Subject: [PATCH v11 8/9] downcase_identifier(): use method table from locale
 provider.

Previously, libc's tolower() was always used for identifier case
folding, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
casefolding.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to fold the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/parser/scansup.c              | 39 +++++++---------
 src/backend/utils/adt/pg_locale.c         | 32 +++++++++++++
 src/backend/utils/adt/pg_locale_builtin.c | 24 ++++++++++
 src/backend/utils/adt/pg_locale_icu.c     | 36 ++++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 55 +++++++++++++++++++++++
 src/include/utils/pg_locale.h             |  5 +++
 6 files changed, 166 insertions(+), 25 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..0bd049643d1 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -46,35 +47,25 @@ char *
 downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 {
 	char	   *result;
-	int			i;
-	bool		enc_is_single_byte;
-
-	result = palloc(len + 1);
-	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	size_t		dstsize;
+	size_t		needed pg_attribute_unused();
 
 	/*
-	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
-	 * have the infrastructure for.  Instead we use tolower() to provide a
-	 * locale-aware translation.  However, there are some locales where this
-	 * is not right either (eg, Turkish may do strange things with 'i' and
-	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * Preserves string length.
+	 *
+	 * NB: if we decide to support Unicode-aware identifier case folding, then
+	 * we need to account for a change in string length.
 	 */
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	dstsize = len + 1;
+	result = palloc(dstsize);
 
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_strfold_ident(result, dstsize, ident, len);
+	Assert(needed + 1 == dstsize);
+	Assert(needed == len);
+	Assert(result[len] == '\0');
 
-	if (i >= NAMEDATALEN && truncate)
-		truncate_identifier(result, i, warn);
+	if (len >= NAMEDATALEN && truncate)
+		truncate_identifier(result, len, warn);
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c4e89502f85..9167018c85b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1352,6 +1352,38 @@ pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+	if (default_locale == NULL || default_locale->ctype == NULL)
+	{
+		int			i;
+
+		for (i = 0; i < srclen && i < destsize; i++)
+		{
+			unsigned char ch = (unsigned char) src[i];
+
+			if (ch >= 'A' && ch <= 'Z')
+				ch += 'a' - 'A';
+			dest[i] = (char) ch;
+		}
+
+		if (i < destsize)
+			dest[i] = '\0';
+
+		return srclen;
+	}
+	return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+												default_locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0c2920112bb..659e588d513 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -125,6 +125,29 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 						   locale->builtin.casemap_full);
 }
 
+static size_t
+strfold_ident_builtin(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
@@ -208,6 +231,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.strtitle = strtitle_builtin,
 	.strupper = strupper_builtin,
 	.strfold = strfold_builtin,
+	.strfold_ident = strfold_ident_builtin,
 	.wc_isdigit = wc_isdigit_builtin,
 	.wc_isalpha = wc_isalpha_builtin,
 	.wc_isalnum = wc_isalnum_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 18d026deda8..39b153a4262 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,8 @@ static size_t strupper_icu(char *dest, size_t destsize, const char *src,
 						   ssize_t srclen, pg_locale_t locale);
 static size_t strfold_icu(char *dest, size_t destsize, const char *src,
 						  ssize_t srclen, pg_locale_t locale);
+static size_t strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+								ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -123,7 +125,7 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
- * UChar32, which is correct for the UTF-8 encoding, but not in general.
+ * UChar32, which is correct for UTF-8 and LATIN1, but not in general.
  */
 
 static pg_wchar
@@ -227,6 +229,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 	.strfold = strfold_icu,
+	.strfold_ident = strfold_ident_icu,
 	.wc_isdigit = wc_isdigit_icu,
 	.wc_isalpha = wc_isalpha_icu,
 	.wc_isalnum = wc_isalnum_icu,
@@ -564,6 +567,37 @@ strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
+/*
+ * For historical compatibility, behavior is not multibyte-aware.
+ *
+ * NB: uses libc tolower() for single-byte encodings (also for historical
+ * compatibility), and therefore relies on the global LC_CTYPE setting.
+ */
+static size_t
+strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+				  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
+			ch = tolower(ch);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 /*
  * strncoll_icu_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 4cb3c64b4a6..85c7885a8ae 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -318,12 +318,65 @@ tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 		return wc;
 }
 
+/*
+ * Characters A..Z always fold to a..z, even in the Turkish locale. Characters
+ * beyond 127 use tolower().
+ */
+static size_t
+strfold_ident_libc_sb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (IS_HIGHBIT_SET(ch) && isupper_l(ch, loc))
+			ch = tolower_l(ch, loc);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
+/*
+ * For historical reasons, not multibyte-aware; uses plain ASCII semantics.
+ */
+static size_t
+strfold_ident_libc_mb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_sb,
+	.strfold_ident = strfold_ident_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -349,6 +402,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -370,6 +424,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 01f891def7a..53574d2ef85 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -110,6 +110,9 @@ struct ctype_methods
 	size_t		(*strfold) (char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+	size_t		(*strfold_ident) (char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
 
 	/* required */
 	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
@@ -188,6 +191,8 @@ extern size_t pg_strupper(char *dst, size_t dstsize,
 extern size_t pg_strfold(char *dst, size_t dstsize,
 						 const char *src, ssize_t srclen,
 						 pg_locale_t locale);
+extern size_t pg_strfold_ident(char *dst, size_t dstsize,
+							   const char *src, ssize_t srclen);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.43.0

v11-0009-Control-LC_COLLATE-with-GUC.patch (text/x-patch; charset=UTF-8)

From d7970e1db1b3185c509be22839857ecc4c2a140e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 14:00:52 -0800
Subject: [PATCH v11 9/9] Control LC_COLLATE with GUC.

Now that the global LC_COLLATE setting is not used for any in-core
purpose at all (see commit 5e6e42e44f), allow it to be set with a
GUC. This may be useful for extensions or procedural languages that
still depend on the global LC_COLLATE setting.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 +++++++++++++++++++
 src/backend/utils/init/postinit.c             |  2 +
 src/backend/utils/misc/guc_parameters.dat     |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/initdb/initdb.c                       |  3 +
 src/include/utils/guc_hooks.h                 |  2 +
 src/include/utils/pg_locale.h                 |  1 +
 7 files changed, 78 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9167018c85b..91e7eba2eac 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,7 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
+char	   *locale_collate;
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
@@ -369,6 +370,64 @@ assign_locale_time(const char *newval, void *extra)
 	CurrentLCTimeValid = false;
 }
 
+/*
+ * We allow LC_COLLATE to actually be set globally.
+ *
+ * Note: we normally disallow value = "" because it wouldn't have consistent
+ * semantics (it'd effectively just use the previous value).  However, this
+ * is the value passed for PGC_S_DEFAULT, so don't complain in that case,
+ * not even if the attempted setting fails due to invalid environment value.
+ * The idea there is just to accept the environment setting *if possible*
+ * during startup, until we can read the proper value from postgresql.conf.
+ */
+bool
+check_locale_collate(char **newval, void **extra, GucSource source)
+{
+	int			locale_enc;
+	int			db_enc;
+
+	if (**newval == '\0')
+	{
+		if (source == PGC_S_DEFAULT)
+			return true;
+		else
+			return false;
+	}
+
+	locale_enc = pg_get_encoding_from_locale(*newval, true);
+	db_enc = GetDatabaseEncoding();
+
+	if (!(locale_enc == db_enc ||
+		  locale_enc == PG_SQL_ASCII ||
+		  db_enc == PG_SQL_ASCII ||
+		  locale_enc == -1))
+	{
+		if (source == PGC_S_FILE)
+		{
+			guc_free(*newval);
+			*newval = guc_strdup(LOG, "C");
+			if (!*newval)
+				return false;
+		}
+		else if (source != PGC_S_TEST)
+		{
+			ereport(WARNING,
+					(errmsg("encoding mismatch"),
+					 errdetail("Locale \"%s\" uses encoding \"%s\", which does not match database encoding \"%s\".",
+							   *newval, pg_encoding_to_char(locale_enc), pg_encoding_to_char(db_enc))));
+			return false;
+		}
+	}
+
+	return check_locale(LC_COLLATE, *newval, NULL);
+}
+
+void
+assign_locale_collate(const char *newval, void *extra)
+{
+	(void) pg_perm_setlocale(LC_COLLATE, newval);
+}
+
 /*
  * We allow LC_MESSAGES to actually be set globally.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..c99d57eba48 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -404,6 +404,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * the pg_database tuple.
 	 */
 	SetDatabaseEncoding(dbform->encoding);
+	/* Reset lc_collate to check encoding, and fall back to C if necessary */
+	SetConfigOption("lc_collate", locale_collate, PGC_POSTMASTER, PGC_S_FILE);
 	/* Record it as a GUC internal option, too */
 	SetConfigOption("server_encoding", GetDatabaseEncodingName(),
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..a36c680719f 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1457,6 +1457,15 @@
   boot_val => 'PG_KRB_SRVTAB',
 },
 
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
+  long_desc => 'An empty string means use the operating system setting.',
+  variable => 'locale_collate',
+  boot_val => '""',
+  check_hook => 'check_locale_collate',
+  assign_hook => 'assign_locale_collate',
+},
+
 { name => 'lc_messages', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
   short_desc => 'Sets the language in which messages are displayed.',
   long_desc => 'An empty string means use the operating system setting.',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..19332e39e82 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -798,6 +798,8 @@
                                         # encoding
 
 # These settings are initialized by initdb, but they can be changed.
+#lc_collate = ''                        # locale for text ordering (only affects
+                                        # extensions)
 #lc_messages = ''                       # locale for system error message
                                         # strings
 #lc_monetary = 'C'                      # locale for monetary formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..8b2e7bfab6f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1312,6 +1312,9 @@ setup_config(void)
 	conflines = replace_guc_value(conflines, "shared_buffers",
 								  repltok, false);
 
+	conflines = replace_guc_value(conflines, "lc_collate",
+								  lc_collate, false);
+
 	conflines = replace_guc_value(conflines, "lc_messages",
 								  lc_messages, false);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 82ac8646a8d..8a20f76eec8 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -65,6 +65,8 @@ extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
 extern void assign_io_method(int newval, void *extra);
 extern bool check_io_max_concurrency(int *newval, void **extra, GucSource source);
 extern const char *show_in_hot_standby(void);
+extern bool check_locale_collate(char **newval, void **extra, GucSource source);
+extern void assign_locale_collate(const char *newval, void *extra);
 extern bool check_locale_messages(char **newval, void **extra, GucSource source);
 extern void assign_locale_messages(const char *newval, void *extra);
 extern bool check_locale_monetary(char **newval, void **extra, GucSource source);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 53574d2ef85..276be4c1fef 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -41,6 +41,7 @@
 #define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
 
 /* GUC settings */
+extern PGDLLIMPORT char *locale_collate;
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
-- 
2.43.0
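
For readers skimming the 0009 patch, the encoding test in check_locale_collate() can be restated on its own. The following is only a sketch of the decision as it reads in the diff, not the patch's code: the encoding IDs, the source enum and every sketch_* name are hypothetical stand-ins, the empty-string case is left out, and "accept" here only means the value would go on to the final check_locale() call.

```c
#include <stdbool.h>

/* Hypothetical encoding IDs; the real ones come from PostgreSQL headers. */
enum sketch_encoding
{
	SKETCH_ENC_UNKNOWN = -1,	/* the -1 case tested in the patch */
	SKETCH_SQL_ASCII = 0,
	SKETCH_UTF8,
	SKETCH_LATIN1
};

/* Hypothetical stand-ins for the GucSource values used in the patch. */
enum sketch_source
{
	SKETCH_SRC_DEFAULT,
	SKETCH_SRC_FILE,
	SKETCH_SRC_TEST,
	SKETCH_SRC_SESSION
};

enum sketch_action
{
	SKETCH_ACCEPT,				/* continue to the locale validity check */
	SKETCH_FALL_BACK_TO_C,		/* replace the value with "C" */
	SKETCH_REJECT				/* warn and refuse the setting */
};

/* Same four-way compatibility test as in the diff. */
bool
sketch_locale_encoding_ok(int locale_enc, int db_enc)
{
	return locale_enc == db_enc ||
		locale_enc == SKETCH_SQL_ASCII ||
		db_enc == SKETCH_SQL_ASCII ||
		locale_enc == SKETCH_ENC_UNKNOWN;
}

/* What happens to a non-empty value, depending on where it came from. */
enum sketch_action
sketch_check_collate(int locale_enc, int db_enc, enum sketch_source source)
{
	if (sketch_locale_encoding_ok(locale_enc, db_enc))
		return SKETCH_ACCEPT;
	if (source == SKETCH_SRC_FILE)
		return SKETCH_FALL_BACK_TO_C;
	if (source != SKETCH_SRC_TEST)
		return SKETCH_REJECT;
	return SKETCH_ACCEPT;
}
```

Read this way, a mismatching value from the configuration file degrades to "C", while the same value from any other non-test source is refused with a warning.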

v11-0001-Change-some-callers-to-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From a70ce0d50ae47ddaf3c310ebf94d24fdc642e074 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 09:09:06 -0800
Subject: [PATCH v11 1/9] Change some callers to use pg_ascii_toupper().

The input is ASCII anyway, so it's better to be clear that it's not
locale-dependent.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/access/transam/xlogfuncs.c | 2 +-
 src/backend/utils/adt/cash.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e45fce43ed..a50345f9bf7 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -479,7 +479,7 @@ pg_split_walfile_name(PG_FUNCTION_ARGS)
 
 	/* Capitalize WAL file name. */
 	for (p = fname_upper; *p; p++)
-		*p = pg_toupper((unsigned char) *p);
+		*p = pg_ascii_toupper((unsigned char) *p);
 
 	if (!IsXLogFileName(fname_upper))
 		ereport(ERROR,
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 611d23f3cb0..623f6eec056 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -1035,7 +1035,7 @@ cash_words(PG_FUNCTION_ARGS)
 	appendStringInfoString(&buf, m0 == 1 ? " cent" : " cents");
 
 	/* capitalize output */
-	buf.data[0] = pg_toupper((unsigned char) buf.data[0]);
+	buf.data[0] = pg_ascii_toupper((unsigned char) buf.data[0]);
 
 	/* return as text datum */
 	res = cstring_to_text_with_len(buf.data, buf.len);
-- 
2.43.0
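
To illustrate the point of the 0001 patch, here is a minimal sketch of an uppercase conversion that cannot depend on the locale. sketch_ascii_toupper() is a hypothetical name used only for this example; it is assumed to behave like pg_ascii_toupper() on ASCII letters, but it is not PostgreSQL's implementation.

```c
/*
 * Uppercase one byte using only the ASCII range.  Bytes outside 'a'..'z',
 * including any high-bit byte, are returned unchanged, so the result does
 * not depend on LC_CTYPE.
 */
unsigned char
sketch_ascii_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch += 'A' - 'a';
	return ch;
}
```

Because WAL file names and the output of cash_words() are plain ASCII, a function like this gives the same answer as a locale-aware one there, and says so in its name.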

v11-0002-Make-regex-max_chr-depend-on-encoding-not-provid.patch (text/x-patch; charset=UTF-8)

From de4c04590f095d543cc24217945a259236ea866f Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 21 Nov 2025 12:41:47 -0800
Subject: [PATCH v11 2/9] Make regex "max_chr" depend on encoding, not
 provider.

The previous per-provider "max_chr" field was there as a hack to
preserve the exact prior behavior, which depended on the
provider. Change to depend on the encoding, which makes more sense,
and remove the per-provider logic.

The only difference is for ICU: previously it always used
MAX_SIMPLE_CHR (0x7FF) regardless of the encoding; whereas now it will
match libc and use MAX_SIMPLE_CHR for UTF-8, and MAX_UCHAR for other
encodings. That's possibly a loss for non-UTF8 multibyte encodings,
but a win for single-byte encodings. Regardless, this distinction was
not worth the complexity.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
 src/backend/regex/regc_pg_locale.c     | 18 ++++++++++--------
 src/backend/utils/adt/pg_locale_libc.c |  2 --
 src/include/utils/pg_locale.h          |  6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 4698f110a0c..bb0e3f1d139 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -320,16 +320,18 @@ regc_ctype_get_cache(regc_wc_probefunc probefunc, int cclasscode)
 		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
 	}
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+	}
 	else
 	{
-		if (pg_regex_locale->ctype->max_chr != 0 &&
-			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
-		{
-			max_chr = pg_regex_locale->ctype->max_chr;
-			pcc->cv.cclasscode = -1;
-		}
-		else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#if MAX_SIMPLE_CHR >= UCHAR_MAX
+		max_chr = (pg_wchar) UCHAR_MAX;
+		pcc->cv.cclasscode = -1;
+#else
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#endif
 	}
 
 	/*
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index e2beee44335..6ad3f93b543 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -342,7 +342,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 /*
@@ -369,7 +368,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 static const struct ctype_methods ctype_methods_libc_utf8 = {
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 54193a17a90..42e21e7fb8a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -134,12 +134,6 @@ struct ctype_methods
 	 * pg_strlower().
 	 */
 	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
-
-	/*
-	 * For regex and pattern matching efficiency, the maximum char value
-	 * supported by the above methods. If zero, limit is set by regex code.
-	 */
-	pg_wchar	max_chr;
 };
 
 /*
-- 
2.43.0
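
The selection rule introduced by the 0002 patch can be sketched in isolation. This covers only the two branches visible in the hunk (UTF-8 versus any other encoding); the earlier branch of the if chain and the cclasscode bookkeeping are left out. SKETCH_MAX_SIMPLE_CHR uses the 0x7FF value quoted in the commit message, and the function name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_MAX_SIMPLE_CHR 0x7FF	/* value quoted in the commit message */

/*
 * Pick the highest character code to probe, based on the database
 * encoding only: UTF-8 uses the full "simple" range, anything else is
 * capped at one byte.
 */
unsigned int
sketch_regex_max_chr(bool encoding_is_utf8)
{
	if (encoding_is_utf8)
		return SKETCH_MAX_SIMPLE_CHR;

#if SKETCH_MAX_SIMPLE_CHR >= UCHAR_MAX
	return UCHAR_MAX;
#else
	return SKETCH_MAX_SIMPLE_CHR;
#endif
}
```

Note that the provider no longer appears anywhere in the decision, which is what lets the patch drop the max_chr field from struct ctype_methods.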

v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch (text/x-patch; charset=UTF-8)

From d485548107cc9c5833185932d462febe8fdf7ef1 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 10:20:36 -0800
Subject: [PATCH v11 3/9] Fix inconsistency between ltree_strncasecmp() and
 ltree_crc32_sz().

Previously, ltree_strncasecmp() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.

Change both to use casefolding with the default collation.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
---
 contrib/ltree/crc32.c        | 46 ++++++++++++++++++----
 contrib/ltree/lquery_op.c    | 74 ++++++++++++++++++++++++++++++------
 contrib/ltree/ltree.h        |  6 ++-
 contrib/ltree/ltxtquery_op.c |  8 ++--
 4 files changed, 108 insertions(+), 26 deletions(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..3918d4a0ec2 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -10,31 +10,61 @@
 #include "postgres.h"
 #include "ltree.h"
 
+#include "crc32.h"
+#include "utils/pg_crc.h"
 #ifdef LOWER_NODE
-#include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
-#else
-#define TOLOWER(x)	(x)
+#include "utils/pg_locale.h"
 #endif
 
-#include "crc32.h"
-#include "utils/pg_crc.h"
+#ifdef LOWER_NODE
 
 unsigned int
 ltree_crc32_sz(const char *buf, int size)
 {
 	pg_crc32	crc;
 	const char *p = buf;
+	static pg_locale_t locale = NULL;
+
+	if (!locale)
+		locale = pg_database_locale();
 
 	INIT_TRADITIONAL_CRC32(crc);
 	while (size > 0)
 	{
-		char		c = (char) TOLOWER(*p);
+		char		foldstr[UNICODE_CASEMAP_BUFSZ];
+		int			srclen = pg_mblen(p);
+		size_t		foldlen;
+
+		/* fold one codepoint at a time */
+		foldlen = pg_strfold(foldstr, UNICODE_CASEMAP_BUFSZ, p, srclen,
+							 locale);
+
+		COMP_TRADITIONAL_CRC32(crc, foldstr, foldlen);
+
+		size -= srclen;
+		p += srclen;
+	}
+	FIN_TRADITIONAL_CRC32(crc);
+	return (unsigned int) crc;
+}
+
+#else
 
-		COMP_TRADITIONAL_CRC32(crc, &c, 1);
+unsigned int
+ltree_crc32_sz(const char *buf, int size)
+{
+	pg_crc32	crc;
+	const char *p = buf;
+
+	INIT_TRADITIONAL_CRC32(crc);
+	while (size > 0)
+	{
+		COMP_TRADITIONAL_CRC32(crc, p, 1);
 		size--;
 		p++;
 	}
 	FIN_TRADITIONAL_CRC32(crc);
 	return (unsigned int) crc;
 }
+
+#endif							/* !LOWER_NODE */
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index a6466f575fd..ba8e114d742 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -41,7 +41,9 @@ getlexeme(char *start, char *end, int *len)
 }
 
 bool
-compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *, const char *, size_t), bool anyend)
+compare_subnode(ltree_level *t, char *qn, int len,
+				bool (*prefix_eq) (const char *, size_t, const char *, size_t),
+				bool anyend)
 {
 	char	   *endt = t->name + t->len;
 	char	   *endq = qn + len;
@@ -57,7 +59,7 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 		while ((tn = getlexeme(tn, endt, &lent)) != NULL)
 		{
 			if ((lent == lenq || (lent > lenq && anyend)) &&
-				(*cmpptr) (qn, tn, lenq) == 0)
+				(*prefix_eq) (qn, lenq, tn, lent))
 			{
 
 				isok = true;
@@ -74,14 +76,62 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 	return true;
 }
 
-int
-ltree_strncasecmp(const char *a, const char *b, size_t s)
+/*
+ * Check whether a is a prefix of b.
+ */
+bool
+ltree_prefix_eq(const char *a, size_t a_sz, const char *b, size_t b_sz)
+{
+	if (a_sz > b_sz)
+		return false;
+	else
+		return (strncmp(a, b, a_sz) == 0);
+}
+
+/*
+ * Case-insensitive check whether a is a prefix of b.
+ */
+bool
+ltree_prefix_eq_ci(const char *a, size_t a_sz, const char *b, size_t b_sz)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
-	int			res;
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = a_sz + 1;
+	size_t		al_len;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = b_sz + 1;
+	size_t		bl_len;
+	char	   *bl = palloc(bl_sz);
+	bool		res;
+
+	if (!locale)
+		locale = pg_database_locale();
+
+	/* casefold both a and b */
+
+	al_len = pg_strfold(al, al_sz, a, a_sz, locale);
+	if (al_len + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = al_len + 1;
+		al = repalloc(al, al_sz);
+		al_len = pg_strfold(al, al_sz, a, a_sz, locale);
+		Assert(al_len + 1 <= al_sz);
+	}
+
+	bl_len = pg_strfold(bl, bl_sz, b, b_sz, locale);
+	if (bl_len + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = bl_len + 1;
+		bl = repalloc(bl, bl_sz);
+		bl_len = pg_strfold(bl, bl_sz, b, b_sz, locale);
+		Assert(bl_len + 1 <= bl_sz);
+	}
 
-	res = strncmp(al, bl, s);
+	if (al_len > bl_len)
+		res = false;
+	else
+		res = (strncmp(al, bl, al_len) == 0);
 
 	pfree(al);
 	pfree(bl);
@@ -109,19 +159,19 @@ checkLevel(lquery_level *curq, ltree_level *curt)
 
 	for (int i = 0; i < curq->numvar; i++)
 	{
-		int			(*cmpptr) (const char *, const char *, size_t);
+		bool		(*prefix_eq) (const char *, size_t, const char *, size_t);
 
-		cmpptr = (curvar->flag & LVAR_INCASE) ? ltree_strncasecmp : strncmp;
+		prefix_eq = (curvar->flag & LVAR_INCASE) ? ltree_prefix_eq_ci : ltree_prefix_eq;
 
 		if (curvar->flag & LVAR_SUBLEXEME)
 		{
-			if (compare_subnode(curt, curvar->name, curvar->len, cmpptr,
+			if (compare_subnode(curt, curvar->name, curvar->len, prefix_eq,
 								(curvar->flag & LVAR_ANYEND)))
 				return success;
 		}
 		else if ((curvar->len == curt->len ||
 				  (curt->len > curvar->len && (curvar->flag & LVAR_ANYEND))) &&
-				 (*cmpptr) (curvar->name, curt->name, curvar->len) == 0)
+				 (*prefix_eq) (curvar->name, curvar->len, curt->name, curt->len))
 			return success;
 
 		curvar = LVAR_NEXT(curvar);
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index 5e0761641d3..08199ceb588 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -208,9 +208,11 @@ bool		ltree_execute(ITEM *curitem, void *checkval,
 int			ltree_compare(const ltree *a, const ltree *b);
 bool		inner_isparent(const ltree *c, const ltree *p);
 bool		compare_subnode(ltree_level *t, char *qn, int len,
-							int (*cmpptr) (const char *, const char *, size_t), bool anyend);
+							bool (*prefix_eq) (const char *, size_t, const char *, size_t),
+							bool anyend);
 ltree	   *lca_inner(ltree **a, int len);
-int			ltree_strncasecmp(const char *a, const char *b, size_t s);
+bool		ltree_prefix_eq(const char *a, size_t a_sz, const char *b, size_t b_sz);
+bool		ltree_prefix_eq_ci(const char *a, size_t a_sz, const char *b, size_t b_sz);
 
 /* fmgr macros for ltree objects */
 #define DatumGetLtreeP(X)			((ltree *) PG_DETOAST_DATUM(X))
diff --git a/contrib/ltree/ltxtquery_op.c b/contrib/ltree/ltxtquery_op.c
index 002102c9c75..e3666a2d46e 100644
--- a/contrib/ltree/ltxtquery_op.c
+++ b/contrib/ltree/ltxtquery_op.c
@@ -58,19 +58,19 @@ checkcondition_str(void *checkval, ITEM *val)
 	ltree_level *level = LTREE_FIRST(((CHKVAL *) checkval)->node);
 	int			tlen = ((CHKVAL *) checkval)->node->numlevel;
 	char	   *op = ((CHKVAL *) checkval)->operand + val->distance;
-	int			(*cmpptr) (const char *, const char *, size_t);
+	bool		(*prefix_eq) (const char *, size_t, const char *, size_t);
 
-	cmpptr = (val->flag & LVAR_INCASE) ? ltree_strncasecmp : strncmp;
+	prefix_eq = (val->flag & LVAR_INCASE) ? ltree_prefix_eq_ci : ltree_prefix_eq;
 	while (tlen > 0)
 	{
 		if (val->flag & LVAR_SUBLEXEME)
 		{
-			if (compare_subnode(level, op, val->length, cmpptr, (val->flag & LVAR_ANYEND)))
+			if (compare_subnode(level, op, val->length, prefix_eq, (val->flag & LVAR_ANYEND)))
 				return true;
 		}
 		else if ((val->length == level->len ||
 				  (level->len > val->length && (val->flag & LVAR_ANYEND))) &&
-				 (*cmpptr) (op, level->name, val->length) == 0)
+				 (*prefix_eq) (op, val->length, level->name, level->len))
 			return true;
 
 		tlen--;
-- 
2.43.0
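
The fold-then-compare shape of ltree_prefix_eq_ci() in the 0003 patch, including its grow-and-retry buffer handling, can be tried outside the server with a sketch like the one below. It assumes a fold function with the same contract the patch relies on (write at most dstsize bytes, return the length the full result needs). sketch_fold() is an ASCII-only stand-in for pg_strfold(), malloc/realloc replace palloc/repalloc, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * ASCII-only stand-in for pg_strfold().  Writes at most dstsize bytes,
 * NUL-terminates only if there is room, and returns the length the
 * complete folded string needs.
 */
static size_t
sketch_fold(char *dst, size_t dstsize, const char *src, size_t srclen)
{
	size_t		i;

	for (i = 0; i < srclen && i < dstsize; i++)
	{
		unsigned char ch = (unsigned char) src[i];

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';
		dst[i] = (char) ch;
	}
	if (i < dstsize)
		dst[i] = '\0';
	return srclen;
}

/* Fold into a new buffer; if the first guess was too small, grow and retry. */
static char *
sketch_fold_alloc(const char *src, size_t srclen, size_t *foldlen)
{
	size_t		bufsz = srclen + 1;
	char	   *buf = malloc(bufsz);
	size_t		len;

	if (buf == NULL)
		abort();
	len = sketch_fold(buf, bufsz, src, srclen);
	if (len + 1 > bufsz)
	{
		/* a real casefold may expand the string; this stand-in never does */
		bufsz = len + 1;
		buf = realloc(buf, bufsz);
		if (buf == NULL)
			abort();
		len = sketch_fold(buf, bufsz, src, srclen);
		assert(len + 1 <= bufsz);
	}
	*foldlen = len;
	return buf;
}

/* Case-insensitive check whether a is a prefix of b. */
bool
sketch_prefix_eq_ci(const char *a, size_t a_sz, const char *b, size_t b_sz)
{
	size_t		al_len;
	size_t		bl_len;
	char	   *al = sketch_fold_alloc(a, a_sz, &al_len);
	char	   *bl = sketch_fold_alloc(b, b_sz, &bl_len);
	bool		res;

	/* compare folded lengths, since folding may change the byte length */
	if (al_len > bl_len)
		res = false;
	else
		res = (strncmp(al, bl, al_len) == 0);

	free(al);
	free(bl);
	return res;
}
```

The length comparison is done on the folded strings rather than on the inputs, which is the reason the new callback takes both lengths instead of a single size_t as strncmp() does.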

v11-0004-Remove-char_tolower-API.patch (text/x-patch; charset=UTF-8)

From 509118852993a2b1132de7ee28d43143bcfcef11 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 18:16:41 -0800
Subject: [PATCH v11 4/9] Remove char_tolower() API.

It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/like.c           | 42 +++++++++-----------------
 src/backend/utils/adt/like_match.c     | 18 ++++++-----
 src/backend/utils/adt/pg_locale.c      | 26 ----------------
 src/backend/utils/adt/pg_locale_libc.c | 10 ------
 src/include/utils/pg_locale.h          |  9 ------
 5 files changed, 25 insertions(+), 80 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..4a7fc583c71 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -43,8 +43,8 @@ static text *MB_do_like_escape(text *pat, text *esc);
 static int	UTF8_MatchText(const char *t, int tlen, const char *p, int plen,
 						   pg_locale_t locale);
 
-static int	SB_IMatchText(const char *t, int tlen, const char *p, int plen,
-						  pg_locale_t locale);
+static int	C_IMatchText(const char *t, int tlen, const char *p, int plen,
+						 pg_locale_t locale);
 
 static int	GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation);
 static int	Generic_Text_IC_like(text *str, text *pat, Oid collation);
@@ -84,22 +84,10 @@ wchareq(const char *p1, const char *p2)
  * of getting a single character transformed to the system's wchar_t format.
  * So now, we just downcase the strings using lower() and apply regular LIKE
  * comparison.  This should be revisited when we install better locale support.
- */
-
-/*
- * We do handle case-insensitive matching for single-byte encodings using
+ *
+ * We do handle case-insensitive matching for the C locale using
  * fold-on-the-fly processing, however.
  */
-static char
-SB_lower_char(unsigned char c, pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return pg_ascii_tolower(c);
-	else if (locale->is_default)
-		return pg_tolower(c);
-	else
-		return char_tolower(c, locale);
-}
 
 
 #define NextByte(p, plen)	((p)++, (plen)--)
@@ -131,9 +119,9 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 #include "like_match.c"
 
 /* setup to compile like_match.c for single byte case insensitive matches */
-#define MATCH_LOWER(t, locale) SB_lower_char((unsigned char) (t), locale)
+#define MATCH_LOWER
 #define NextChar(p, plen) NextByte((p), (plen))
-#define MatchText SB_IMatchText
+#define MatchText C_IMatchText
 
 #include "like_match.c"
 
@@ -202,22 +190,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
 
 	/*
-	 * For efficiency reasons, in the single byte case we don't call lower()
-	 * on the pattern and text, but instead call SB_lower_char on each
-	 * character.  In the multi-byte case we don't have much choice :-(. Also,
-	 * ICU does not support single-character case folding, so we go the long
-	 * way.
+	 * For efficiency reasons, in the C locale we don't call lower() on the
+	 * pattern and text, but instead fold each byte with pg_ascii_tolower().
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
+		return C_IMatchText(s, slen, p, plen, locale);
 	}
 	else
 	{
@@ -229,10 +212,13 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 													 PointerGetDatum(str)));
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
+
 		if (GetDatabaseEncoding() == PG_UTF8)
 			return UTF8_MatchText(s, slen, p, plen, 0);
-		else
+		else if (pg_database_encoding_max_length() > 1)
 			return MB_MatchText(s, slen, p, plen, 0);
+		else
+			return SB_MatchText(s, slen, p, plen, 0);
 	}
 }
 
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index 892f8a745ea..54846c9541d 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -70,10 +70,14 @@
  *--------------------
  */
 
+/*
+ * MATCH_LOWER is defined for ILIKE in the C locale as an optimization. Other
+ * locales must casefold the inputs before matching.
+ */
 #ifdef MATCH_LOWER
-#define GETCHAR(t, locale) MATCH_LOWER(t, locale)
+#define GETCHAR(t) pg_ascii_tolower(t)
 #else
-#define GETCHAR(t, locale) (t)
+#define GETCHAR(t) (t)
 #endif
 
 static int
@@ -105,7 +109,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 						 errmsg("LIKE pattern must not end with escape character")));
-			if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+			if (GETCHAR(*p) != GETCHAR(*t))
 				return LIKE_FALSE;
 		}
 		else if (*p == '%')
@@ -167,14 +171,14 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 							 errmsg("LIKE pattern must not end with escape character")));
-				firstpat = GETCHAR(p[1], locale);
+				firstpat = GETCHAR(p[1]);
 			}
 			else
-				firstpat = GETCHAR(*p, locale);
+				firstpat = GETCHAR(*p);
 
 			while (tlen > 0)
 			{
-				if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
+				if (GETCHAR(*t) == firstpat || (locale && !locale->deterministic))
 				{
 					int			matched = MatchText(t, tlen, p, plen, locale);
 
@@ -342,7 +346,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					NextChar(t1, t1len);
 			}
 		}
-		else if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+		else if (GETCHAR(*p) != GETCHAR(*t))
 		{
 			/* non-wildcard pattern char fails to match text char */
 			return LIKE_FALSE;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b02e7fa4f18..5aba277ba99 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1629,32 +1629,6 @@ char_is_cased(char ch, pg_locale_t locale)
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
-/*
- * char_tolower_enabled()
- *
- * Does the provider support char_tolower()?
- */
-bool
-char_tolower_enabled(pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return true;
-	return (locale->ctype->char_tolower != NULL);
-}
-
-/*
- * char_tolower()
- *
- * Convert char (single-byte encoding) to lowercase.
- */
-char
-char_tolower(unsigned char ch, pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return pg_ascii_tolower(ch);
-	return locale->ctype->char_tolower(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 6ad3f93b543..91a892bb540 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -248,13 +248,6 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
-static char
-char_tolower_libc(unsigned char ch, pg_locale_t locale)
-{
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
-}
-
 static bool
 char_is_cased_libc(char ch, pg_locale_t locale)
 {
@@ -339,7 +332,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -365,7 +357,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -387,7 +378,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 42e21e7fb8a..50520e50127 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -127,13 +127,6 @@ struct ctype_methods
 
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
-
-	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
-	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
 };
 
 /*
@@ -185,8 +178,6 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
-extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0
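
The "fold-on-the-fly" idea that the 0004 patch keeps for the C locale can be shown with a deliberately simplified matcher. This is not the algorithm in like_match.c: it handles only '%', '_' and literal bytes, has no escape handling, and uses plain recursion. The names are hypothetical. What it does share with the patch is that each byte is lowercased with an ASCII-only function at comparison time, so neither input is copied or lowercased up front.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only lowercase; bytes outside 'A'..'Z' are returned unchanged. */
static unsigned char
sketch_ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Simplified case-insensitive LIKE for the C locale.  Supports '%'
 * (any run of bytes) and '_' (any single byte); no escape character.
 */
bool
sketch_c_ilike(const char *t, size_t tlen, const char *p, size_t plen)
{
	while (plen > 0)
	{
		if (*p == '%')
		{
			p++;
			plen--;
			if (plen == 0)
				return true;	/* trailing % matches the rest */

			/* try the remaining pattern against every suffix of the text */
			for (;;)
			{
				if (sketch_c_ilike(t, tlen, p, plen))
					return true;
				if (tlen == 0)
					return false;
				t++;
				tlen--;
			}
		}

		if (tlen == 0)
			return false;

		/* fold both sides at comparison time instead of up front */
		if (*p != '_' &&
			sketch_ascii_tolower((unsigned char) *p) !=
			sketch_ascii_tolower((unsigned char) *t))
			return false;

		p++;
		plen--;
		t++;
		tlen--;
	}
	return tlen == 0;
}
```

For any other locale the patch takes the other route: lower() is applied to both inputs first and the ordinary case-sensitive matcher runs on the results.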

v11-0005-Add-pg_iswcased.patch (text/x-patch; charset=UTF-8)

From b6de7ad668d90d2c15568e8d0321f7b140c36e01 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 26 Nov 2025 10:28:36 -0800
Subject: [PATCH v11 5/9] Add pg_iswcased().

Returns true if the character has multiple case forms. This will be a
useful multibyte-aware replacement for char_is_cased().
---
 src/backend/utils/adt/pg_locale.c         | 11 +++++++++++
 src/backend/utils/adt/pg_locale_builtin.c |  7 +++++++
 src/backend/utils/adt/pg_locale_icu.c     |  7 +++++++
 src/backend/utils/adt/pg_locale_libc.c    | 17 +++++++++++++++++
 src/include/utils/pg_locale.h             |  2 ++
 5 files changed, 44 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5aba277ba99..e5e75ca2c2c 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1588,6 +1588,17 @@ pg_iswxdigit(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_isxdigit(wc, locale);
 }
 
+bool
+pg_iswcased(pg_wchar wc, pg_locale_t locale)
+{
+	/* for the C locale, Cased and Alpha are equivalent */
+	if (locale->ctype == NULL)
+		return (wc <= (pg_wchar) 127 &&
+				(pg_char_properties[wc] & PG_ISALPHA));
+	else
+		return locale->ctype->wc_iscased(wc, locale);
+}
+
 pg_wchar
 pg_towupper(pg_wchar wc, pg_locale_t locale)
 {
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 1021e0d129b..0d4c754a267 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -185,6 +185,12 @@ wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 	return pg_u_isxdigit(to_char32(wc), !locale->builtin.casemap_full);
 }
 
+static bool
+wc_iscased_builtin(pg_wchar wc, pg_locale_t locale)
+{
+	return pg_u_prop_cased(to_char32(wc));
+}
+
 static bool
 char_is_cased_builtin(char ch, pg_locale_t locale)
 {
@@ -220,6 +226,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
 	.char_is_cased = char_is_cased_builtin,
+	.wc_iscased = wc_iscased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
 };
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f5a0cc8fe41..e8820666b2d 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -223,6 +223,12 @@ wc_isxdigit_icu(pg_wchar wc, pg_locale_t locale)
 	return u_isxdigit(wc);
 }
 
+static bool
+wc_iscased_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_hasBinaryProperty(wc, UCHAR_CASED);
+}
+
 static const struct ctype_methods ctype_methods_icu = {
 	.strlower = strlower_icu,
 	.strtitle = strtitle_icu,
@@ -239,6 +245,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
 	.char_is_cased = char_is_cased_icu,
+	.wc_iscased = wc_iscased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 91a892bb540..cd54198f0c7 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -184,6 +184,13 @@ wc_isxdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
+static bool
+wc_iscased_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->lt) ||
+		islower_l((unsigned char) wc, locale->lt);
+}
+
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
@@ -248,6 +255,13 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
+static bool
+wc_iscased_libc_mb(pg_wchar wc, pg_locale_t locale)
+{
+	return iswupper_l((wint_t) wc, locale->lt) ||
+		iswlower_l((wint_t) wc, locale->lt);
+}
+
 static bool
 char_is_cased_libc(char ch, pg_locale_t locale)
 {
@@ -332,6 +346,7 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -357,6 +372,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -378,6 +394,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_mb,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 50520e50127..832007385d8 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -122,6 +122,7 @@ struct ctype_methods
 	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isxdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_iscased) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
 
@@ -214,6 +215,7 @@ extern bool pg_iswprint(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswpunct(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswspace(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswxdigit(pg_wchar wc, pg_locale_t locale);
+extern bool pg_iswcased(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towupper(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towlower(pg_wchar wc, pg_locale_t locale);
 
-- 
2.43.0

#78

Jeff Davis

pgsql@j-davis.com

about 1 month ago

In reply to: Chao Li (#76)

Re: Remaining dependency on setlocale()

On Thu, 2025-11-27 at 09:08 +0800, Chao Li wrote:

On Nov 26, 2025, at 09:50, Chao Li <li.evan.chao@gmail.com> wrote:

I will review the remaining 3 commits tomorrow.

10 - 0009

Just curious. As isalpha() and toupper() come from the same header
file ctype.h, if we replace toupper with pg_ascii_toupper, does
isalpha also need to be handled?

OK.

What do you think about the change overall? Is fuzzystrmatch inherently
ASCII-based? Does it cause behavior changes aside from soundex()? Does
the behavior change in soundex() matter?

11 - 0010

I think asserting both dstsize and len is redundant, because
dstsize = len + 1, and nothing changes their values afterwards.

OK.

What do you think of the change overall?

if (local == NULL || local->ctype == NULL)
    local = libc or other fallback;
return default_locale->ctype->strfold_ident(dest, destsize, src,
                                            srclen, local);

This way avoids the duplicate code.

OK. The fallback would still be ASCII though, right?

I just feel the GUC name is very misleading. Without carefully
reading the doc, users may very easily mistake lc_collate for the
system’s locale. If it only affects extensions, then let’s name it
accordingly, for example, “extension_lc_collate” or
“legacy_lc_collate”.

It is the system locale, it's just that we won't be using the system
locale for most purposes, so it has very little effect: PLs,
extensions, and libraries used by extensions that happen to rely on the
system locale. That is a bit confusing, which is why I previously just
set LC_COLLATE=C. This patch addresses Daniel's concern that people
might still want lc_collate set to something other than C. I'm not sure
we want this patch, it's just a POC.

I didn't attach a new series here yet, but will after some of the
earlier patches get committed.

Regards,
Jeff Davis

#79

Peter Eisentraut

peter@eisentraut.org

about 1 month ago

In reply to: Jeff Davis (#77)

Re: Remaining dependency on setlocale()

On 29.11.25 21:50, Jeff Davis wrote:

All fixed, thank you! (I apologize for posting a patch in that state to
begin with...)

I also reorganized slightly to separate out the pg_iswcased() API into
its own patch, and moved the like_support.c changes from the ctype_is_c
patch (already committed: 1476028225) into the pattern prefixes patch.

I reviewed the v11 patches. But I wasn't able to apply them locally
(couldn't find a starting commit where they applied cleanly), so I
haven't tested them.

Patches 0001 through 0006 seem generally ok, with some small comments:

v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch

The function comment reads "Check if b has a prefix of a." -- Is that
the same as "Check if a is a prefix of b."? The latter might be clearer.

v11-0004-Remove-char_tolower-API.patch

The updated comment reads

+        * For efficiency reasons, in the C locale we don't call lower() on the
+        * pattern and text, but instead call SB_lower_char on each character.

but the patch removes SB_lower_char().

v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch

Might have a small typo in the commit message:

; and preserve and char-at-a-time logic for bytea.

For the remaining patches I have some more substantial questions.

v11-0007-fuzzystrmatch-use-pg_ascii_toupper.patch

dmetaphone.c has a comment

case '\xc7': /* C with cedilla */

so the premise that "fuzzystrmatch is designed for ASCII" does not
appear to be correct. Needs more analysis.

(But apparently it's not multibyte aware at all, so I don't know what to
do about that.)
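
To make the concern concrete, here is a minimal sketch of ASCII-only uppercasing. The function name ascii_toupper_sketch is hypothetical and only similar in spirit to pg_ascii_toupper(); it is not the PostgreSQL implementation. In ISO-8859-1, byte 0xE7 is "c with cedilla" in lower case and 0xC7 is its upper-case form. An ASCII-only conversion leaves both bytes untouched, so, assuming the input is uppercased before the switch, a branch such as case '\xc7' would then be reached only when the input already contains the upper-case byte.

```c
#include <stdbool.h>

/*
 * Hypothetical ASCII-only uppercasing, for illustration only.
 * Bytes outside 'a'..'z' (including all high-bit bytes) are returned
 * unchanged, regardless of the process locale.
 */
static unsigned char
ascii_toupper_sketch(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch = (unsigned char) (ch - 'a' + 'A');
	return ch;
}
```

By contrast, toupper() under a Latin-1 libc locale may map 0xE7 to 0xC7, which is the kind of behavior difference that needs analysis here.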

v11-0008-downcase_identifier-use-method-table-from-locale.patch

I'm confused here about the name of the function pg_strfold_ident(). In
general, case "folding" results in an opaque string that is really only
useful for comparing against other case-folded strings. But for
identifiers we are actually interested in lower-casing. I think this
should be corrected in the API naming.

v11-0009-Control-LC_COLLATE-with-GUC.patch

I know there were some complaints about compatibility with extensions,
but I don't think anything concrete was presented. I would like to see
more evidence that we need this.

Also, recall that we used to have a lc_collate GUC, and in the end
people got confused that it didn't actually show a meaningful value when
you used ICU. So we removed that. It seems adding this back in would
create a similar kind of confusion. So to avoid that, maybe this should
be called fallback_lc_collate or something like that.

If we were to proceed with this patch, it should have some documentation
and tests.

#80

Jeff Davis

pgsql@j-davis.com

about 1 month ago

In reply to: Peter Eisentraut (#79)

8 attachment(s)

Re: Remaining dependency on setlocale()

On Fri, 2025-12-05 at 16:01 +0100, Peter Eisentraut wrote:

v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch

The function comment reads "Check if b has a prefix of a." -- Is that
the same as "Check if a is a prefix of b."? The latter might be
clearer.

Yes, fixed.

Note: I separated this into two patches. 0003 fixes the multibyte
mishandling issue, and 0004 consistently performs case folding. 0003 is
backpatchable, I believe.
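
As a rough illustration of the multibyte issue (a sketch, not the patch itself): when only one length is passed and the longer input is cut to it before case conversion, the cut can land inside a multibyte sequence. Passing both lengths avoids any truncation. The names prefix_eq_sketch and utf8_cut_splits_char are hypothetical, and the second helper assumes UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Check whether 'a' (a_sz bytes) is a prefix of 'b' (b_sz bytes).
 * Both lengths are supplied, so neither input has to be truncated
 * at an arbitrary byte offset before comparing.
 */
static bool
prefix_eq_sketch(const char *a, size_t a_sz, const char *b, size_t b_sz)
{
	if (a_sz > b_sz)
		return false;
	return memcmp(a, b, a_sz) == 0;
}

/*
 * Hypothetical helper: true if cutting the UTF-8 string 's' at byte
 * offset 'cut' would split a multibyte character, i.e. the byte at
 * that offset is a continuation byte (bit pattern 10xxxxxx).
 */
static bool
utf8_cut_splits_char(const char *s, size_t len, size_t cut)
{
	if (cut >= len)
		return false;
	return ((unsigned char) s[cut] & 0xC0) == 0x80;
}
```

For example, cutting the three-byte string "a" followed by U+00E9 (bytes 0xC3 0xA9) at offset 2 leaves a dangling lead byte, which is the kind of input the old single-length API could hand to the lowercasing routine.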

but the patch removes SB_lower_char().

Fixed and committed.

v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch

Might have a small typo in the commit message:

; and preserve and char-at-a-time logic for bytea.

Fixed.

I also changed it into two functions: like_fixed_prefix(), which is
almost unchanged from the original; and like_fixed_prefix_ci(), which
is multibyte and locale-aware. It was too confusing to have single-byte
and multi-byte logic in the same function, and they didn't share much
code anyway.
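
The core loop of the case-insensitive variant can be sketched as follows. This is a simplified, byte-oriented illustration: it assumes ASCII input and uses a hypothetical is_cased_ascii() in place of pg_iswcased(), and it omits the pg_wchar conversion, collation lookup, and selectivity estimate that the real function handles.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_iswcased(), ASCII letters only. */
static bool
is_cased_ascii(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/*
 * Copy the fixed prefix of an ILIKE pattern into 'out' and return its
 * length.  'out' must have room for pattlen + 1 bytes.  The prefix ends
 * at the first wildcard or at the first cased character, since a cased
 * character can match more than one input character under ILIKE.
 */
static size_t
fixed_prefix_ci_sketch(const char *patt, size_t pattlen, char *out)
{
	size_t		pos;
	size_t		n = 0;

	for (pos = 0; pos < pattlen; pos++)
	{
		/* % and _ are wildcard characters in LIKE */
		if (patt[pos] == '%' || patt[pos] == '_')
			break;

		/* Backslash escapes the next character */
		if (patt[pos] == '\\')
		{
			pos++;
			if (pos >= pattlen)
				break;
		}

		/* A case-varying character acts like a wildcard */
		if (is_cased_ascii(patt[pos]))
			break;

		out[n++] = patt[pos];
	}

	out[n] = '\0';
	return n;
}
```

With the pattern 123abc% the sketch yields the prefix 123, because the letter a is cased and ends the fixed part.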

case '\xc7': /* C with cedilla */

so the premise that "fuzzystrmatch is designed for ASCII" does not
appear to be correct. Needs more analysis.

(But apparently it's not multibyte aware at all, so I don't know what
to do about that.)

I didn't notice that, thank you. Agreed, we need a bit more discussion
around this case as well as soundex().

v11-0008-downcase_identifier-use-method-table-from-locale.patch

I'm confused here about the name of the function pg_strfold_ident().
In general, case "folding" results in an opaque string that is really
only useful for comparing against other case-folded strings. But for
identifiers we are actually interested in lower-casing. I think this
should be corrected in the API naming.

Agreed and fixed.

Also, I added 0006, which saves a locale_t object for ICU in this one
case where it's required. Surely that's not what we want in the long
term, but we don't have the infrastructure for decoding pg_wchar into
code points yet, and 0006 avoids the dependency on the global LC_CTYPE
setting.

v11-0009-Control-LC_COLLATE-with-GUC.patch

I know there were some complaints about compatibility with
extensions, but I don't think anything concrete was presented. I
would like to see more evidence that we need this.

Also, recall that we used to have a lc_collate GUC, and in the end
people got confused that it didn't actually show a meaningful value
when you used ICU. So we removed that. It seems adding this back in
would create a similar kind of confusion. So to avoid that, maybe
this should be called fallback_lc_collate or something like that.

Yes, this is a POC patch and needs more discussion.

What are your thoughts about a similar lc_ctype GUC, though? That has
slightly different trade-offs.

I believe v12 0001-0005 are about ready for commit, and 0003 should be
backported.

Regards,
Jeff Davis

Attachments:

v12-0001-Use-multibyte-aware-extraction-of-pattern-prefix.patch (text/x-patch; charset=UTF-8)

From 779205c112bfbd1f89fc0edd9a4d7b932d21e15e Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 12 Dec 2025 09:44:37 -0800
Subject: [PATCH v12 1/8] Use multibyte-aware extraction of pattern prefixes.

Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.

Introduce like_fixed_prefix_ci(), and use that for case-insensitive
pattern prefixes. It uses multibyte and locale-aware logic, along with
the new pg_iswcased() API introduced in 630706ced0.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/like_support.c | 169 ++++++++++++++++++---------
 1 file changed, 112 insertions(+), 57 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index dca1d9be035..007dd5b5a01 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -99,8 +99,6 @@ static Selectivity like_selectivity(const char *patt, int pattlen,
 static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
-static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -986,8 +984,8 @@ icnlikejoinsel(PG_FUNCTION_ARGS)
  */
 
 static Pattern_Prefix_Status
-like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
-				  Const **prefix_const, Selectivity *rest_selec)
+like_fixed_prefix(Const *patt_const, Const **prefix_const,
+				  Selectivity *rest_selec)
 {
 	char	   *match;
 	char	   *patt;
@@ -995,34 +993,10 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 	Oid			typeid = patt_const->consttype;
 	int			pos,
 				match_pos;
-	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
-	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
 	Assert(typeid == BYTEAOID || typeid == TEXTOID);
 
-	if (case_insensitive)
-	{
-		if (typeid == BYTEAOID)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("case insensitive matching not supported on type bytea")));
-
-		if (!OidIsValid(collation))
-		{
-			/*
-			 * This typically means that the parser could not resolve a
-			 * conflict of implicit collations, so report it that way.
-			 */
-			ereport(ERROR,
-					(errcode(ERRCODE_INDETERMINATE_COLLATION),
-					 errmsg("could not determine which collation to use for ILIKE"),
-					 errhint("Use the COLLATE clause to set the collation explicitly.")));
-		}
-
-		locale = pg_newlocale_from_collation(collation);
-	}
-
 	if (typeid != BYTEAOID)
 	{
 		patt = TextDatumGetCString(patt_const->constvalue);
@@ -1055,11 +1029,6 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				break;
 		}
 
-		/* Stop if case-varying character (it's sort of a wildcard) */
-		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale))
-			break;
-
 		match[match_pos++] = patt[pos];
 	}
 
@@ -1071,8 +1040,7 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 		*prefix_const = string_to_bytea_const(match, match_pos);
 
 	if (rest_selec != NULL)
-		*rest_selec = like_selectivity(&patt[pos], pattlen - pos,
-									   case_insensitive);
+		*rest_selec = like_selectivity(&patt[pos], pattlen - pos, false);
 
 	pfree(patt);
 	pfree(match);
@@ -1087,6 +1055,112 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 	return Pattern_Prefix_None;
 }
 
+/*
+ * Case-insensitive variant of like_fixed_prefix().  Multibyte and
+ * locale-aware for detecting cased characters.
+ */
+static Pattern_Prefix_Status
+like_fixed_prefix_ci(Const *patt_const, Oid collation, Const **prefix_const,
+					 Selectivity *rest_selec)
+{
+	text	   *val = DatumGetTextPP(patt_const->constvalue);
+	Oid			typeid = patt_const->consttype;
+	int			nbytes = VARSIZE_ANY_EXHDR(val);
+	int			wpos;
+	pg_wchar   *wpatt;
+	int			wpattlen;
+	pg_wchar   *wmatch;
+	int			wmatch_pos = 0;
+	char	   *match;
+	int			match_mblen pg_attribute_unused();
+	pg_locale_t locale = 0;
+
+	/* the right-hand const is type text or bytea */
+	Assert(typeid == BYTEAOID || typeid == TEXTOID);
+
+	if (typeid == BYTEAOID)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("case insensitive matching not supported on type bytea")));
+
+	if (!OidIsValid(collation))
+	{
+		/*
+		 * This typically means that the parser could not resolve a conflict
+		 * of implicit collations, so report it that way.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_INDETERMINATE_COLLATION),
+				 errmsg("could not determine which collation to use for ILIKE"),
+				 errhint("Use the COLLATE clause to set the collation explicitly.")));
+	}
+
+	locale = pg_newlocale_from_collation(collation);
+
+	wpatt = palloc((nbytes + 1) * sizeof(pg_wchar));
+	wpattlen = pg_mb2wchar_with_len(VARDATA_ANY(val), wpatt, nbytes);
+
+	wmatch = palloc((nbytes + 1) * sizeof(pg_wchar));
+	for (wpos = 0; wpos < wpattlen; wpos++)
+	{
+		/* % and _ are wildcard characters in LIKE */
+		if (wpatt[wpos] == '%' ||
+			wpatt[wpos] == '_')
+			break;
+
+		/* Backslash escapes the next character */
+		if (wpatt[wpos] == '\\')
+		{
+			wpos++;
+			if (wpos >= wpattlen)
+				break;
+		}
+
+		/*
+		 * For ILIKE, stop if it's a case-varying character (it's sort of a
+		 * wildcard).
+		 */
+		if (pg_iswcased(wpatt[wpos], locale))
+			break;
+
+		wmatch[wmatch_pos++] = wpatt[wpos];
+	}
+
+	wmatch[wmatch_pos] = '\0';
+
+	match = palloc(pg_database_encoding_max_length() * wmatch_pos + 1);
+	match_mblen = pg_wchar2mb_with_len(wmatch, match, wmatch_pos);
+	match[match_mblen] = '\0';
+	pfree(wmatch);
+
+	*prefix_const = string_to_const(match, TEXTOID);
+	pfree(match);
+
+	if (rest_selec != NULL)
+	{
+		int			wrestlen = wpattlen - wmatch_pos;
+		char	   *rest;
+		int			rest_mblen;
+
+		rest = palloc(pg_database_encoding_max_length() * wrestlen + 1);
+		rest_mblen = pg_wchar2mb_with_len(&wpatt[wmatch_pos], rest, wrestlen);
+
+		*rest_selec = like_selectivity(rest, rest_mblen, true);
+		pfree(rest);
+	}
+
+	pfree(wpatt);
+
+	/* in LIKE, an empty pattern is an exact match! */
+	if (wpos == wpattlen)
+		return Pattern_Prefix_Exact;	/* reached end of pattern, so exact */
+
+	if (wmatch_pos > 0)
+		return Pattern_Prefix_Partial;
+
+	return Pattern_Prefix_None;
+}
+
 static Pattern_Prefix_Status
 regex_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				   Const **prefix_const, Selectivity *rest_selec)
@@ -1164,12 +1238,11 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
 	switch (ptype)
 	{
 		case Pattern_Type_Like:
-			result = like_fixed_prefix(patt, false, collation,
-									   prefix, rest_selec);
+			result = like_fixed_prefix(patt, prefix, rest_selec);
 			break;
 		case Pattern_Type_Like_IC:
-			result = like_fixed_prefix(patt, true, collation,
-									   prefix, rest_selec);
+			result = like_fixed_prefix_ci(patt, collation, prefix,
+										  rest_selec);
 			break;
 		case Pattern_Type_Regex:
 			result = regex_fixed_prefix(patt, false, collation,
@@ -1481,24 +1554,6 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
 	return sel;
 }
 
-/*
- * Check whether char is a letter (and, hence, subject to case-folding)
- *
- * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
- * Instead, just assume any non-ASCII char is potentially case-varying, and
- * hard-wire knowledge of which ASCII chars are letters.
- */
-static int
-pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else
-		return char_is_cased(c, locale);
-}
-
 
 /*
  * For bytea, the increment function need only increment the current byte
-- 
2.43.0

v12-0002-Remove-unused-single-byte-char_is_cased-API.patch (text/x-patch; charset=UTF-8)

From 48620dadcfeec2880575c963441bb1dd017802f0 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Fri, 12 Dec 2025 09:44:59 -0800
Subject: [PATCH v12 2/8] Remove unused single-byte char_is_cased() API.

https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c         | 15 ---------------
 src/backend/utils/adt/pg_locale_builtin.c |  8 --------
 src/backend/utils/adt/pg_locale_icu.c     |  8 --------
 src/backend/utils/adt/pg_locale_libc.c    | 14 --------------
 src/include/utils/pg_locale.h             |  3 ---
 5 files changed, 48 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 70933ee3843..8a3796aa5d0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1625,21 +1625,6 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_tolower(wc, locale);
 }
 
-/*
- * char_is_cased()
- *
- * Fuzzy test of whether the given char is case-varying or not. The argument
- * is a single byte, so in a multibyte encoding, just assume any non-ASCII
- * char is case-varying.
- */
-bool
-char_is_cased(char ch, pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-	return locale->ctype->char_is_cased(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0d4c754a267..0c2920112bb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -191,13 +191,6 @@ wc_iscased_builtin(pg_wchar wc, pg_locale_t locale)
 	return pg_u_prop_cased(to_char32(wc));
 }
 
-static bool
-char_is_cased_builtin(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 static pg_wchar
 wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
 {
@@ -225,7 +218,6 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
-	.char_is_cased = char_is_cased_builtin,
 	.wc_iscased = wc_iscased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index e8820666b2d..18d026deda8 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,13 +121,6 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-static bool
-char_is_cased_icu(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
  * UChar32, which is correct for the UTF-8 encoding, but not in general.
@@ -244,7 +237,6 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
-	.char_is_cased = char_is_cased_icu,
 	.wc_iscased = wc_iscased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 3d841f818a5..3baa5816b5f 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -262,17 +262,6 @@ wc_iscased_libc_mb(pg_wchar wc, pg_locale_t locale)
 		iswlower_l((wint_t) wc, locale->lt);
 }
 
-static bool
-char_is_cased_libc(char ch, pg_locale_t locale)
-{
-	bool		is_multibyte = pg_database_encoding_max_length() > 1;
-
-	if (is_multibyte && IS_HIGHBIT_SET(ch))
-		return true;
-	else
-		return isalpha_l((unsigned char) ch, locale->lt);
-}
-
 static pg_wchar
 toupper_libc_sb(pg_wchar wc, pg_locale_t locale)
 {
@@ -345,7 +334,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
 	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
@@ -371,7 +359,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
 	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
@@ -393,7 +380,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_ispunct = wc_ispunct_libc_mb,
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
-	.char_is_cased = char_is_cased_libc,
 	.wc_iscased = wc_iscased_libc_mb,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 832007385d8..01f891def7a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -125,9 +125,6 @@ struct ctype_methods
 	bool		(*wc_iscased) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
-
-	/* required */
-	bool		(*char_is_cased) (char ch, pg_locale_t locale);
 };
 
 /*
-- 
2.43.0

v12-0003-Fix-multibyte-issue-in-ltree_strncasecmp.patch (text/x-patch; charset=UTF-8)

From f6923421824c4cdefb83b781a234ebfd562b86ed Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 4 Dec 2025 10:37:56 -0800
Subject: [PATCH v12 3/8] Fix multibyte issue in ltree_strncasecmp().

The API for ltree_strncasecmp() took two inputs but only one length
(that of the smaller input). It truncated the larger input to that
length, but that could break a multibyte sequence.

Refactor and rename to be a check for prefix equality (possibly
case-insensitive) instead, which is all that's needed by the
callers. Also, provide the lengths of both inputs.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
Backpatch-through: 14
---
 contrib/ltree/lquery_op.c    | 41 +++++++++++++++++++++++++-----------
 contrib/ltree/ltree.h        |  6 ++++--
 contrib/ltree/ltxtquery_op.c |  8 +++----
 3 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index a6466f575fd..d58e34769b8 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -41,7 +41,9 @@ getlexeme(char *start, char *end, int *len)
 }
 
 bool
-compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *, const char *, size_t), bool anyend)
+compare_subnode(ltree_level *t, char *qn, int len,
+				bool (*prefix_eq) (const char *, size_t, const char *, size_t),
+				bool anyend)
 {
 	char	   *endt = t->name + t->len;
 	char	   *endq = qn + len;
@@ -57,7 +59,7 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 		while ((tn = getlexeme(tn, endt, &lent)) != NULL)
 		{
 			if ((lent == lenq || (lent > lenq && anyend)) &&
-				(*cmpptr) (qn, tn, lenq) == 0)
+				(*prefix_eq) (qn, lenq, tn, lent))
 			{
 
 				isok = true;
@@ -74,14 +76,29 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 	return true;
 }
 
-int
-ltree_strncasecmp(const char *a, const char *b, size_t s)
+/*
+ * Check if 'a' is a prefix of 'b'.
+ */
+bool
+ltree_prefix_eq(const char *a, size_t a_sz, const char *b, size_t b_sz)
+{
+	if (a_sz > b_sz)
+		return false;
+	else
+		return (strncmp(a, b, a_sz) == 0);
+}
+
+/*
+ * Case-insensitive check if 'a' is a prefix of 'b'.
+ */
+bool
+ltree_prefix_eq_ci(const char *a, size_t a_sz, const char *b, size_t b_sz)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
-	int			res;
+	char	   *al = str_tolower(a, a_sz, DEFAULT_COLLATION_OID);
+	char	   *bl = str_tolower(b, b_sz, DEFAULT_COLLATION_OID);
+	bool		res;
 
-	res = strncmp(al, bl, s);
+	res = (strncmp(al, bl, a_sz) == 0);
 
 	pfree(al);
 	pfree(bl);
@@ -109,19 +126,19 @@ checkLevel(lquery_level *curq, ltree_level *curt)
 
 	for (int i = 0; i < curq->numvar; i++)
 	{
-		int			(*cmpptr) (const char *, const char *, size_t);
+		bool		(*prefix_eq) (const char *, size_t, const char *, size_t);
 
-		cmpptr = (curvar->flag & LVAR_INCASE) ? ltree_strncasecmp : strncmp;
+		prefix_eq = (curvar->flag & LVAR_INCASE) ? ltree_prefix_eq_ci : ltree_prefix_eq;
 
 		if (curvar->flag & LVAR_SUBLEXEME)
 		{
-			if (compare_subnode(curt, curvar->name, curvar->len, cmpptr,
+			if (compare_subnode(curt, curvar->name, curvar->len, prefix_eq,
 								(curvar->flag & LVAR_ANYEND)))
 				return success;
 		}
 		else if ((curvar->len == curt->len ||
 				  (curt->len > curvar->len && (curvar->flag & LVAR_ANYEND))) &&
-				 (*cmpptr) (curvar->name, curt->name, curvar->len) == 0)
+				 (*prefix_eq) (curvar->name, curvar->len, curt->name, curt->len))
 			return success;
 
 		curvar = LVAR_NEXT(curvar);
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index 5e0761641d3..08199ceb588 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -208,9 +208,11 @@ bool		ltree_execute(ITEM *curitem, void *checkval,
 int			ltree_compare(const ltree *a, const ltree *b);
 bool		inner_isparent(const ltree *c, const ltree *p);
 bool		compare_subnode(ltree_level *t, char *qn, int len,
-							int (*cmpptr) (const char *, const char *, size_t), bool anyend);
+							bool (*prefix_eq) (const char *, size_t, const char *, size_t),
+							bool anyend);
 ltree	   *lca_inner(ltree **a, int len);
-int			ltree_strncasecmp(const char *a, const char *b, size_t s);
+bool		ltree_prefix_eq(const char *a, size_t a_sz, const char *b, size_t b_sz);
+bool		ltree_prefix_eq_ci(const char *a, size_t a_sz, const char *b, size_t b_sz);
 
 /* fmgr macros for ltree objects */
 #define DatumGetLtreeP(X)			((ltree *) PG_DETOAST_DATUM(X))
diff --git a/contrib/ltree/ltxtquery_op.c b/contrib/ltree/ltxtquery_op.c
index 002102c9c75..e3666a2d46e 100644
--- a/contrib/ltree/ltxtquery_op.c
+++ b/contrib/ltree/ltxtquery_op.c
@@ -58,19 +58,19 @@ checkcondition_str(void *checkval, ITEM *val)
 	ltree_level *level = LTREE_FIRST(((CHKVAL *) checkval)->node);
 	int			tlen = ((CHKVAL *) checkval)->node->numlevel;
 	char	   *op = ((CHKVAL *) checkval)->operand + val->distance;
-	int			(*cmpptr) (const char *, const char *, size_t);
+	bool		(*prefix_eq) (const char *, size_t, const char *, size_t);
 
-	cmpptr = (val->flag & LVAR_INCASE) ? ltree_strncasecmp : strncmp;
+	prefix_eq = (val->flag & LVAR_INCASE) ? ltree_prefix_eq_ci : ltree_prefix_eq;
 	while (tlen > 0)
 	{
 		if (val->flag & LVAR_SUBLEXEME)
 		{
-			if (compare_subnode(level, op, val->length, cmpptr, (val->flag & LVAR_ANYEND)))
+			if (compare_subnode(level, op, val->length, prefix_eq, (val->flag & LVAR_ANYEND)))
 				return true;
 		}
 		else if ((val->length == level->len ||
 				  (level->len > val->length && (val->flag & LVAR_ANYEND))) &&
-				 (*cmpptr) (op, level->name, val->length) == 0)
+				 (*prefix_eq) (op, val->length, level->name, level->len))
 			return true;
 
 		tlen--;
-- 
2.43.0

Attachment: v12-0004-Fix-inconsistency-between-ltree_strncasecmp-and-.patch (text/x-patch; charset=UTF-8)

From 785323a138398465bb10e8ecdda5fef9cd19edd1 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Thu, 4 Dec 2025 10:38:11 -0800
Subject: [PATCH v12 4/8] Fix inconsistency between ltree_strncasecmp() and
 ltree_crc32_sz().

Previously, ltree_strncasecmp() used lowercasing with the default
collation, while ltree_crc32_sz() used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding was single-byte.

Change both to use casefolding with the default collation.

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 contrib/ltree/crc32.c     | 46 ++++++++++++++++++++++++++++++++-------
 contrib/ltree/lquery_op.c | 39 ++++++++++++++++++++++++++++++---
 2 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..3918d4a0ec2 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -10,31 +10,61 @@
 #include "postgres.h"
 #include "ltree.h"
 
+#include "crc32.h"
+#include "utils/pg_crc.h"
 #ifdef LOWER_NODE
-#include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
-#else
-#define TOLOWER(x)	(x)
+#include "utils/pg_locale.h"
 #endif
 
-#include "crc32.h"
-#include "utils/pg_crc.h"
+#ifdef LOWER_NODE
 
 unsigned int
 ltree_crc32_sz(const char *buf, int size)
 {
 	pg_crc32	crc;
 	const char *p = buf;
+	static pg_locale_t locale = NULL;
+
+	if (!locale)
+		locale = pg_database_locale();
 
 	INIT_TRADITIONAL_CRC32(crc);
 	while (size > 0)
 	{
-		char		c = (char) TOLOWER(*p);
+		char		foldstr[UNICODE_CASEMAP_BUFSZ];
+		int			srclen = pg_mblen(p);
+		size_t		foldlen;
+
+		/* fold one codepoint at a time */
+		foldlen = pg_strfold(foldstr, UNICODE_CASEMAP_BUFSZ, p, srclen,
+							 locale);
+
+		COMP_TRADITIONAL_CRC32(crc, foldstr, foldlen);
+
+		size -= srclen;
+		p += srclen;
+	}
+	FIN_TRADITIONAL_CRC32(crc);
+	return (unsigned int) crc;
+}
+
+#else
 
-		COMP_TRADITIONAL_CRC32(crc, &c, 1);
+unsigned int
+ltree_crc32_sz(const char *buf, int size)
+{
+	pg_crc32	crc;
+	const char *p = buf;
+
+	INIT_TRADITIONAL_CRC32(crc);
+	while (size > 0)
+	{
+		COMP_TRADITIONAL_CRC32(crc, p, 1);
 		size--;
 		p++;
 	}
 	FIN_TRADITIONAL_CRC32(crc);
 	return (unsigned int) crc;
 }
+
+#endif							/* !LOWER_NODE */
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index d58e34769b8..8abd0de1a9c 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -94,11 +94,44 @@ ltree_prefix_eq(const char *a, size_t a_sz, const char *b, size_t b_sz)
 bool
 ltree_prefix_eq_ci(const char *a, size_t a_sz, const char *b, size_t b_sz)
 {
-	char	   *al = str_tolower(a, a_sz, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, b_sz, DEFAULT_COLLATION_OID);
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = a_sz + 1;
+	size_t		al_len;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = b_sz + 1;
+	size_t		bl_len;
+	char	   *bl = palloc(bl_sz);
 	bool		res;
 
-	res = (strncmp(al, bl, a_sz) == 0);
+	if (!locale)
+		locale = pg_database_locale();
+
+	/* casefold both a and b */
+
+	al_len = pg_strfold(al, al_sz, a, a_sz, locale);
+	if (al_len + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = al_len + 1;
+		al = repalloc(al, al_sz);
+		al_len = pg_strfold(al, al_sz, a, a_sz, locale);
+		Assert(al_len + 1 <= al_sz);
+	}
+
+	bl_len = pg_strfold(bl, bl_sz, b, b_sz, locale);
+	if (bl_len + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = bl_len + 1;
+		bl = repalloc(bl, bl_sz);
+		bl_len = pg_strfold(bl, bl_sz, b, b_sz, locale);
+		Assert(bl_len + 1 <= bl_sz);
+	}
+
+	if (al_len > bl_len)
+		res = false;
+	else
+		res = (strncmp(al, bl, al_len) == 0);
 
 	pfree(al);
 	pfree(bl);
-- 
2.43.0
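
The prefix test in ltree_prefix_eq_ci() above can be sketched in isolation. This is only an illustration, assuming plain ASCII folding as a stand-in for pg_strfold() with the database locale; fold_ascii() and prefix_eq_ci_sketch() are hypothetical names that do not appear in the patch. The point it tries to show is that the comparison uses the folded lengths rather than the input lengths, which matters once folding can change the length (Unicode full case folding maps "ß" to "ss", for example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pg_strfold(): ASCII-only, length-preserving. */
static size_t
fold_ascii(char *dst, size_t dstsize, const char *src, size_t srclen)
{
    size_t      i;

    for (i = 0; i < srclen && i + 1 < dstsize; i++)
    {
        char        c = src[i];

        dst[i] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
    }
    if (dstsize > 0)
        dst[i] = '\0';
    return srclen;              /* length the complete result needs */
}

/* Returns true if the folded form of a is a prefix of the folded form of b. */
static bool
prefix_eq_ci_sketch(const char *a, size_t a_sz, const char *b, size_t b_sz)
{
    char       *al = malloc(a_sz + 1);
    char       *bl = malloc(b_sz + 1);
    size_t      al_len;
    size_t      bl_len;
    bool        res;

    assert(al != NULL && bl != NULL);

    al_len = fold_ascii(al, a_sz + 1, a, a_sz);
    bl_len = fold_ascii(bl, b_sz + 1, b, b_sz);

    /* a longer folded string can never be a prefix of a shorter one */
    if (al_len > bl_len)
        res = false;
    else
        res = (strncmp(al, bl, al_len) == 0);

    free(al);
    free(bl);
    return res;
}
```

With this sketch, "Top" is a case-insensitive prefix of "topScience", while "TopScience" is not a prefix of "top". A real implementation would also need the grow-and-retry step from the patch, because a locale-aware fold may need more room than the input length.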

Attachment: v12-0005-downcase_identifier-use-method-table-from-locale.patch (text/x-patch; charset=UTF-8)

From 93c283ff78c32719e2a50f60efc829bb5998e9da Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 20 Oct 2025 16:32:18 -0700
Subject: [PATCH v12 5/8] downcase_identifier(): use method table from locale
 provider.

Previously, libc's tolower() was always used for lowercasing
identifiers, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
downcasing.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, downcase_identifier() uses ASCII semantics to downcase
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)

Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/parser/scansup.c              | 36 ++++++++---------------
 src/backend/utils/adt/pg_locale.c         | 20 +++++++++++++
 src/backend/utils/adt/pg_locale_builtin.c |  2 ++
 src/backend/utils/adt/pg_locale_icu.c     | 36 ++++++++++++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 33 +++++++++++++++++++++
 src/include/utils/pg_locale.h             |  5 ++++
 6 files changed, 107 insertions(+), 25 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..d63cb865260 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -46,35 +47,22 @@ char *
 downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 {
 	char	   *result;
-	int			i;
-	bool		enc_is_single_byte;
-
-	result = palloc(len + 1);
-	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	size_t		needed pg_attribute_unused();
 
 	/*
-	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
-	 * have the infrastructure for.  Instead we use tolower() to provide a
-	 * locale-aware translation.  However, there are some locales where this
-	 * is not right either (eg, Turkish may do strange things with 'i' and
-	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * Preserves string length.
+	 *
+	 * NB: if we decide to support Unicode-aware identifier case folding, then
+	 * we need to account for a change in string length.
 	 */
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	result = palloc(len + 1);
 
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_downcase_ident(result, len + 1, ident, len);
+	Assert(needed == len);
+	Assert(result[len] == '\0');
 
-	if (i >= NAMEDATALEN && truncate)
-		truncate_identifier(result, i, warn);
+	if (len >= NAMEDATALEN && truncate)
+		truncate_identifier(result, len, warn);
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 8a3796aa5d0..ee08ac045b7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1352,6 +1352,26 @@ pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
+/*
+ * Lowercase an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifiers. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_downcase_ident(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	pg_locale_t locale = default_locale;
+
+	if (locale == NULL || locale->ctype == NULL ||
+		locale->ctype->downcase_ident == NULL)
+		return strlower_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->downcase_ident(dst, dstsize, src, srclen,
+											 locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0c2920112bb..145b4641b1b 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -208,6 +208,8 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.strtitle = strtitle_builtin,
 	.strupper = strupper_builtin,
 	.strfold = strfold_builtin,
+	/* uses plain ASCII semantics for historical reasons */
+	.downcase_ident = NULL,
 	.wc_isdigit = wc_isdigit_builtin,
 	.wc_isalpha = wc_isalpha_builtin,
 	.wc_isalnum = wc_isalnum_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 18d026deda8..69f22b47a68 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,8 @@ static size_t strupper_icu(char *dest, size_t destsize, const char *src,
 						   ssize_t srclen, pg_locale_t locale);
 static size_t strfold_icu(char *dest, size_t destsize, const char *src,
 						  ssize_t srclen, pg_locale_t locale);
+static size_t downcase_ident_icu(char *dst, size_t dstsize, const char *src,
+								 ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -123,7 +125,7 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
- * UChar32, which is correct for the UTF-8 encoding, but not in general.
+ * UChar32, which is correct for UTF-8 and LATIN1, but not in general.
  */
 
 static pg_wchar
@@ -227,6 +229,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 	.strfold = strfold_icu,
+	.downcase_ident = downcase_ident_icu,
 	.wc_isdigit = wc_isdigit_icu,
 	.wc_isalpha = wc_isalpha_icu,
 	.wc_isalnum = wc_isalnum_icu,
@@ -564,6 +567,37 @@ strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
+/*
+ * For historical compatibility, behavior is not multibyte-aware.
+ *
+ * NB: uses libc tolower() for single-byte encodings (also for historical
+ * compatibility), and therefore relies on the global LC_CTYPE setting.
+ */
+static size_t
+downcase_ident_icu(char *dst, size_t dstsize, const char *src,
+				   ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch = pg_ascii_tolower(ch);
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
+			ch = tolower(ch);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 /*
  * strncoll_icu_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 3baa5816b5f..ab6117aaace 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -318,12 +318,41 @@ tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 		return wc;
 }
 
+/*
+ * Characters A..Z always downcase to a..z, even in the Turkish
+ * locale. Characters beyond 127 use tolower().
+ */
+static size_t
+downcase_ident_libc_sb(char *dst, size_t dstsize, const char *src,
+					   ssize_t srclen, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch = pg_ascii_tolower(ch);
+		else if (IS_HIGHBIT_SET(ch) && isupper_l(ch, loc))
+			ch = tolower_l(ch, loc);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_sb,
+	.downcase_ident = downcase_ident_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -349,6 +378,8 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	/* uses plain ASCII semantics for historical reasons */
+	.downcase_ident = NULL,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -370,6 +401,8 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	/* uses plain ASCII semantics for historical reasons */
+	.downcase_ident = NULL,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 01f891def7a..614affa1e91 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -110,6 +110,9 @@ struct ctype_methods
 	size_t		(*strfold) (char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+	size_t		(*downcase_ident) (char *dest, size_t destsize,
+								   const char *src, ssize_t srclen,
+								   pg_locale_t locale);
 
 	/* required */
 	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
@@ -188,6 +191,8 @@ extern size_t pg_strupper(char *dst, size_t dstsize,
 extern size_t pg_strfold(char *dst, size_t dstsize,
 						 const char *src, ssize_t srclen,
 						 pg_locale_t locale);
+extern size_t pg_downcase_ident(char *dst, size_t dstsize,
+								const char *src, ssize_t srclen);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.43.0
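
The dispatch pattern in pg_downcase_ident() above (call the provider's method if it is set, otherwise fall back to ASCII semantics) can be sketched outside the backend. This is a rough illustration only: struct ctype_methods_sketch, downcase_ascii() and downcase_ident_sketch() are hypothetical names, and the real method table has many more members and takes a pg_locale_t argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down signature; the real one also takes a locale. */
typedef size_t (*downcase_fn) (char *dst, size_t dstsize,
                               const char *src, size_t srclen);

struct ctype_methods_sketch
{
    downcase_fn downcase_ident; /* NULL means "use ASCII semantics" */
};

/* ASCII-only downcasing; bytes outside A..Z are copied unchanged. */
static size_t
downcase_ascii(char *dst, size_t dstsize, const char *src, size_t srclen)
{
    size_t      i;

    for (i = 0; i < srclen && i < dstsize; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        dst[i] = (char) ch;
    }
    if (i < dstsize)
        dst[i] = '\0';
    return srclen;              /* downcasing here preserves the length */
}

/* Use the provider's method when present, else fall back to ASCII. */
static size_t
downcase_ident_sketch(const struct ctype_methods_sketch *methods,
                      char *dst, size_t dstsize,
                      const char *src, size_t srclen)
{
    assert(dst != NULL && src != NULL);

    if (methods == NULL || methods->downcase_ident == NULL)
        return downcase_ascii(dst, dstsize, src, srclen);
    return methods->downcase_ident(dst, dstsize, src, srclen);
}
```

Passing a table whose downcase_ident member is NULL mirrors what the patch does for the builtin provider and for multibyte libc encodings: only A..Z are changed, and a high-bit byte such as 0xC7 passes through untouched.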

Attachment: v12-0006-Avoid-global-LC_CTYPE-dependency-in-pg_locale_ic.patch (text/x-patch; charset=UTF-8)

From 82ce5b3d0ebea2b41806710ffe4aa2e1c5240861 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Sun, 26 Oct 2025 15:12:38 -0700
Subject: [PATCH v12 6/8] Avoid global LC_CTYPE dependency in pg_locale_icu.c.

ICU still depends on libc for compatibility with certain historical
behavior for single-byte encodings. Make the dependency explicit by
holding a locale_t object when required.

We should consider a better solution in the future, such as decoding
the text to UTF-32 and using u_tolower(). That would require
additional infrastructure though; so for now, just avoid the global
LC_CTYPE dependency.

https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale_icu.c | 47 ++++++++++++++++++++++++---
 src/include/utils/pg_locale.h         |  1 +
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 69f22b47a68..43d44fe43bd 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -244,6 +244,29 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
+
+/*
+ * ICU still depends on libc for compatibility with certain historical
+ * behavior for single-byte encodings.  See downcase_ident_icu().
+ *
+ * XXX: consider fixing by decoding the single byte into a code point, and
+ * using u_tolower().
+ */
+static locale_t
+make_libc_ctype_locale(const char *ctype)
+{
+	locale_t	loc;
+
+#ifndef WIN32
+	loc = newlocale(LC_CTYPE_MASK, ctype, NULL);
+#else
+	loc = _create_locale(LC_ALL, ctype);
+#endif
+	if (!loc)
+		report_newlocale_failure(ctype);
+
+	return loc;
+}
 #endif
 
 pg_locale_t
@@ -254,6 +277,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	const char *iculocstr;
 	const char *icurules = NULL;
 	UCollator  *collator;
+	locale_t	loc = (locale_t) 0;
 	pg_locale_t result;
 
 	if (collid == DEFAULT_COLLATION_OID)
@@ -276,6 +300,18 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 		if (!isnull)
 			icurules = TextDatumGetCString(datum);
 
+		/* libc only needed for default locale and single-byte encoding */
+		if (pg_database_encoding_max_length() == 1)
+		{
+			const char *ctype;
+
+			datum = SysCacheGetAttrNotNull(DATABASEOID, tp,
+										   Anum_pg_database_datctype);
+			ctype = TextDatumGetCString(datum);
+
+			loc = make_libc_ctype_locale(ctype);
+		}
+
 		ReleaseSysCache(tp);
 	}
 	else
@@ -306,6 +342,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
 	result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
 	result->icu.locale = MemoryContextStrdup(context, iculocstr);
 	result->icu.ucol = collator;
+	result->icu.lt = loc;
 	result->deterministic = deterministic;
 	result->collate_is_c = false;
 	result->ctype_is_c = false;
@@ -578,17 +615,19 @@ downcase_ident_icu(char *dst, size_t dstsize, const char *src,
 				   ssize_t srclen, pg_locale_t locale)
 {
 	int			i;
-	bool		enc_is_single_byte;
+	bool		libc_lower;
+	locale_t	lt = locale->icu.lt;
+
+	libc_lower = lt && (pg_database_encoding_max_length() == 1);
 
-	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 	for (i = 0; i < srclen && i < dstsize; i++)
 	{
 		unsigned char ch = (unsigned char) src[i];
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch = pg_ascii_tolower(ch);
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
+		else if (libc_lower && IS_HIGHBIT_SET(ch) && isupper_l(ch, lt))
+			ch = tolower_l(ch, lt);
 		dst[i] = (char) ch;
 	}
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 614affa1e91..8ad8900cf93 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -167,6 +167,7 @@ struct pg_locale_struct
 		{
 			const char *locale;
 			UCollator  *ucol;
+			locale_t	lt;
 		}			icu;
 #endif
 	};
-- 
2.43.0
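
As a standalone illustration of the approach in this patch, the sketch below lowercases a single-byte string through an explicit locale_t rather than the process-wide LC_CTYPE. It assumes a POSIX.1-2008 platform that declares isupper_l() and tolower_l() in <ctype.h>; Windows would need _create_locale() as in the patch, and some platforms declare these functions in a different header. downcase_with_locale() is a hypothetical name, not a function from the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Lowercase len bytes of src into dst (which must hold len + 1 bytes).
 * A..Z always use ASCII rules; bytes with the high bit set are passed to
 * the *_l functions, so no global locale state is consulted.
 */
static void
downcase_with_locale(char *dst, const char *src, size_t len, locale_t loc)
{
    size_t      i;

    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        else if (ch >= 0x80 && isupper_l(ch, loc))
            ch = (unsigned char) tolower_l(ch, loc);
        dst[i] = (char) ch;
    }
    dst[len] = '\0';
}
```

A caller would create the locale once with newlocale(LC_CTYPE_MASK, name, (locale_t) 0), keep it for as long as it is needed, and release it with freelocale(). Because the locale is an argument, concurrent callers with different locales do not interfere with each other.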

Attachment: v12-0007-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 8bea39a2780283d4afdd75e0eb4a01b50d524faf Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v12 7/8] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.

TODO: what about \xc7 case? Also, what should the behavior be for
soundex()?

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 contrib/fuzzystrmatch/dmetaphone.c    |  2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 43 +++++++++++++++------------
 2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 227d8b11ddc..5e8ee2b0354 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -284,7 +284,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = pg_ascii_toupper((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..319302af0e4 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -122,16 +122,21 @@ static const char _codes[26] = {
 static int
 getcode(char c)
 {
-	if (isalpha((unsigned char) c))
-	{
-		c = toupper((unsigned char) c);
-		/* Defend against non-ASCII letters */
-		if (c >= 'A' && c <= 'Z')
-			return _codes[c - 'A'];
-	}
+	c = pg_ascii_toupper((unsigned char) c);
+	/* Defend against non-ASCII letters */
+	if (c >= 'A' && c <= 'Z')
+		return _codes[c - 'A'];
+
 	return 0;
 }
 
+static bool
+ascii_isalpha(char c)
+{
+	return (c >= 'A' && c <= 'Z') ||
+		(c >= 'a' && c <= 'z');
+}
+
 #define isvowel(c)	(getcode(c) & 1)	/* AEIOU */
 
 /* These letters are passed through unchanged */
@@ -301,18 +306,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -340,7 +345,7 @@ Lookahead(char *word, int how_far)
 #define Phone_Len	(p_idx)
 
 /* Note is a letter is a 'break' in the word */
-#define Isbreak(c)	(!isalpha((unsigned char) (c)))
+#define Isbreak(c)	(!ascii_isalpha((unsigned char) (c)))
 
 
 static void
@@ -379,7 +384,7 @@ _metaphone(char *word,			/* IN */
 
 	/*-- The first phoneme has to be processed specially. --*/
 	/* Find our first letter */
-	for (; !isalpha((unsigned char) (Curr_Letter)); w_idx++)
+	for (; !ascii_isalpha((unsigned char) (Curr_Letter)); w_idx++)
 	{
 		/* On the off chance we were given nothing but crap... */
 		if (Curr_Letter == '\0')
@@ -478,7 +483,7 @@ _metaphone(char *word,			/* IN */
 		 */
 
 		/* Ignore non-alphas */
-		if (!isalpha((unsigned char) (Curr_Letter)))
+		if (!ascii_isalpha((unsigned char) (Curr_Letter)))
 			continue;
 
 		/* Drop duplicates, except CC */
@@ -731,7 +736,7 @@ _soundex(const char *instr, char *outstr)
 	Assert(outstr);
 
 	/* Skip leading non-alphabetic characters */
-	while (*instr && !isalpha((unsigned char) *instr))
+	while (*instr && !ascii_isalpha((unsigned char) *instr))
 		++instr;
 
 	/* If no string left, return all-zeroes buffer */
@@ -742,12 +747,12 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
 	{
-		if (isalpha((unsigned char) *instr) &&
+		if (ascii_isalpha((unsigned char) *instr) &&
 			soundex_code(*instr) != soundex_code(*(instr - 1)))
 		{
 			*outstr = soundex_code(*instr);
-- 
2.43.0
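
To make the effect of this change concrete, here is a small sketch of ASCII-only classification and uppercasing that does not consult LC_CTYPE at all. The two helpers are hypothetical local versions written for illustration; they are meant to behave like ascii_isalpha() in the patch and like an ASCII-only toupper, but they are not the PostgreSQL functions themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* True only for A..Z and a..z, whatever the process locale is. */
static bool
ascii_isalpha_sketch(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ||
        (c >= 'a' && c <= 'z');
}

/* Uppercase a..z; every other byte is returned unchanged. */
static unsigned char
ascii_toupper_sketch(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        c = (unsigned char) (c - 'a' + 'A');
    return c;
}
```

Under these definitions a byte such as 0xC7 is never alphabetic and never changes case, so it is treated as a break character. With the locale-aware isalpha() and toupper() the answer could depend on the global LC_CTYPE setting in a single-byte encoding, which seems to be the case the TODO in the commit message refers to.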

Attachment: v12-0008-Control-LC_COLLATE-with-GUC.patch (text/x-patch; charset=UTF-8)

From 68b05c20a68098613fbe6657ddb2b07c5ffd3d0b Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 14:00:52 -0800
Subject: [PATCH v12 8/8] Control LC_COLLATE with GUC.

Now that the global LC_COLLATE setting is not used for any in-core
purpose at all (see commit 5e6e42e44f), allow it to be set with a
GUC. This may be useful for extensions or procedural languages that
still depend on the global LC_COLLATE setting.

TODO: needs discussion

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 +++++++++++++++++++
 src/backend/utils/init/postinit.c             |  2 +
 src/backend/utils/misc/guc_parameters.dat     |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/initdb/initdb.c                       |  3 +
 src/include/utils/guc_hooks.h                 |  2 +
 src/include/utils/pg_locale.h                 |  1 +
 7 files changed, 78 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index ee08ac045b7..6dfbe8af47b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,7 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
+char	   *locale_collate;
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
@@ -369,6 +370,64 @@ assign_locale_time(const char *newval, void *extra)
 	CurrentLCTimeValid = false;
 }
 
+/*
+ * We allow LC_COLLATE to actually be set globally.
+ *
+ * Note: we normally disallow value = "" because it wouldn't have consistent
+ * semantics (it'd effectively just use the previous value).  However, this
+ * is the value passed for PGC_S_DEFAULT, so don't complain in that case,
+ * not even if the attempted setting fails due to invalid environment value.
+ * The idea there is just to accept the environment setting *if possible*
+ * during startup, until we can read the proper value from postgresql.conf.
+ */
+bool
+check_locale_collate(char **newval, void **extra, GucSource source)
+{
+	int			locale_enc;
+	int			db_enc;
+
+	if (**newval == '\0')
+	{
+		if (source == PGC_S_DEFAULT)
+			return true;
+		else
+			return false;
+	}
+
+	locale_enc = pg_get_encoding_from_locale(*newval, true);
+	db_enc = GetDatabaseEncoding();
+
+	if (!(locale_enc == db_enc ||
+		  locale_enc == PG_SQL_ASCII ||
+		  db_enc == PG_SQL_ASCII ||
+		  locale_enc == -1))
+	{
+		if (source == PGC_S_FILE)
+		{
+			guc_free(*newval);
+			*newval = guc_strdup(LOG, "C");
+			if (!*newval)
+				return false;
+		}
+		else if (source != PGC_S_TEST)
+		{
+			ereport(WARNING,
+					(errmsg("encoding mismatch"),
+					 errdetail("Locale \"%s\" uses encoding \"%s\", which does not match database encoding \"%s\".",
+							   *newval, pg_encoding_to_char(locale_enc), pg_encoding_to_char(db_enc))));
+			return false;
+		}
+	}
+
+	return check_locale(LC_COLLATE, *newval, NULL);
+}
+
+void
+assign_locale_collate(const char *newval, void *extra)
+{
+	(void) pg_perm_setlocale(LC_COLLATE, newval);
+}
+
 /*
  * We allow LC_MESSAGES to actually be set globally.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4ed69ac7ba2..8586832acaa 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -404,6 +404,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * the pg_database tuple.
 	 */
 	SetDatabaseEncoding(dbform->encoding);
+	/* Reset lc_collate to check encoding, and fall back to C if necessary */
+	SetConfigOption("lc_collate", locale_collate, PGC_POSTMASTER, PGC_S_FILE);
 	/* Record it as a GUC internal option, too */
 	SetConfigOption("server_encoding", GetDatabaseEncodingName(),
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..a36c680719f 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1457,6 +1457,15 @@
   boot_val => 'PG_KRB_SRVTAB',
 },
 
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
+  long_desc => 'An empty string means use the operating system setting.',
+  variable => 'locale_collate',
+  boot_val => '""',
+  check_hook => 'check_locale_collate',
+  assign_hook => 'assign_locale_collate',
+},
+
 { name => 'lc_messages', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
   short_desc => 'Sets the language in which messages are displayed.',
   long_desc => 'An empty string means use the operating system setting.',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..19332e39e82 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -798,6 +798,8 @@
                                         # encoding
 
 # These settings are initialized by initdb, but they can be changed.
+#lc_collate = ''                        # locale for text ordering (only affects
+                                        # extensions)
 #lc_messages = ''                       # locale for system error message
                                         # strings
 #lc_monetary = 'C'                      # locale for monetary formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..8b2e7bfab6f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1312,6 +1312,9 @@ setup_config(void)
 	conflines = replace_guc_value(conflines, "shared_buffers",
 								  repltok, false);
 
+	conflines = replace_guc_value(conflines, "lc_collate",
+								  lc_collate, false);
+
 	conflines = replace_guc_value(conflines, "lc_messages",
 								  lc_messages, false);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 82ac8646a8d..8a20f76eec8 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -65,6 +65,8 @@ extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
 extern void assign_io_method(int newval, void *extra);
 extern bool check_io_max_concurrency(int *newval, void **extra, GucSource source);
 extern const char *show_in_hot_standby(void);
+extern bool check_locale_collate(char **newval, void **extra, GucSource source);
+extern void assign_locale_collate(const char *newval, void *extra);
 extern bool check_locale_messages(char **newval, void **extra, GucSource source);
 extern void assign_locale_messages(const char *newval, void *extra);
 extern bool check_locale_monetary(char **newval, void **extra, GucSource source);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 8ad8900cf93..e29497dc7d2 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -41,6 +41,7 @@
 #define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
 
 /* GUC settings */
+extern PGDLLIMPORT char *locale_collate;
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
-- 
2.43.0

#81

Chao Li

li.evan.chao@gmail.com

about 1 month ago

In reply to: Jeff Davis (#80)

Re: Remaining dependency on setlocale()

On Dec 13, 2025, at 04:11, Jeff Davis <pgsql@j-davis.com> wrote:

I believe v12 0001-0005 are about ready for commit, and 0003 should be
backported.

I quickly went through 0001-0005, and got a few nitpicks:

1 - 0001
```
+ int match_mblen pg_attribute_unused();
```

Why is this variable marked unused? It’s actually used.

2 - 0002

I did a search and found one place that you missed at line 181 in pg_locale.h
```
extern bool char_is_cased(char ch, pg_locale_t locale);
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#82

Jeff Davis

pgsql@j-davis.com

28 days ago

In reply to: Chao Li (#81)

Re: Remaining dependency on setlocale()

On Sat, 2025-12-13 at 17:48 +0800, Chao Li wrote:

1 - 0001
```
+ int match_mblen pg_attribute_unused();
```

Why is this variable marked unused? It’s actually used.

Fixed and committed.

I originally marked it unused because I just had an:

Assert(match[match_mblen] == '\0');

but I changed it to just set it:

match[match_mblen] = '\0';

for clarity, even though I think the underlying API guarantees NUL-
termination.
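
To illustrate the point for the archives, here is a minimal standalone sketch, not the committed code. When a length variable is read only inside an assertion, a build without assertions compiles that read away and the variable looks unused. Storing the terminator explicitly reads the variable in every build. The names fold_ascii_sketch and lower_match_sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical folding routine: lowercases ASCII letters from src into dst,
 * always NUL-terminates, and returns the number of bytes written (excluding
 * the terminator).
 */
static size_t
fold_ascii_sketch(char *dst, size_t dstsize, const char *src)
{
    size_t      n = 0;

    assert(dstsize > 0);
    while (src[n] != '\0' && n + 1 < dstsize)
    {
        char        c = src[n];

        dst[n] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
        n++;
    }
    dst[n] = '\0';
    return n;
}

static void
lower_match_sketch(char *match, size_t matchsize, const char *src)
{
    size_t      match_mblen;

    match_mblen = fold_ascii_sketch(match, matchsize, src);

    /*
     * Had this been assert(match[match_mblen] == '\0'), match_mblen would
     * be unused whenever assertions are compiled out. Setting the byte
     * uses it unconditionally, so no "unused" marker is required.
     */
    match[match_mblen] = '\0';
}
```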

2 - 0002

I did a search and found one place that you missed at line 181 in
pg_locale.h
```
extern bool char_is_cased(char ch, pg_locale_t locale);
```

Thank you, fixed and committed.

I'll continue committing v12 0003-0005. Note that I'm planning to
backport 0003.

I'm also inclined to move forward with 0006. It's not a long-term
solution, but it solves an instance of the problem under discussion in
this thread (removes setlocale() dependency) and is a narrow fix.

After that, remaining loose ends:

* 0007: Can we just rip out the non-ASCII support? If it's gone all
this time without UTF8 support, and there's no suggestion about what
the non-ASCII behavior should be (in docs or source code), then it's
probably not terribly important.

* Use of pg_strncasecmp() in the backend, which uses tolower() to do
case-insensitive matching of command options, e.g. "if
(pg_strcasecmp(a, "plain") == 0)" to check for PLAIN storage in CREATE
TYPE.

* strerror(): either consider an lc_ctype GUC, or the approach over
here[1].

* Daniel suggests that we need some way to set LC_COLLATE for
extensions/dependencies.

* address calls to pg_tolower in datetime.c and tzparser.c -- can those
just be pg_ascii_tolower()?

* examine remaining isalpha(), etc., calls in the backend
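
For the case-insensitive keyword matching and pg_tolower items above, a locale-independent comparison can be sketched as follows. This is a standalone illustration under the assumption that only ASCII letters need folding; the function names are hypothetical and this is not the code in src/port.

```c
#include <assert.h>
#include <stddef.h>

/* Lowercase A-Z only; every other byte, including >= 0x80, is unchanged. */
static unsigned char
ascii_tolower_sketch(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch = (unsigned char) (ch + ('a' - 'A'));
    return ch;
}

/*
 * Case-insensitive comparison that never consults the global LC_CTYPE
 * setting, so its result cannot change with setlocale().
 */
static int
ascii_strcasecmp_sketch(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    for (;;)
    {
        unsigned char c1 = ascii_tolower_sketch((unsigned char) *s1++);
        unsigned char c2 = ascii_tolower_sketch((unsigned char) *s2++);

        if (c1 != c2)
            return (int) c1 - (int) c2;
        if (c1 == '\0')
            return 0;
    }
}
```

With this, a check such as ascii_strcasecmp_sketch(a, "plain") == 0 gives the same answer in every locale, which is all that matching a fixed ASCII keyword requires.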

Regards,
Jeff Davis

[1]: /messages/by-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com

#83

Chao Li

li.evan.chao@gmail.com

28 days ago

In reply to: Jeff Davis (#82)

Re: Remaining dependency on setlocale()

On Dec 16, 2025, at 03:34, Jeff Davis <pgsql@j-davis.com> wrote:

On Sat, 2025-12-13 at 17:48 +0800, Chao Li wrote:

1 - 0001
```
+ int match_mblen pg_attribute_unused();
```

Why is this variable marked unused? It’s actually used.

Fixed and committed.

I originally marked it unused because I just had an:

Assert(match[match_mblen] == '\0');

but I changed it to just set it:

match[match_mblen] = '\0';

for clarity, even though I think the underlying API guarantees NUL-
termination.

2 - 0002

I did a search and found one place that you missed at line 181 in
pg_locale.h
```
extern bool char_is_cased(char ch, pg_locale_t locale);
```

Thank you, fixed and committed.

I'll continue committing v12 0003-0005. Note that I'm planning to
backport 0003.

I re-reviewed 0003-0005 last week; they all look good to me.

I have no comment on backporting 0003.

I'm also inclined to move forward with 0006. It's not a long-term
solution, but it solves an instance of the problem under discussion in
this thread (removes setlocale() dependency) and is a narrow fix.

After that, remaining loose ends:

* 0007: Can we just rip out the non-ASCII support? If it's gone all
this time without UTF8 support, and there's no suggestion about what
the non-ASCII behavior should be (in docs or source code), then it's
probably not terribly important.

* Use of pg_strncasecmp() in the backend, which uses tolower() to do
case-insensitive matching of command options, e.g. "if
(pg_strcasecmp(a, "plain") == 0)" to check for PLAIN storage in CREATE
TYPE.

* strerror(): either consider an lc_ctype GUC, or the approach over
here[1].

* Daniel suggests that we need some way to set LC_COLLATE for
extensions/dependencies.

* address calls to pg_tolower in datetime.c and tzparser.c -- can those
just be pg_ascii_tolower()?

* examine remaining isalpha(), etc., calls in the backend

Regards,
Jeff Davis

[1]
/messages/by-id/90f176c5b85b9da26a3265b2630ece3552068566.camel@j-davis.com

I just reviewed 0006-0007. Only got one comment on 0006:

```
@@ -306,6 +342,7 @@ create_pg_locale_icu(Oid collid, MemoryContext context)
result = MemoryContextAllocZero(context, sizeof(struct pg_locale_struct));
result->icu.locale = MemoryContextStrdup(context, iculocstr);
result->icu.ucol = collator;
+ result->icu.lt = loc;
```

The old code didn’t create a locale object and store it in result, so it had no logic to free the created locale. This patch now does that, but I don’t see where the created locale object is freed. I suppose newlocale() allocates memory from the OS, so I guess that memory should be freed somewhere.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#84

Jeff Davis

pgsql@j-davis.com

27 days ago

In reply to: Chao Li (#83)

Re: Remaining dependency on setlocale()

On Tue, 2025-12-16 at 09:32 +0800, Chao Li wrote:

I re-reviewed 0003-0005 last week; they all look good to me.

I have no comment on backporting 0003.

Committed 0003 and backported to 14.

Committing 0004 also. For the archives, the bug in that case is:

-- generate some randomly-cased non-ASCII data
CREATE DATABASE i TEMPLATE template0 LOCALE 'C'
LOCALE_PROVIDER 'icu' ICU_LOCALE 'en';
\c i
CREATE EXTENSION ltree;
CREATE TABLE test(path ltree);
CREATE FUNCTION gen() RETURNS TEXT LANGUAGE plpgsql AS $$
declare
s TEXT;
begin
s := '';
for i in 1..5 loop
s := s || case when random() > 0.5 then lower(U&'\00C1') else
U&'\00C1' end;
s := s || case when random() > 0.5 then lower(U&'\00C9') else
U&'\00C9' end;
s := s || case when random() > 0.5 then lower(U&'\00CD') else
U&'\00CD' end;
s := s || case when random() > 0.5 then lower(U&'\00D3') else
U&'\00D3' end;
s := s || case when random() > 0.5 then lower(U&'\00DA') else
U&'\00DA' end;
end loop;
return s;
end;
$$;
INSERT INTO test select ('a.'||gen()||'.z')::ltree
FROM generate_series(1,10000);
CREATE INDEX test_idx ON test USING gist (path);
-- returns 10000
SET enable_seqscan = true;
SET enable_indexscan = false;
SET enable_bitmapscan = false;
SELECT COUNT(*) FROM test
WHERE path ~ U&'a.áéíóúáéíóúáéíóúáéíóúáéíóú@.z'::lquery;
-- returns fewer tuples when using index scan
SET enable_seqscan = false;
SET enable_indexscan = true;
SET enable_bitmapscan = true;
SELECT COUNT(*) FROM test
WHERE path ~ U&'a.áéíóúáéíóúáéíóúáéíóúáéíóú@.z'::lquery;

Probably a smaller case would do, but I think it requires page splits
to hit the bug. 0004 fixes the bug.

The old code didn’t create a locale object and store it in result, so
it had no logic to free the created locale. This patch now does that,
but I don’t see where the created locale object is freed. I suppose
newlocale() allocates memory from the OS, so I guess that memory
should be freed somewhere.

The pg_locale_t objects are cached for the life of the backend, and
never freed. We may want to change that eventually, but in practice
it's not much of a problem.

Regards,
Jeff Davis

#85

Jeff Davis

pgsql@j-davis.com

27 days ago

In reply to: Jeff Davis (#84)

Re: Remaining dependency on setlocale()

On Tue, 2025-12-16 at 12:04 -0800, Jeff Davis wrote:

Probably a smaller case would do, but I think it requires page splits
to hit the bug. 0004 fixes the bug.

Because it's a clear bug, I elected to backport 0004 to v18, where the
casefolding APIs were introduced. It's a bug before that as well, but
backporting further would be more complex and/or invasive.

Regards,
Jeff Davis

#86

Peter Eisentraut

peter@eisentraut.org

26 days ago

In reply to: Jeff Davis (#80)

Re: Remaining dependency on setlocale()

On 12.12.25 21:11, Jeff Davis wrote:

case '\xc7': /* C with cedilla */

so the premise that "fuzzystrmatch is designed for ASCII" does not
appear to be correct. Needs more analysis.

(But apparently it's not multibyte aware at all, so I don't know what
to do about that.)

I didn't notice that, thank you. Agreed, we need a bit more discussion
around this case as well as soundex().

Soundex is an ASCII-only algorithm, there is no expectation that the
algorithm does anything useful with non-ASCII characters, and it doesn't
do so now. So I think using pg_ascii_toupper() is ok. (Users could for
example use unaccent to preprocess text.)
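
As a rough standalone sketch of what "ASCII-only" means here (not the fuzzystrmatch source; all names ending in _sketch are hypothetical), the lookup table from the patch can be driven entirely by explicit ASCII range checks, so non-ASCII bytes are simply never treated as letters:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SOUNDEX_LEN_SKETCH 4

/* Codes for A..Z, same table as in the patch. */
static const char *const soundex_table_sketch = "01230120022455012623010202";

static bool
ascii_isalpha_sketch(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static char
ascii_toupper_sketch(char c)
{
    return (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : c;
}

static char
soundex_code_sketch(char letter)
{
    letter = ascii_toupper_sketch(letter);
    if (letter >= 'A' && letter <= 'Z')
        return soundex_table_sketch[letter - 'A'];
    return letter;              /* non-letters map to themselves */
}

/* outstr must have room for SOUNDEX_LEN_SKETCH + 1 bytes. */
static void
soundex_sketch(const char *instr, char *outstr)
{
    int         count;

    assert(instr != NULL);
    assert(outstr != NULL);

    /* Start from the all-zeroes code. */
    memset(outstr, '0', SOUNDEX_LEN_SKETCH);
    outstr[SOUNDEX_LEN_SKETCH] = '\0';

    /* Skip leading bytes that are not ASCII letters. */
    while (*instr && !ascii_isalpha_sketch(*instr))
        instr++;
    if (*instr == '\0')
        return;

    /* Keep the first letter, uppercased with ASCII-only rules. */
    *outstr++ = ascii_toupper_sketch(*instr++);
    count = 1;

    while (*instr && count < SOUNDEX_LEN_SKETCH)
    {
        char        code = soundex_code_sketch(*instr);

        /* Emit a code only for letters whose code differs from the previous byte's. */
        if (ascii_isalpha_sketch(*instr) &&
            code != soundex_code_sketch(*(instr - 1)) &&
            code != '0')
        {
            *outstr++ = code;
            count++;
        }
        instr++;
    }
}
```

In this sketch a leading accented byte in a single-byte encoding is skipped like punctuation, which is why preprocessing with something like unaccent would matter to users who care about such input.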

One might wonder if the presence of non-ASCII characters should be an
error, but that doesn't have to be the subject of this thread. I
noticed that the Wikipedia page for Soundex even calls out PostgreSQL
for doing things slightly differently than everyone else, but I haven't
studied the details.

For Metaphone, I found the reference implementation linked from its
Wikipedia page, and it looks like our implementation is pretty closely
aligned to that. That reference implementation also contains the
C-with-cedilla case explicitly. The correct fix here would probably be
to change the implementation to work on wide characters. But I think
for the moment you could try a shortcut like, use pg_ascii_toupper(),
but if the encoding is LATIN1 (or LATIN9 or whichever other encodings
also contain C-with-cedilla at that code point), then explicitly
uppercase that one as well. This would preserve the existing behavior.

Note that the documentation calls out: "At present, the soundex,
metaphone, dmetaphone, and dmetaphone_alt functions do not work well
with multibyte encodings (such as UTF-8)."

#87

Jeff Davis

pgsql@j-davis.com

20 days ago

In reply to: Peter Eisentraut (#86)

2 attachment(s)

Re: Remaining dependency on setlocale()

On Wed, 2025-12-17 at 11:39 +0100, Peter Eisentraut wrote:

For Metaphone, I found the reference implementation linked from its
Wikipedia page, and it looks like our implementation is pretty closely
aligned to that. That reference implementation also contains the
C-with-cedilla case explicitly. The correct fix here would probably be
to change the implementation to work on wide characters. But I think
for the moment you could try a shortcut like, use pg_ascii_toupper(),
but if the encoding is LATIN1 (or LATIN9 or whichever other encodings
also contain C-with-cedilla at that code point), then explicitly
uppercase that one as well. This would preserve the existing behavior.

Done, attached new patches.

Interestingly, WIN1256 encodes only the SMALL LETTER C WITH CEDILLA. I
think, for the purposes here, we can still consider it to "uppercase"
to \xc7, so that it can still be treated as the same sound. Technically
I think that would be an improvement over the current code in this edge
case, and suggests that case folding would be a better approach than
uppercasing.
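
A small standalone sketch of that folding idea, separate from the attached patch: both byte values for C WITH CEDILLA end up as \xc7, so the later switch only has to recognize one form even when the encoding defines just the lowercase letter. The function name and the has_c_cedilla flag are hypothetical; the flag stands in for the per-encoding decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_C_CEDILLA_SKETCH      ((unsigned char) 0xE7)
#define CAPITAL_C_CEDILLA_SKETCH    ((unsigned char) 0xC7)

/*
 * Fold a single-byte-encoded string in place: ASCII letters go to upper
 * case, and (when the encoding is known to place it at 0xE7) small C WITH
 * CEDILLA goes to 0xC7. All other bytes are left alone.
 */
static void
fold_for_metaphone_sketch(char *str, bool has_c_cedilla)
{
    unsigned char *p;

    assert(str != NULL);

    for (p = (unsigned char *) str; *p != '\0'; p++)
    {
        if (has_c_cedilla && *p == SMALL_C_CEDILLA_SKETCH)
            *p = CAPITAL_C_CEDILLA_SKETCH;
        else if (*p >= 'a' && *p <= 'z')
            *p = (unsigned char) (*p - 'a' + 'A');
    }
}
```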

Regards,
Jeff Davis

Attachments:

v13-0001-fuzzystrmatch-use-pg_ascii_toupper.patch (text/x-patch; charset=UTF-8)

From 8161ca49ae2044e004d3f36c04f60b03e97f4071 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v13 1/2] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.

TODO: what about \xc7 case? Also, what should the behavior be for
soundex()?

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 contrib/fuzzystrmatch/dmetaphone.c    | 45 +++++++++++++++++++++++++--
 contrib/fuzzystrmatch/fuzzystrmatch.c | 43 ++++++++++++++-----------
 2 files changed, 67 insertions(+), 21 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 227d8b11ddc..9a4e5ae7e0e 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -98,6 +98,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 
 #include "postgres.h"
 
+#include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 
 /* turn off assertions for embedded function */
@@ -116,6 +117,9 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include <assert.h>
 #include <ctype.h>
 
+#define SMALL_LETTER_C_WITH_CEDILLA		'\xe7'
+#define CAPITAL_LETTER_C_WITH_CEDILLA	'\xc7'
+
 /* prototype for the main function we got from the perl module */
 static void DoubleMetaphone(char *str, char **codes);
 
@@ -282,9 +286,46 @@ static void
 MakeUpper(metastring *s)
 {
 	char	   *i;
+	bool		c_with_cedilla;
+
+	/*
+	 * C WITH CEDILLA should be uppercased, as well.
+	 *
+	 * XXX: Only works in single-byte encodings that encode lowercase C WITH
+	 * CEDILLA as \xe7. Should have proper multibyte support.
+	 *
+	 * NB: WIN1256 encodes only the lowercase C WITH CEDILLA, but for the
+	 * purposes of metaphone, we can still "uppercase" it to \xc7 here so that
+	 * it's recognized later.
+	 */
+	switch (GetDatabaseEncoding())
+	{
+		case PG_LATIN1:
+		case PG_LATIN2:
+		case PG_LATIN3:
+		case PG_LATIN5:
+		case PG_LATIN8:
+		case PG_LATIN9:
+		case PG_LATIN10:
+		case PG_WIN1250:
+		case PG_WIN1252:
+		case PG_WIN1254:
+		case PG_WIN1256:
+		case PG_WIN1258:
+			c_with_cedilla = true;
+			break;
+		default:
+			c_with_cedilla = false;
+			break;
+	}
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+	{
+		if (c_with_cedilla && *i == SMALL_LETTER_C_WITH_CEDILLA)
+			*i = CAPITAL_LETTER_C_WITH_CEDILLA;
+		else
+			*i = pg_ascii_toupper((unsigned char) *i);
+	}
 }
 
 
@@ -463,7 +504,7 @@ DoubleMetaphone(char *str, char **codes)
 					current += 1;
 				break;
 
-			case '\xc7':		/* C with cedilla */
+			case CAPITAL_LETTER_C_WITH_CEDILLA:
 				MetaphAdd(primary, "S");
 				MetaphAdd(secondary, "S");
 				current += 1;
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..319302af0e4 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -122,16 +122,21 @@ static const char _codes[26] = {
 static int
 getcode(char c)
 {
-	if (isalpha((unsigned char) c))
-	{
-		c = toupper((unsigned char) c);
-		/* Defend against non-ASCII letters */
-		if (c >= 'A' && c <= 'Z')
-			return _codes[c - 'A'];
-	}
+	c = pg_ascii_toupper((unsigned char) c);
+	/* Defend against non-ASCII letters */
+	if (c >= 'A' && c <= 'Z')
+		return _codes[c - 'A'];
+
 	return 0;
 }
 
+static bool
+ascii_isalpha(char c)
+{
+	return (c >= 'A' && c <= 'Z') ||
+		(c >= 'a' && c <= 'z');
+}
+
 #define isvowel(c)	(getcode(c) & 1)	/* AEIOU */
 
 /* These letters are passed through unchanged */
@@ -301,18 +306,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -340,7 +345,7 @@ Lookahead(char *word, int how_far)
 #define Phone_Len	(p_idx)
 
 /* Note is a letter is a 'break' in the word */
-#define Isbreak(c)	(!isalpha((unsigned char) (c)))
+#define Isbreak(c)	(!ascii_isalpha((unsigned char) (c)))
 
 
 static void
@@ -379,7 +384,7 @@ _metaphone(char *word,			/* IN */
 
 	/*-- The first phoneme has to be processed specially. --*/
 	/* Find our first letter */
-	for (; !isalpha((unsigned char) (Curr_Letter)); w_idx++)
+	for (; !ascii_isalpha((unsigned char) (Curr_Letter)); w_idx++)
 	{
 		/* On the off chance we were given nothing but crap... */
 		if (Curr_Letter == '\0')
@@ -478,7 +483,7 @@ _metaphone(char *word,			/* IN */
 		 */
 
 		/* Ignore non-alphas */
-		if (!isalpha((unsigned char) (Curr_Letter)))
+		if (!ascii_isalpha((unsigned char) (Curr_Letter)))
 			continue;
 
 		/* Drop duplicates, except CC */
@@ -731,7 +736,7 @@ _soundex(const char *instr, char *outstr)
 	Assert(outstr);
 
 	/* Skip leading non-alphabetic characters */
-	while (*instr && !isalpha((unsigned char) *instr))
+	while (*instr && !ascii_isalpha((unsigned char) *instr))
 		++instr;
 
 	/* If no string left, return all-zeroes buffer */
@@ -742,12 +747,12 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
 	{
-		if (isalpha((unsigned char) *instr) &&
+		if (ascii_isalpha((unsigned char) *instr) &&
 			soundex_code(*instr) != soundex_code(*(instr - 1)))
 		{
 			*outstr = soundex_code(*instr);
-- 
2.43.0

v13-0002-Control-LC_COLLATE-with-GUC.patch (text/x-patch; charset=UTF-8)

From 5d8d22077aaa6b7365c52b016ad0e22296b68b05 Mon Sep 17 00:00:00 2001
From: Jeff Davis <jeff@j-davis.com>
Date: Mon, 24 Nov 2025 14:00:52 -0800
Subject: [PATCH v13 2/2] Control LC_COLLATE with GUC.

Now that the global LC_COLLATE setting is not used for any in-core
purpose at all (see commit 5e6e42e44f), allow it to be set with a
GUC. This may be useful for extensions or procedural languages that
still depend on the global LC_COLLATE setting.

TODO: needs discussion

Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 +++++++++++++++++++
 src/backend/utils/init/postinit.c             |  2 +
 src/backend/utils/misc/guc_parameters.dat     |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/initdb/initdb.c                       |  3 +
 src/include/utils/guc_hooks.h                 |  2 +
 src/include/utils/pg_locale.h                 |  1 +
 7 files changed, 78 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index ee08ac045b7..6dfbe8af47b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,7 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
+char	   *locale_collate;
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
@@ -369,6 +370,64 @@ assign_locale_time(const char *newval, void *extra)
 	CurrentLCTimeValid = false;
 }
 
+/*
+ * We allow LC_COLLATE to actually be set globally.
+ *
+ * Note: we normally disallow value = "" because it wouldn't have consistent
+ * semantics (it'd effectively just use the previous value).  However, this
+ * is the value passed for PGC_S_DEFAULT, so don't complain in that case,
+ * not even if the attempted setting fails due to invalid environment value.
+ * The idea there is just to accept the environment setting *if possible*
+ * during startup, until we can read the proper value from postgresql.conf.
+ */
+bool
+check_locale_collate(char **newval, void **extra, GucSource source)
+{
+	int			locale_enc;
+	int			db_enc;
+
+	if (**newval == '\0')
+	{
+		if (source == PGC_S_DEFAULT)
+			return true;
+		else
+			return false;
+	}
+
+	locale_enc = pg_get_encoding_from_locale(*newval, true);
+	db_enc = GetDatabaseEncoding();
+
+	if (!(locale_enc == db_enc ||
+		  locale_enc == PG_SQL_ASCII ||
+		  db_enc == PG_SQL_ASCII ||
+		  locale_enc == -1))
+	{
+		if (source == PGC_S_FILE)
+		{
+			guc_free(*newval);
+			*newval = guc_strdup(LOG, "C");
+			if (!*newval)
+				return false;
+		}
+		else if (source != PGC_S_TEST)
+		{
+			ereport(WARNING,
+					(errmsg("encoding mismatch"),
+					 errdetail("Locale \"%s\" uses encoding \"%s\", which does not match database encoding \"%s\".",
+							   *newval, pg_encoding_to_char(locale_enc), pg_encoding_to_char(db_enc))));
+			return false;
+		}
+	}
+
+	return check_locale(LC_COLLATE, *newval, NULL);
+}
+
+void
+assign_locale_collate(const char *newval, void *extra)
+{
+	(void) pg_perm_setlocale(LC_COLLATE, newval);
+}
+
 /*
  * We allow LC_MESSAGES to actually be set globally.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index b7e94ca45bd..eee0b971590 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -404,6 +404,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * the pg_database tuple.
 	 */
 	SetDatabaseEncoding(dbform->encoding);
+	/* Reset lc_collate to check encoding, and fall back to C if necessary */
+	SetConfigOption("lc_collate", locale_collate, PGC_POSTMASTER, PGC_S_FILE);
 	/* Record it as a GUC internal option, too */
 	SetConfigOption("server_encoding", GetDatabaseEncodingName(),
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index ac0c7c36c56..cf7675aa2bb 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1466,6 +1466,15 @@
   boot_val => 'PG_KRB_SRVTAB',
 },
 
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
+  long_desc => 'An empty string means use the operating system setting.',
+  variable => 'locale_collate',
+  boot_val => '""',
+  check_hook => 'check_locale_collate',
+  assign_hook => 'assign_locale_collate',
+},
+
 { name => 'lc_messages', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
   short_desc => 'Sets the language in which messages are displayed.',
   long_desc => 'An empty string means use the operating system setting.',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..19332e39e82 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -798,6 +798,8 @@
                                         # encoding
 
 # These settings are initialized by initdb, but they can be changed.
+#lc_collate = ''                        # locale for text ordering (only affects
+                                        # extensions)
 #lc_messages = ''                       # locale for system error message
                                         # strings
 #lc_monetary = 'C'                      # locale for monetary formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..8b2e7bfab6f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1312,6 +1312,9 @@ setup_config(void)
 	conflines = replace_guc_value(conflines, "shared_buffers",
 								  repltok, false);
 
+	conflines = replace_guc_value(conflines, "lc_collate",
+								  lc_collate, false);
+
 	conflines = replace_guc_value(conflines, "lc_messages",
 								  lc_messages, false);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index fbe0b1e2e3d..f3bfc8dfb7e 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -66,6 +66,8 @@ extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
 extern void assign_io_method(int newval, void *extra);
 extern bool check_io_max_concurrency(int *newval, void **extra, GucSource source);
 extern const char *show_in_hot_standby(void);
+extern bool check_locale_collate(char **newval, void **extra, GucSource source);
+extern void assign_locale_collate(const char *newval, void *extra);
 extern bool check_locale_messages(char **newval, void **extra, GucSource source);
 extern void assign_locale_messages(const char *newval, void *extra);
 extern bool check_locale_monetary(char **newval, void **extra, GucSource source);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 86016b9344e..096ea1e4963 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -41,6 +41,7 @@
 #define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * MAX_MULTIBYTE_CHAR_LEN)
 
 /* GUC settings */
+extern PGDLLIMPORT char *locale_collate;
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
-- 
2.43.0
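
The encoding check in check_locale_collate() above can be read as a small decision rule. The following standalone sketch restates it under simplifying assumptions: the encoding ids are hypothetical stand-ins rather than PostgreSQL's real values, the PGC_S_TEST case is ignored, and the final check_locale() validation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding ids for illustration only. */
enum sketch_encoding
{
    SKETCH_ENC_UNKNOWN = -1,    /* encoding of the locale could not be determined */
    SKETCH_SQL_ASCII = 0,
    SKETCH_UTF8,
    SKETCH_LATIN1
};

/* A locale is usable if its encoding matches, or either side is "anything goes". */
static bool
encodings_compatible_sketch(int locale_enc, int db_enc)
{
    assert(db_enc >= 0);        /* the database encoding is always known */

    return locale_enc == db_enc ||
        locale_enc == SKETCH_SQL_ASCII ||
        db_enc == SKETCH_SQL_ASCII ||
        locale_enc == SKETCH_ENC_UNKNOWN;
}

/*
 * Returns the locale name to use, or NULL if the setting is rejected.
 * A mismatching value read from the configuration file falls back to "C";
 * a mismatching value from any other source is refused.
 */
static const char *
resolve_lc_collate_sketch(const char *requested, int locale_enc,
                          int db_enc, bool from_config_file)
{
    assert(requested != NULL);

    if (encodings_compatible_sketch(locale_enc, db_enc))
        return requested;
    if (from_config_file)
        return "C";
    return NULL;
}
```

The fallback for the configuration-file case is what lets one cluster-wide lc_collate value coexist with databases of different encodings: a database whose encoding does not match quietly gets "C" instead of failing to start a backend.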

#88

Peter Eisentraut

peter@eisentraut.org

6 days ago

In reply to: Jeff Davis (#87)

1 attachment(s)

Re: Remaining dependency on setlocale()

On 23.12.25 21:09, Jeff Davis wrote:

On Wed, 2025-12-17 at 11:39 +0100, Peter Eisentraut wrote:

For Metaphone, I found the reference implementation linked from its
Wikipedia page, and it looks like our implementation is pretty closely
aligned to that. That reference implementation also contains the
C-with-cedilla case explicitly. The correct fix here would probably be
to change the implementation to work on wide characters. But I think
for the moment you could try a shortcut like, use pg_ascii_toupper(),
but if the encoding is LATIN1 (or LATIN9 or whichever other encodings
also contain C-with-cedilla at that code point), then explicitly
uppercase that one as well. This would preserve the existing behavior.

Done, attached new patches.

Interestingly, WIN1256 encodes only the SMALL LETTER C WITH CEDILLA. I
think, for the purposes here, we can still consider it to "uppercase"
to \xc7, so that it can still be treated as the same sound. Technically
I think that would be an improvement over the current code in this edge
case, and suggests that case folding would be a better approach than
uppercasing.

On further reflection, it seems just as easy to have dmetaphone() take
the input collation and use that to do a proper collation-aware
upper-casing. This has the same effect (that is, it will still only
support certain single-byte encodings), but it avoids elaborately
hard-coding a bunch of things, and if we ever want to make this
multibyte-aware, then we'll have to go this way anyway, I think. See
attached patch.

Attachments:

0001-Make-dmetaphone-collation-aware.patch.nocfbot (text/plain; charset=UTF-8)

From cd7fa005a286c9c4f4a27e8c61ca15787ee80bbd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Tue, 6 Jan 2026 20:45:33 +0100
Subject: [PATCH] Make dmetaphone collation-aware

---
 contrib/fuzzystrmatch/dmetaphone.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 227d8b11ddc..062667527c2 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -99,6 +99,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include "postgres.h"
 
 #include "utils/builtins.h"
+#include "utils/formatting.h"
 
 /* turn off assertions for embedded function */
 #define NDEBUG
@@ -117,7 +118,7 @@ The remaining code is authored by Andrew Dunstan <amdunstan@ncshp.org> and
 #include <ctype.h>
 
 /* prototype for the main function we got from the perl module */
-static void DoubleMetaphone(char *str, char **codes);
+static void DoubleMetaphone(const char *str, Oid collid, char **codes);
 
 #ifndef DMETAPHONE_MAIN
 
@@ -142,7 +143,7 @@ dmetaphone(PG_FUNCTION_ARGS)
 	arg = PG_GETARG_TEXT_PP(0);
 	aptr = text_to_cstring(arg);
 
-	DoubleMetaphone(aptr, codes);
+	DoubleMetaphone(aptr, PG_GET_COLLATION(), codes);
 	code = codes[0];
 	if (!code)
 		code = "";
@@ -171,7 +172,7 @@ dmetaphone_alt(PG_FUNCTION_ARGS)
 	arg = PG_GETARG_TEXT_PP(0);
 	aptr = text_to_cstring(arg);
 
-	DoubleMetaphone(aptr, codes);
+	DoubleMetaphone(aptr, PG_GET_COLLATION(), codes);
 	code = codes[1];
 	if (!code)
 		code = "";
@@ -278,13 +279,17 @@ IncreaseBuffer(metastring *s, int chars_needed)
 }
 
 
-static void
-MakeUpper(metastring *s)
+static metastring *
+MakeUpper(metastring *s, Oid collid)
 {
-	char	   *i;
+	char	   *newstr;
+	metastring *newms;
+
+	newstr = str_toupper(s->str, s->length, collid);
+	newms = NewMetaString(newstr);
+	DestroyMetaString(s);
 
-	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+	return newms;
 }
 
 
@@ -392,7 +397,7 @@ MetaphAdd(metastring *s, const char *new_str)
 
 
 static void
-DoubleMetaphone(char *str, char **codes)
+DoubleMetaphone(const char *str, Oid collid, char **codes)
 {
 	int			length;
 	metastring *original;
@@ -414,7 +419,7 @@ DoubleMetaphone(char *str, char **codes)
 	primary->free_string_on_destroy = 0;
 	secondary->free_string_on_destroy = 0;
 
-	MakeUpper(original);
+	original = MakeUpper(original, collid);
 
 	/* skip these when at start of word */
 	if (StringAt(original, 0, 2, "GN", "KN", "PN", "WR", "PS", ""))
@@ -1430,7 +1435,7 @@ main(int argc, char **argv)
 
 	if (argc > 1)
 	{
-		DoubleMetaphone(argv[1], codes);
+		DoubleMetaphone(argv[1], DEFAULT_COLLATION_OID, codes);
 		printf("%s|%s\n", codes[0], codes[1]);
 	}
 }
-- 
2.52.0

#89

Jeff Davis

pgsql@j-davis.com

6 days ago

In reply to: Peter Eisentraut (#88)

Re: Remaining dependency on setlocale()

On Tue, 2026-01-06 at 20:54 +0100, Peter Eisentraut wrote:

On further reflection, it seems just as easy to have dmetaphone() take
the input collation and use that to do a proper collation-aware
upper-casing. This has the same effect (that is, it will still only
support certain single-byte encodings), but it avoids elaborately
hard-coding a bunch of things, and if we ever want to make this
multibyte-aware, then we'll have to go this way anyway, I think. See
attached patch.

Looks good to me.

After you commit that, we still need the changes in fuzzystrmatch.c,
right?

Regards,
Jeff Davis

#90

Peter Eisentraut

peter@eisentraut.org

about 8 hours ago

In reply to: Jeff Davis (#89)

Re: Remaining dependency on setlocale()

On 06.01.26 23:20, Jeff Davis wrote:

On Tue, 2026-01-06 at 20:54 +0100, Peter Eisentraut wrote:

On further reflection, it seems just as easy to have dmetaphone() take
the input collation and use that to do a proper collation-aware
upper-casing. This has the same effect (that is, it will still only
support certain single-byte encodings), but it avoids elaborately
hard-coding a bunch of things, and if we ever want to make this
multibyte-aware, then we'll have to go this way anyway, I think. See
attached patch.

Looks good to me.

committed

After you commit that, we still need the changes in fuzzystrmatch.c,
right?

right