check_strxfrm_bug()
Hi
While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support. If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking. In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it). So I think it's time to remove that function; please see
attached.
Just by the way, if you like slow motion domino runs, check this out:
* Original pgsql-bugs investigation into strxfrm() inconsistencies
/messages/by-id/111D0E27-A8F3-4A84-A4E0-B0FB703863DF@s24.com
* I happened to test that on bleeding-edge FreeBSD 11 (wasn't released
yet), because at that time FreeBSD was in the process of adopting
illumos's new collation code, and reported teething problems:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208266
* FreeBSD, DragonFly and illumos's trees were then partially fixed by
the authors, but our strcolltest.c still showed some remaining
problems in some locales (and it still does on my local FreeBSD
battlestation):
https://github.com/freebsd/freebsd-src/commit/c48dc2a193b9befceda8dfc6f894d73251cc00a4
https://www.illumos.org/rb/r/402/
* The authors traced the remaining problem to flaws in the Unicode
project's CLDR/POSIX data, and the report was accepted:
https://www.illumos.org/issues/7962
https://unicode-org.atlassian.net/browse/CLDR-10394
Eventually that'll be fixed, and (I guess) trigger at least a CLDR
minor version bump affecting all downstream consumers (ICU, ...).
Then... maybe... at least FreeBSD will finally pass that test. I do
wonder whether other consumer libraries are also confused by that
problem source data, and if not, why not; are glibc's problems related
or just random code or data quality problems in different areas? (I
also don't know why a problem in that data should affect strxfrm() and
strcoll() differently, but I don't plan to find out owing to an acute
shortage of round tuits).
But in the meantime, I still can't recommend turning on TRUST_STRXFRM
on any OS that I know of! The strcolltest.c program certainly still
finds fault with glibc 2.36 despite the last update on that redhat
bugzilla ticket that suggested that the big resync back in 2.28 was
going to fix it.
To be fair, macOS does actually pass that test for all locales, but
the strxfrm() result is too narrow to be useful, according to comments
in our tree. I would guess that a couple of other OSes with the old
Berkeley locale code are similar.
Attachments:
0001-Remove-obsolete-defense-against-strxfrm-bugs.patchtext/x-patch; charset=US-ASCII; name=0001-Remove-obsolete-defense-against-strxfrm-bugs.patchDownload
From 8a70596a7de680e02fb2a6db59e622a653f097c2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 15 Dec 2022 13:38:57 +1300
Subject: [PATCH] Remove obsolete defense against strxfrm() bugs.
Old versions of Solaris and illumos had buffer overrun bugs in their
strxfrm() implementations. The bugs were fixed a decade ago and the
relevant releases are long out of vendor support. It's time to remove
the defense added by commit be8b06c3.
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 37989ba567..bf6992d9fe 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -137,8 +137,6 @@ main(int argc, char *argv[])
*/
unsetenv("LC_ALL");
- check_strxfrm_bug();
-
/*
* Catch standard options before doing much else, in particular before we
* insist on not being root.
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..aa60008ad7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1134,64 +1134,6 @@ IsoLocaleName(const char *winlocname)
#endif /* WIN32 && LC_MESSAGES */
-/*
- * Detect aging strxfrm() implementations that, in a subset of locales, write
- * past the specified buffer length. Affected users must update OS packages
- * before using PostgreSQL 9.5 or later.
- *
- * Assume that the bug can come and go from one postmaster startup to another
- * due to physical replication among diverse machines. Assume that the bug's
- * presence will not change during the life of a particular postmaster. Given
- * those assumptions, call this no less than once per postmaster startup per
- * LC_COLLATE setting used. No known-affected system offers strxfrm_l(), so
- * there is no need to consider pg_collation locales.
- */
-void
-check_strxfrm_bug(void)
-{
- char buf[32];
- const int canary = 0x7F;
- bool ok = true;
-
- /*
- * Given a two-byte ASCII string and length limit 7, 8 or 9, Solaris 10
- * 05/08 returns 18 and modifies 10 bytes. It respects limits above or
- * below that range.
- *
- * The bug is present in Solaris 8 as well; it is absent in Solaris 10
- * 01/13 and Solaris 11.2. Affected locales include is_IS.ISO8859-1,
- * en_US.UTF-8, en_US.ISO8859-1, and ru_RU.KOI8-R. Unaffected locales
- * include de_DE.UTF-8, de_DE.ISO8859-1, zh_TW.UTF-8, and C.
- */
- buf[7] = canary;
- (void) strxfrm(buf, "ab", 7);
- if (buf[7] != canary)
- ok = false;
-
- /*
- * illumos bug #1594 was present in the source tree from 2010-10-11 to
- * 2012-02-01. Given an ASCII string of any length and length limit 1,
- * affected systems ignore the length limit and modify a number of bytes
- * one less than the return value. The problem inputs for this bug do not
- * overlap those for the Solaris bug, hence a distinct test.
- *
- * Affected systems include smartos-20110926T021612Z. Affected locales
- * include en_US.ISO8859-1 and en_US.UTF-8. Unaffected locales include C.
- */
- buf[1] = canary;
- (void) strxfrm(buf, "a", 1);
- if (buf[1] != canary)
- ok = false;
-
- if (!ok)
- ereport(ERROR,
- (errcode(ERRCODE_SYSTEM_ERROR),
- errmsg_internal("strxfrm(), in locale \"%s\", writes past the specified array length",
- setlocale(LC_COLLATE, NULL)),
- errhint("Apply system library package updates.")));
-}
-
-
/*
* Cache mechanism for collation information.
*
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index a990c833c5..d07adb0a20 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -475,8 +475,6 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
- check_strxfrm_bug();
-
ReleaseSysCache(tup);
}
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a875942123..60ccd7e34d 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -50,7 +50,6 @@ extern PGDLLIMPORT char *localized_full_months[];
extern bool check_locale(int category, const char *locale, char **canonname);
extern char *pg_perm_setlocale(int category, const char *locale);
-extern void check_strxfrm_bug(void);
extern bool lc_collate_is_c(Oid collation);
extern bool lc_ctype_is_c(Oid collation);
--
2.38.1
On Thu, Dec 15, 2022 at 3:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:
... If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking.
With my garbage collector hat on, that made me wonder if there was
some more potential cleanup here: could we require locale_t yet? The
last straggler systems on our target OS list to add the POSIX locale_t
stuff were Solaris 11.4 (2018) and OpenBSD 6.2 (2018). Apparently
it's still too soon: we have two EOL'd OSes in the farm that are older
than that. But here's an interesting fact about wrasse, assuming its
host is gcc211: it looks like it can't even apply further OS updates
because the hardware[1]https://cfarm.tetaneutral.net/machines/list/ is so old that Solaris doesn't support it
anymore[2]https://support.oracle.com/knowledge/Sun%20Microsystems/2382427_1.html.
[1]: https://cfarm.tetaneutral.net/machines/list/
[2]: https://support.oracle.com/knowledge/Sun%20Microsystems/2382427_1.html
On Sun, Dec 18, 2022 at 10:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
With my garbage collector hat on, that made me wonder if there was
some more potential cleanup here: could we require locale_t yet? The
last straggler systems on our target OS list to add the POSIX locale_t
stuff were Solaris 11.4 (2018) and OpenBSD 6.2 (2018). Apparently
it's still too soon: we have two EOL'd OSes in the farm that are older
than that. But here's an interesting fact about wrasse, assuming its
host is gcc211: it looks like it can't even apply further OS updates
because the hardware[1] is so old that Solaris doesn't support it
anymore[2].
For the record, now the OpenBSD machines have been upgraded, so now
"wrasse" is the last relevant computer on earth with no POSIX
locale_t. Unfortunately there is no reason to think it's going to go
away soon, so I'm just noting this fact here as a reminder for when it
eventually does...
On Thu, Dec 15, 2022 at 03:22:59PM +1300, Thomas Munro wrote:
While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support. If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking. In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it). So I think it's time to remove that function; please see
attached.
Seems reasonable to me.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
On Thu, Dec 15, 2022 at 03:22:59PM +1300, Thomas Munro wrote:
While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support. If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking. In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it). So I think it's time to remove that function; please see
attached.
Seems reasonable to me.
+1. I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc). Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.
regards, tom lane
On Mon, Apr 17, 2023 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1. I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc). Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.
+1 for getting rid of TRUST_STRXFRM.
ICU-based collations (which aren't affected by TRUST_STRXFRM) are
becoming the de facto standard (possibly even the de jure standard).
So even if we thought that the situation with strxfrm() had improved,
we'd still have little motivation to do anything about it.
--
Peter Geoghegan
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
On Mon, Apr 17, 2023 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1. I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc). Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.+1 for getting rid of TRUST_STRXFRM.
ICU-based collations (which aren't affected by TRUST_STRXFRM) are
becoming the de facto standard (possibly even the de jure standard).
So even if we thought that the situation with strxfrm() had improved,
we'd still have little motivation to do anything about it.
Makes sense to do some cleanup now as this is new in the tree.
Perhaps somebody from the RMT would like to comment?
FYI, Jeff has also posted patches to replace this CFLAGS with a GUC:
/messages/by-id/6ec4ad7f93f255dbb885da0a664d9c77ed4872c4.camel@j-davis.com
--
Michael
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
+1 for getting rid of TRUST_STRXFRM.
+1
The situation is not improving fast, and requires hard work to follow
on each OS. Clearly, mainstreaming ICU is the way to go. libc
support will always have niche uses, to be compatible with other
software on the box, but trusting strxfrm doesn't seem to be on the
cards any time soon.
On 4/18/23 9:19 PM, Thomas Munro wrote:
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
+1 for getting rid of TRUST_STRXFRM.
+1
The situation is not improving fast, and requires hard work to follow
on each OS. Clearly, mainstreaming ICU is the way to go. libc
support will always have niche uses, to be compatible with other
software on the box, but trusting strxfrm doesn't seem to be on the
cards any time soon.
[RMT hat, personal opinion on RMT]
To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?
Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.
Thanks,
Jonathan
On 4/18/23 21:19, Thomas Munro wrote:
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
+1 for getting rid of TRUST_STRXFRM.
+1
The situation is not improving fast, and requires hard work to follow
on each OS. Clearly, mainstreaming ICU is the way to go. libc
support will always have niche uses, to be compatible with other
software on the box, but trusting strxfrm doesn't seem to be on the
cards any time soon.
I have wondered a few times, given the known issues with strxfrm, how is
the use in selfuncs.c still ok. Has anyone taken a hard look at that?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Joe Conway <mail@joeconway.com> writes:
I have wondered a few times, given the known issues with strxfrm, how is
the use in selfuncs.c still ok. Has anyone taken a hard look at that?
On the one hand, we only need approximately-correct results in that
code. On the other, the result is fed to convert_string_to_scalar(),
which has a rather naive idea that it's dealing with ASCII text.
I've seen at least some strxfrm output that isn't even vaguely
textual-looking.
regards, tom lane
On Wed, Apr 19, 2023 at 2:31 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.
Thanks all. I went ahead and removed check_strxfrm_bug().
I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?
On Thu, 2023-04-20 at 13:34 +1200, Thomas Munro wrote:
I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?
+1 to removing it.
As far as how it's removed, we could directly check:
if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU))
abbreviate = false;
as it was before, or we could still try to hide it as a detail behind a
function. I don't have a strong opinion there, though I thought it
might be good for varlena.c to not know those internal details.
Regards,
Jeff Davis
On 4/19/23 9:34 PM, Thomas Munro wrote:
On Wed, Apr 19, 2023 at 2:31 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.Thanks all. I went ahead and removed check_strxfrm_bug().
Thanks! For housekeeping, I put this into "Open Items" and marked it as
resolved.
I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?
I believe we'd be qualifying this as an open item too? If so, let's add it.
Thanks,
Jonathan
On Mon, Apr 17, 2023 at 8:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sun, Dec 18, 2022 at 10:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
With my garbage collector hat on, that made me wonder if there was
some more potential cleanup here: could we require locale_t yet? The
last straggler systems on our target OS list to add the POSIX locale_t
stuff were Solaris 11.4 (2018) and OpenBSD 6.2 (2018). Apparently
it's still too soon: we have two EOL'd OSes in the farm that are older
than that. But here's an interesting fact about wrasse, assuming its
host is gcc211: it looks like it can't even apply further OS updates
because the hardware[1] is so old that Solaris doesn't support it
anymore[2].For the record, now the OpenBSD machines have been upgraded, so now
"wrasse" is the last relevant computer on earth with no POSIX
locale_t. Unfortunately there is no reason to think it's going to go
away soon, so I'm just noting this fact here as a reminder for when it
eventually does...
Since talk of server threads erupted again, I just wanted to note that
this system locale API stuff would be on the long list of
micro-obstacles. You'd *have* to use the locale_t-based interfaces
(or come up with replacements using a big ugly lock to serialise all
access to the process-global locale, or allow only ICU locale support
in that build, but those seem like strange lengths to go to just to
support a dead version of Solaris). There are still at least a couple
of functions that lack XXX_l variants in the standard: mbstowcs() and
wcstombs() (though we use the non-standard _l variants if we find them
in <xlocale.h>), but that's OK because we use uselocale() and not
setlocale(), because uselocale() is required to be thread-local. The
use of setlocale() to set up the per-backend/per-database default
locale would have to be replaced with uselocale(). In other words, I
think wrasse would not be coming with us on this hypothetical quest.
On 12/06/2023 01:15, Thomas Munro wrote:
There are still at least a couple
of functions that lack XXX_l variants in the standard: mbstowcs() and
wcstombs() (though we use the non-standard _l variants if we find them
in <xlocale.h>), but that's OK because we use uselocale() and not
setlocale(), because uselocale() is required to be thread-local.
Right, mbstowcs() and wcstombs() are already thread-safe, that's why
there are no _l variants of them.
The use of setlocale() to set up the per-backend/per-database
default locale would have to be replaced with uselocale().
This recent bug report is also related to that:
/messages/by-id/17946-3e84cb577e9551c3@postgresql.org.
In a nutshell, libperl calls uselocale(), which overrides the setting we
try set with setlocale().
--
Heikki Linnakangas
Neon (https://neon.tech)
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.
It's slightly annoying that Windows has locale_t but doesn't have
uselocale(). It does have thread-local locales via another API,
though. I wonder how hard it would be to get to a point where all
systems have uselocale() too, by supplying a replacement. I noticed
that some other projects eg older versions of LLVM libcxx do that. I
see from one of their discussions[1]https://reviews.llvm.org/D40181 that it worked, except that
thread-local locales are only available with one of the MinGW C
runtimes and not another. We'd have to get to the bottom of that.
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.
That would look something like this.
Attachments:
0001-All-supported-computers-have-locale_t.patchtext/x-patch; charset=US-ASCII; name=0001-All-supported-computers-have-locale_t.patchDownload
From 87268870aa9b6f22eb0bf831614bd38c529b0c8c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 28 Jun 2023 10:23:32 +1200
Subject: [PATCH] All supported computers have locale_t.
locale_t is defined by POSIX:1-2008 and SUSv4, and available on all
targeted systems.
Definitions for HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L had to be moved into
win32_port.h, because this change revealed that MinGW builds on Windows
were failing to detect these functions, but without them we couldn't
build due to the assumption that uselocale() exists.
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
diff --git a/config/c-library.m4 b/config/c-library.m4
index c1dd804679..aa8223d2ef 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -86,9 +86,9 @@ AC_DEFUN([PGAC_STRUCT_SOCKADDR_SA_LEN],
# PGAC_TYPE_LOCALE_T
# ------------------
# Check for the locale_t type and find the right header file. macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use.
-#
+# needs xlocale.h; standard is locale.h, but glibc <= 2.25 also had an
+# xlocale.h file that we should not use, so we check the standard
+# header first.
AC_DEFUN([PGAC_TYPE_LOCALE_T],
[AC_CACHE_CHECK([for locale_t], pgac_cv_type_locale_t,
[AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
@@ -102,10 +102,6 @@ locale_t x;],
[])],
[pgac_cv_type_locale_t='yes (in xlocale.h)'],
[pgac_cv_type_locale_t=no])])])
-if test "$pgac_cv_type_locale_t" != no; then
- AC_DEFINE(HAVE_LOCALE_T, 1,
- [Define to 1 if the system has the type `locale_t'.])
-fi
if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
AC_DEFINE(LOCALE_T_IN_XLOCALE, 1,
[Define to 1 if `locale_t' requires <xlocale.h>.])
diff --git a/configure b/configure
index a1c3cb0036..b8bc03e2c7 100755
--- a/configure
+++ b/configure
@@ -15122,11 +15122,6 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_type_locale_t" >&5
$as_echo "$pgac_cv_type_locale_t" >&6; }
-if test "$pgac_cv_type_locale_t" != no; then
-
-$as_echo "#define HAVE_LOCALE_T 1" >>confdefs.h
-
-fi
if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
$as_echo "#define LOCALE_T_IN_XLOCALE 1" >>confdefs.h
diff --git a/meson.build b/meson.build
index 7bc7310c44..7cc4cbc034 100644
--- a/meson.build
+++ b/meson.build
@@ -2283,17 +2283,12 @@ else
cdata.set('STRERROR_R_INT', false)
endif
-# Check for the locale_t type and find the right header file. macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
+# Find the right header file for the locale_t type. macOS
+# needs xlocale.h; standard is locale.h, but older glibc also had an
# xlocale.h file that we should not use. MSVC has a replacement
# defined in src/include/port/win32_port.h.
-if cc.has_type('locale_t', prefix: '#include <locale.h>')
- cdata.set('HAVE_LOCALE_T', 1)
-elif cc.has_type('locale_t', prefix: '#include <xlocale.h>')
- cdata.set('HAVE_LOCALE_T', 1)
+if not cc.has_type('locale_t', prefix: '#include <locale.h>') and cc.has_type('locale_t', prefix: '#include <xlocale.h>')
cdata.set('LOCALE_T_IN_XLOCALE', 1)
-elif cc.get_id() == 'msvc'
- cdata.set('HAVE_LOCALE_T', 1)
endif
# Check if the C compiler understands typeof or a variant. Define
@@ -2489,13 +2484,6 @@ if cc.has_function('syslog', args: test_c_args) and \
endif
-# MSVC has replacements defined in src/include/port/win32_port.h.
-if cc.get_id() == 'msvc'
- cdata.set('HAVE_WCSTOMBS_L', 1)
- cdata.set('HAVE_MBSTOWCS_L', 1)
-endif
-
-
# if prerequisites for unnamed posix semas aren't fulfilled, fall back to sysv
# semaphores
if sema_kind == 'unnamed_posix' and \
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index efb8b4d289..cc239b4d14 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -534,7 +534,7 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
/* will we use "locale -a" in pg_import_system_collations? */
-#if defined(HAVE_LOCALE_T) && !defined(WIN32)
+#if !defined(WIN32)
#define READ_LOCALE_A_OUTPUT
#endif
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 00ce735fdd..31e6300b5d 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -44,8 +44,7 @@
* the platform's wchar_t representation matches what we do in pg_wchar
* conversions.
*
- * 3. Other collations are only supported on platforms that HAVE_LOCALE_T.
- * Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
+ * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
* functions, under exactly the same cases as #2.
*
* There is one notable difference between cases 2 and 3: in the "default"
@@ -252,11 +251,6 @@ pg_set_regex_collation(Oid collation)
}
else
{
- /*
- * NB: pg_newlocale_from_collation will fail if not HAVE_LOCALE_T; the
- * case of pg_regex_locale != 0 but not HAVE_LOCALE_T does not have to
- * be considered below.
- */
pg_regex_locale = pg_newlocale_from_collation(collation);
if (!pg_locale_deterministic(pg_regex_locale))
@@ -304,16 +298,12 @@ pg_wc_isdigit(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isdigit((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -340,16 +330,12 @@ pg_wc_isalpha(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isalpha((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -376,16 +362,12 @@ pg_wc_isalnum(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isalnum((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -421,16 +403,12 @@ pg_wc_isupper(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isupper((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -457,16 +435,12 @@ pg_wc_islower(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
islower((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
islower_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -493,16 +467,12 @@ pg_wc_isgraph(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isgraph((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -529,16 +499,12 @@ pg_wc_isprint(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isprint((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -565,16 +531,12 @@ pg_wc_ispunct(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
ispunct((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -601,16 +563,12 @@ pg_wc_isspace(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isspace((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -645,16 +603,12 @@ pg_wc_toupper(pg_wchar c)
return toupper((unsigned char) c);
return c;
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
if (c <= (pg_wchar) UCHAR_MAX)
return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-#endif
return c;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -689,16 +643,12 @@ pg_wc_tolower(pg_wchar c)
return tolower((unsigned char) c);
return c;
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
if (c <= (pg_wchar) UCHAR_MAX)
return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-#endif
return c;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index e6246dc44b..e27ea8ef97 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1613,12 +1613,6 @@ u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
* in multibyte character sets. Note that in either case we are effectively
* assuming that the database character encoding matches the encoding implied
* by LC_CTYPE.
- *
- * If the system provides locale_t and associated functions (which are
- * standardized by Open Group's XBD), we can support collations that are
- * neither default nor C. The code is written to handle both combinations
- * of have-wide-characters and have-locale_t, though it's rather unlikely
- * a platform would have the latter without the former.
*/
/*
@@ -1696,11 +1690,9 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
else
-#endif
workspace[curr_char] = towlower(workspace[curr_char]);
}
@@ -1729,11 +1721,9 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
*p = tolower_l((unsigned char) *p, mylocale->info.lt);
else
-#endif
*p = pg_tolower((unsigned char) *p);
}
}
@@ -1818,11 +1808,9 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
else
-#endif
workspace[curr_char] = towupper(workspace[curr_char]);
}
@@ -1851,11 +1839,9 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
*p = toupper_l((unsigned char) *p, mylocale->info.lt);
else
-#endif
*p = pg_toupper((unsigned char) *p);
}
}
@@ -1941,7 +1927,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
{
if (wasalnum)
@@ -1951,7 +1936,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
}
else
-#endif
{
if (wasalnum)
workspace[curr_char] = towlower(workspace[curr_char]);
@@ -1986,7 +1970,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
{
if (wasalnum)
@@ -1996,7 +1979,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
}
else
-#endif
{
if (wasalnum)
*p = pg_tolower((unsigned char) *p);
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 33a2f46aab..62902caefa 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -95,10 +95,8 @@ SB_lower_char(unsigned char c, pg_locale_t locale, bool locale_is_c)
{
if (locale_is_c)
return pg_ascii_tolower(c);
-#ifdef HAVE_LOCALE_T
else if (locale)
return tolower_l(c, locale->info.lt);
-#endif
else
return pg_tolower(c);
}
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 9b603d42f3..34e1b709ae 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1509,10 +1509,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
else if (locale && locale->provider == COLLPROVIDER_ICU)
return IS_HIGHBIT_SET(c) ||
(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-#ifdef HAVE_LOCALE_T
else if (locale && locale->provider == COLLPROVIDER_LIBC)
return isalpha_l((unsigned char) c, locale->info.lt);
-#endif
else
return isalpha((unsigned char) c);
}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c8b36f3af2..83c5b960a7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1420,7 +1420,6 @@ make_icu_collator(const char *iculocstr,
/* simple subroutine for reporting errors from newlocale() */
-#ifdef HAVE_LOCALE_T
static void
report_newlocale_failure(const char *localename)
{
@@ -1449,7 +1448,6 @@ report_newlocale_failure(const char *localename)
errdetail("The operating system could not find any locale data for the locale name \"%s\".",
localename) : 0)));
}
-#endif /* HAVE_LOCALE_T */
bool
pg_locale_deterministic(pg_locale_t locale)
@@ -1466,10 +1464,6 @@ pg_locale_deterministic(pg_locale_t locale)
* lifetime of the backend. Thus, do not free the result with freelocale().
*
* As a special optimization, the default/database collation returns 0.
- * Callers should then revert to the non-locale_t-enabled code path.
- * Also, callers should avoid calling this before going down a C/POSIX
- * fastpath, because such a fastpath should work even on platforms without
- * locale_t support in the C library.
*
* For simplicity, we always generate COLLATE + CTYPE even though we
* might only need one of them. Since this is called only once per session,
@@ -1515,7 +1509,6 @@ pg_newlocale_from_collation(Oid collid)
if (collform->collprovider == COLLPROVIDER_LIBC)
{
-#ifdef HAVE_LOCALE_T
const char *collcollate;
const char *collctype pg_attribute_unused();
locale_t loc;
@@ -1566,12 +1559,6 @@ pg_newlocale_from_collation(Oid collid)
}
result.info.lt = loc;
-#else /* not HAVE_LOCALE_T */
- /* platform that doesn't support locale_t */
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("collation provider LIBC is not supported on this platform")));
-#endif /* not HAVE_LOCALE_T */
}
else if (collform->collprovider == COLLPROVIDER_ICU)
{
@@ -1788,11 +1775,9 @@ pg_strncoll_libc_win32_utf8(const char *arg1, size_t len1, const char *arg2,
((LPWSTR) a2p)[r] = 0;
errno = 0;
-#ifdef HAVE_LOCALE_T
if (locale)
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
else
-#endif
result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
if (result == 2147483647) /* _NLSCMPERROR; missing from mingw headers */
ereport(ERROR,
@@ -1831,14 +1816,7 @@ pg_strcoll_libc(const char *arg1, const char *arg2, pg_locale_t locale)
else
#endif /* WIN32 */
if (locale)
- {
-#ifdef HAVE_LOCALE_T
result = strcoll_l(arg1, arg2, locale->info.lt);
-#else
- /* shouldn't happen */
- elog(ERROR, "unsupported collprovider: %c", locale->provider);
-#endif
- }
else
result = strcoll(arg1, arg2);
@@ -2065,11 +2043,9 @@ pg_strxfrm_libc(char *dest, const char *src, size_t destsize,
Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
#ifdef TRUST_STRXFRM
-#ifdef HAVE_LOCALE_T
if (locale)
return strxfrm_l(dest, src, destsize, locale->info.lt);
else
-#endif
return strxfrm(dest, src, destsize);
#else
/* shouldn't happen */
@@ -2955,7 +2931,6 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
}
else
{
-#ifdef HAVE_LOCALE_T
#ifdef HAVE_WCSTOMBS_L
/* Use wcstombs_l for nondefault locales */
result = wcstombs_l(to, from, tolen, locale->info.lt);
@@ -2967,11 +2942,6 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
uselocale(save_locale);
#endif /* HAVE_WCSTOMBS_L */
-#else /* !HAVE_LOCALE_T */
- /* Can't have locale != 0 without HAVE_LOCALE_T */
- elog(ERROR, "wcstombs_l is not available");
- result = 0; /* keep compiler quiet */
-#endif /* HAVE_LOCALE_T */
}
return result;
@@ -3032,7 +3002,6 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
}
else
{
-#ifdef HAVE_LOCALE_T
#ifdef HAVE_MBSTOWCS_L
/* Use mbstowcs_l for nondefault locales */
result = mbstowcs_l(to, str, tolen, locale->info.lt);
@@ -3044,11 +3013,6 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
uselocale(save_locale);
#endif /* HAVE_MBSTOWCS_L */
-#else /* !HAVE_LOCALE_T */
- /* Can't have locale != 0 without HAVE_LOCALE_T */
- elog(ERROR, "mbstowcs_l is not available");
- result = 0; /* keep compiler quiet */
-#endif /* HAVE_LOCALE_T */
}
pfree(str);
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 6d572c3820..c33fc4b8cd 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -298,9 +298,6 @@
/* Define to 1 if you have the `zstd' library (-lzstd). */
#undef HAVE_LIBZSTD
-/* Define to 1 if the system has the type `locale_t'. */
-#undef HAVE_LOCALE_T
-
/* Define to 1 if `long int' works and is 64 bits. */
#undef HAVE_LONG_INT_64
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index b957d5c598..dee85fa3b2 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -458,6 +458,9 @@ extern int _pglstat64(const char *name, struct stat *buf);
#define wcstombs_l _wcstombs_l
#define mbstowcs_l _mbstowcs_l
+#define HAVE_WCSTOMBS_L 1
+#define HAVE_MBSTOWCS_L 1
+
/*
* Versions of libintl >= 0.18? try to replace setlocale() with a macro
* to their own versions. Remove the macro, if it exists, because it
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243542..6447bea8e0 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -67,9 +67,7 @@ extern void cache_locale_time(void);
/*
- * We define our own wrapper around locale_t so we can keep the same
- * function signatures for all builds, while not having to create a
- * fake version of the standard type locale_t in the global namespace.
+ * We use a discriminated union to hold either a locale_t or an ICU collator.
* pg_locale_t is occasionally checked for truth, so make it a pointer.
*/
struct pg_locale_struct
@@ -78,9 +76,7 @@ struct pg_locale_struct
bool deterministic;
union
{
-#ifdef HAVE_LOCALE_T
locale_t lt;
-#endif
#ifdef USE_ICU
struct
{
@@ -88,7 +84,6 @@ struct pg_locale_struct
UCollator *ucol;
} icu;
#endif
- int dummy; /* in case we have neither LOCALE_T nor ICU */
} info;
};
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index b6d31c3583..bf708f15b3 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -296,7 +296,6 @@ sub GenerateFiles
HAVE_LIBXSLT => undef,
HAVE_LIBZ => $self->{options}->{zlib} ? 1 : undef,
HAVE_LIBZSTD => undef,
- HAVE_LOCALE_T => 1,
HAVE_LONG_INT_64 => undef,
HAVE_LONG_LONG_INT_64 => 1,
HAVE_MBARRIER_H => undef,
--
2.39.2
On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.That would look something like this.
This removes thirty-eight ifdefs, most of them located in the middle of
function bodies. That's far more beneficial than most proposals to raise
minimum requirements. +1 for revoking support for wrasse's OS version.
(wrasse wouldn't move, but it would stop testing v17+.)
On Sun, Jul 2, 2023 at 4:25 AM Noah Misch <noah@leadboat.com> wrote:
On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.That would look something like this.
This removes thirty-eight ifdefs, most of them located in the middle of
function bodies. That's far more beneficial than most proposals to raise
minimum requirements. +1 for revoking support for wrasse's OS version.
(wrasse wouldn't move, but it would stop testing v17+.)
Great. It sounds like I should wait a few days for any other feedback
and then push this patch. Thanks for looking.
On Sun Jul 2, 2023 at 8:49 PM CDT, Thomas Munro wrote:
On Sun, Jul 2, 2023 at 4:25 AM Noah Misch <noah@leadboat.com> wrote:
On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.That would look something like this.
This removes thirty-eight ifdefs, most of them located in the middle of
function bodies. That's far more beneficial than most proposals to raise
minimum requirements. +1 for revoking support for wrasse's OS version.
(wrasse wouldn't move, but it would stop testing v17+.)Great. It sounds like I should wait a few days for any other feedback
and then push this patch. Thanks for looking.
The patch looks good to me as well. Happy to rebase my other patch on
this one.
--
Tristan Partin
Neon (https://neon.tech)
On Tue, Jul 4, 2023 at 2:52 AM Tristan Partin <tristan@neon.tech> wrote:
The patch looks good to me as well. Happy to rebase my other patch on
this one.
Thanks. Here is a slightly tidier version. It passes on CI[1]https://cirrus-ci.com/build/5298278007308288
including the optional extra MinGW64/Meson task, and the
MinGW64/autoconf configure+build that is in the SanityCheck task.
There are two questions I'm hoping to get feedback on: (1) I believe
that defining HAVE_MBSTOWCS_L etc in win32_port.h is the best idea
because that is also where we define mbstowcs_l() etc. Does that make
sense? (2) IIRC, ye olde Solution.pm system might break if I were to
completely remove HAVE_MBSTOWCS_L and HAVE_WCSTOMBS_L from Solution.pm
(there must be a check somewhere that compares it with pg_config.h.in
or something like that), but it would also break if I defined them as
1 there (macro redefinition). Will undef in Solution.pm be
acceptable (ie define nothing to avoid redefinition, but side-step the
sanity check)? It's a bit of a kludge, but IIRC we're dropping that
3rd build system in 17 so maybe that's OK? (Not tested as I don't
have Windows and CI doesn't test Solution.pm, so I'd be grateful if
someone who has Windows/Solution.pm setup could try this.)
Attachments:
v2-0001-All-supported-systems-have-locale_t.patchtext/x-patch; charset=US-ASCII; name=v2-0001-All-supported-systems-have-locale_t.patchDownload
From ab0705a9e169627a77bf4e3cc36ea53603162921 Mon Sep 17 00:00:00 2001
From: Thomas Munro <tmunro@postgresql.org>
Date: Tue, 4 Jul 2023 16:32:35 +1200
Subject: [PATCH v2] All supported systems have locale_t.
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems. For Windows, win32_port.h redirects to a partial
implementation called _locale_t. It's time to remove a lot of
compile-time tests for HAVE_LOCALE_T, and a few associated comments and
dead code branches.
Definitions for HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L had to be moved into
win32_port.h, because this change revealed that MinGW builds on Windows
were failing to detect these functions, but without them we couldn't
build due to the assumption that uselocale() exists.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
---
config/c-library.m4 | 10 ++----
configure | 5 ---
meson.build | 22 +++---------
src/backend/commands/collationcmds.c | 2 +-
src/backend/regex/regc_pg_locale.c | 52 +---------------------------
src/backend/utils/adt/formatting.c | 18 ----------
src/backend/utils/adt/like.c | 2 --
src/backend/utils/adt/like_support.c | 2 --
src/backend/utils/adt/pg_locale.c | 36 -------------------
src/include/pg_config.h.in | 3 --
src/include/port/win32_port.h | 3 ++
src/include/utils/pg_locale.h | 7 +---
src/tools/msvc/Solution.pm | 5 ++-
13 files changed, 16 insertions(+), 151 deletions(-)
diff --git a/config/c-library.m4 b/config/c-library.m4
index c1dd804679..aa8223d2ef 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -86,9 +86,9 @@ AC_DEFUN([PGAC_STRUCT_SOCKADDR_SA_LEN],
# PGAC_TYPE_LOCALE_T
# ------------------
# Check for the locale_t type and find the right header file. macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use.
-#
+# needs xlocale.h; standard is locale.h, but glibc <= 2.25 also had an
+# xlocale.h file that we should not use, so we check the standard
+# header first.
AC_DEFUN([PGAC_TYPE_LOCALE_T],
[AC_CACHE_CHECK([for locale_t], pgac_cv_type_locale_t,
[AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
@@ -102,10 +102,6 @@ locale_t x;],
[])],
[pgac_cv_type_locale_t='yes (in xlocale.h)'],
[pgac_cv_type_locale_t=no])])])
-if test "$pgac_cv_type_locale_t" != no; then
- AC_DEFINE(HAVE_LOCALE_T, 1,
- [Define to 1 if the system has the type `locale_t'.])
-fi
if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
AC_DEFINE(LOCALE_T_IN_XLOCALE, 1,
[Define to 1 if `locale_t' requires <xlocale.h>.])
diff --git a/configure b/configure
index 997d42d8f7..f1d4630007 100755
--- a/configure
+++ b/configure
@@ -15120,11 +15120,6 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_type_locale_t" >&5
$as_echo "$pgac_cv_type_locale_t" >&6; }
-if test "$pgac_cv_type_locale_t" != no; then
-
-$as_echo "#define HAVE_LOCALE_T 1" >>confdefs.h
-
-fi
if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
$as_echo "#define LOCALE_T_IN_XLOCALE 1" >>confdefs.h
diff --git a/meson.build b/meson.build
index 3ea4b0d72a..82a966fcc2 100644
--- a/meson.build
+++ b/meson.build
@@ -2283,17 +2283,12 @@ else
cdata.set('STRERROR_R_INT', false)
endif
-# Check for the locale_t type and find the right header file. macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use. MSVC has a replacement
-# defined in src/include/port/win32_port.h.
-if cc.has_type('locale_t', prefix: '#include <locale.h>')
- cdata.set('HAVE_LOCALE_T', 1)
-elif cc.has_type('locale_t', prefix: '#include <xlocale.h>')
- cdata.set('HAVE_LOCALE_T', 1)
+# Find the right header file for the locale_t type. macOS needs xlocale.h;
+# standard is locale.h, but glibc <= 2.25 also had an xlocale.h file that
+# we should not use so we check the standard header first. MSVC has a
+# replacement defined in src/include/port/win32_port.h.
+if not cc.has_type('locale_t', prefix: '#include <locale.h>') and cc.has_type('locale_t', prefix: '#include <xlocale.h>')
cdata.set('LOCALE_T_IN_XLOCALE', 1)
-elif cc.get_id() == 'msvc'
- cdata.set('HAVE_LOCALE_T', 1)
endif
# Check if the C compiler understands typeof or a variant. Define
@@ -2489,13 +2484,6 @@ if cc.has_function('syslog', args: test_c_args) and \
endif
-# MSVC has replacements defined in src/include/port/win32_port.h.
-if cc.get_id() == 'msvc'
- cdata.set('HAVE_WCSTOMBS_L', 1)
- cdata.set('HAVE_MBSTOWCS_L', 1)
-endif
-
-
# if prerequisites for unnamed posix semas aren't fulfilled, fall back to sysv
# semaphores
if sema_kind == 'unnamed_posix' and \
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index efb8b4d289..cc239b4d14 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -534,7 +534,7 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
/* will we use "locale -a" in pg_import_system_collations? */
-#if defined(HAVE_LOCALE_T) && !defined(WIN32)
+#if !defined(WIN32)
#define READ_LOCALE_A_OUTPUT
#endif
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 00ce735fdd..31e6300b5d 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -44,8 +44,7 @@
* the platform's wchar_t representation matches what we do in pg_wchar
* conversions.
*
- * 3. Other collations are only supported on platforms that HAVE_LOCALE_T.
- * Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
+ * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
* functions, under exactly the same cases as #2.
*
* There is one notable difference between cases 2 and 3: in the "default"
@@ -252,11 +251,6 @@ pg_set_regex_collation(Oid collation)
}
else
{
- /*
- * NB: pg_newlocale_from_collation will fail if not HAVE_LOCALE_T; the
- * case of pg_regex_locale != 0 but not HAVE_LOCALE_T does not have to
- * be considered below.
- */
pg_regex_locale = pg_newlocale_from_collation(collation);
if (!pg_locale_deterministic(pg_regex_locale))
@@ -304,16 +298,12 @@ pg_wc_isdigit(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isdigit((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -340,16 +330,12 @@ pg_wc_isalpha(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isalpha((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -376,16 +362,12 @@ pg_wc_isalnum(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isalnum((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -421,16 +403,12 @@ pg_wc_isupper(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isupper((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -457,16 +435,12 @@ pg_wc_islower(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
islower((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
islower_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -493,16 +467,12 @@ pg_wc_isgraph(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isgraph((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -529,16 +499,12 @@ pg_wc_isprint(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isprint((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -565,16 +531,12 @@ pg_wc_ispunct(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
ispunct((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -601,16 +563,12 @@ pg_wc_isspace(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isspace((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -645,16 +603,12 @@ pg_wc_toupper(pg_wchar c)
return toupper((unsigned char) c);
return c;
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
if (c <= (pg_wchar) UCHAR_MAX)
return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-#endif
return c;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -689,16 +643,12 @@ pg_wc_tolower(pg_wchar c)
return tolower((unsigned char) c);
return c;
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
if (c <= (pg_wchar) UCHAR_MAX)
return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-#endif
return c;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index e6246dc44b..e27ea8ef97 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1613,12 +1613,6 @@ u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
* in multibyte character sets. Note that in either case we are effectively
* assuming that the database character encoding matches the encoding implied
* by LC_CTYPE.
- *
- * If the system provides locale_t and associated functions (which are
- * standardized by Open Group's XBD), we can support collations that are
- * neither default nor C. The code is written to handle both combinations
- * of have-wide-characters and have-locale_t, though it's rather unlikely
- * a platform would have the latter without the former.
*/
/*
@@ -1696,11 +1690,9 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
else
-#endif
workspace[curr_char] = towlower(workspace[curr_char]);
}
@@ -1729,11 +1721,9 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
*p = tolower_l((unsigned char) *p, mylocale->info.lt);
else
-#endif
*p = pg_tolower((unsigned char) *p);
}
}
@@ -1818,11 +1808,9 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
else
-#endif
workspace[curr_char] = towupper(workspace[curr_char]);
}
@@ -1851,11 +1839,9 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
*p = toupper_l((unsigned char) *p, mylocale->info.lt);
else
-#endif
*p = pg_toupper((unsigned char) *p);
}
}
@@ -1941,7 +1927,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
{
if (wasalnum)
@@ -1951,7 +1936,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
}
else
-#endif
{
if (wasalnum)
workspace[curr_char] = towlower(workspace[curr_char]);
@@ -1986,7 +1970,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
{
if (wasalnum)
@@ -1996,7 +1979,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
}
else
-#endif
{
if (wasalnum)
*p = pg_tolower((unsigned char) *p);
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 33a2f46aab..62902caefa 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -95,10 +95,8 @@ SB_lower_char(unsigned char c, pg_locale_t locale, bool locale_is_c)
{
if (locale_is_c)
return pg_ascii_tolower(c);
-#ifdef HAVE_LOCALE_T
else if (locale)
return tolower_l(c, locale->info.lt);
-#endif
else
return pg_tolower(c);
}
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 9b603d42f3..34e1b709ae 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1509,10 +1509,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
else if (locale && locale->provider == COLLPROVIDER_ICU)
return IS_HIGHBIT_SET(c) ||
(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-#ifdef HAVE_LOCALE_T
else if (locale && locale->provider == COLLPROVIDER_LIBC)
return isalpha_l((unsigned char) c, locale->info.lt);
-#endif
else
return isalpha((unsigned char) c);
}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c8b36f3af2..83c5b960a7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1420,7 +1420,6 @@ make_icu_collator(const char *iculocstr,
/* simple subroutine for reporting errors from newlocale() */
-#ifdef HAVE_LOCALE_T
static void
report_newlocale_failure(const char *localename)
{
@@ -1449,7 +1448,6 @@ report_newlocale_failure(const char *localename)
errdetail("The operating system could not find any locale data for the locale name \"%s\".",
localename) : 0)));
}
-#endif /* HAVE_LOCALE_T */
bool
pg_locale_deterministic(pg_locale_t locale)
@@ -1466,10 +1464,6 @@ pg_locale_deterministic(pg_locale_t locale)
* lifetime of the backend. Thus, do not free the result with freelocale().
*
* As a special optimization, the default/database collation returns 0.
- * Callers should then revert to the non-locale_t-enabled code path.
- * Also, callers should avoid calling this before going down a C/POSIX
- * fastpath, because such a fastpath should work even on platforms without
- * locale_t support in the C library.
*
* For simplicity, we always generate COLLATE + CTYPE even though we
* might only need one of them. Since this is called only once per session,
@@ -1515,7 +1509,6 @@ pg_newlocale_from_collation(Oid collid)
if (collform->collprovider == COLLPROVIDER_LIBC)
{
-#ifdef HAVE_LOCALE_T
const char *collcollate;
const char *collctype pg_attribute_unused();
locale_t loc;
@@ -1566,12 +1559,6 @@ pg_newlocale_from_collation(Oid collid)
}
result.info.lt = loc;
-#else /* not HAVE_LOCALE_T */
- /* platform that doesn't support locale_t */
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("collation provider LIBC is not supported on this platform")));
-#endif /* not HAVE_LOCALE_T */
}
else if (collform->collprovider == COLLPROVIDER_ICU)
{
@@ -1788,11 +1775,9 @@ pg_strncoll_libc_win32_utf8(const char *arg1, size_t len1, const char *arg2,
((LPWSTR) a2p)[r] = 0;
errno = 0;
-#ifdef HAVE_LOCALE_T
if (locale)
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
else
-#endif
result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
if (result == 2147483647) /* _NLSCMPERROR; missing from mingw headers */
ereport(ERROR,
@@ -1831,14 +1816,7 @@ pg_strcoll_libc(const char *arg1, const char *arg2, pg_locale_t locale)
else
#endif /* WIN32 */
if (locale)
- {
-#ifdef HAVE_LOCALE_T
result = strcoll_l(arg1, arg2, locale->info.lt);
-#else
- /* shouldn't happen */
- elog(ERROR, "unsupported collprovider: %c", locale->provider);
-#endif
- }
else
result = strcoll(arg1, arg2);
@@ -2065,11 +2043,9 @@ pg_strxfrm_libc(char *dest, const char *src, size_t destsize,
Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
#ifdef TRUST_STRXFRM
-#ifdef HAVE_LOCALE_T
if (locale)
return strxfrm_l(dest, src, destsize, locale->info.lt);
else
-#endif
return strxfrm(dest, src, destsize);
#else
/* shouldn't happen */
@@ -2955,7 +2931,6 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
}
else
{
-#ifdef HAVE_LOCALE_T
#ifdef HAVE_WCSTOMBS_L
/* Use wcstombs_l for nondefault locales */
result = wcstombs_l(to, from, tolen, locale->info.lt);
@@ -2967,11 +2942,6 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
uselocale(save_locale);
#endif /* HAVE_WCSTOMBS_L */
-#else /* !HAVE_LOCALE_T */
- /* Can't have locale != 0 without HAVE_LOCALE_T */
- elog(ERROR, "wcstombs_l is not available");
- result = 0; /* keep compiler quiet */
-#endif /* HAVE_LOCALE_T */
}
return result;
@@ -3032,7 +3002,6 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
}
else
{
-#ifdef HAVE_LOCALE_T
#ifdef HAVE_MBSTOWCS_L
/* Use mbstowcs_l for nondefault locales */
result = mbstowcs_l(to, str, tolen, locale->info.lt);
@@ -3044,11 +3013,6 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
uselocale(save_locale);
#endif /* HAVE_MBSTOWCS_L */
-#else /* !HAVE_LOCALE_T */
- /* Can't have locale != 0 without HAVE_LOCALE_T */
- elog(ERROR, "mbstowcs_l is not available");
- result = 0; /* keep compiler quiet */
-#endif /* HAVE_LOCALE_T */
}
pfree(str);
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index ca3a49c552..d03f6e8de8 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -298,9 +298,6 @@
/* Define to 1 if you have the `zstd' library (-lzstd). */
#undef HAVE_LIBZSTD
-/* Define to 1 if the system has the type `locale_t'. */
-#undef HAVE_LOCALE_T
-
/* Define to 1 if `long int' works and is 64 bits. */
#undef HAVE_LONG_INT_64
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index b957d5c598..dee85fa3b2 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -458,6 +458,9 @@ extern int _pglstat64(const char *name, struct stat *buf);
#define wcstombs_l _wcstombs_l
#define mbstowcs_l _mbstowcs_l
+#define HAVE_WCSTOMBS_L 1
+#define HAVE_MBSTOWCS_L 1
+
/*
* Versions of libintl >= 0.18? try to replace setlocale() with a macro
* to their own versions. Remove the macro, if it exists, because it
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243542..6447bea8e0 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -67,9 +67,7 @@ extern void cache_locale_time(void);
/*
- * We define our own wrapper around locale_t so we can keep the same
- * function signatures for all builds, while not having to create a
- * fake version of the standard type locale_t in the global namespace.
+ * We use a discriminated union to hold either a locale_t or an ICU collator.
* pg_locale_t is occasionally checked for truth, so make it a pointer.
*/
struct pg_locale_struct
@@ -78,9 +76,7 @@ struct pg_locale_struct
bool deterministic;
union
{
-#ifdef HAVE_LOCALE_T
locale_t lt;
-#endif
#ifdef USE_ICU
struct
{
@@ -88,7 +84,6 @@ struct pg_locale_struct
UCollator *ucol;
} icu;
#endif
- int dummy; /* in case we have neither LOCALE_T nor ICU */
} info;
};
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index e6d8f9fedc..b6fac0749c 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -296,11 +296,10 @@ sub GenerateFiles
HAVE_LIBXSLT => undef,
HAVE_LIBZ => $self->{options}->{zlib} ? 1 : undef,
HAVE_LIBZSTD => undef,
- HAVE_LOCALE_T => 1,
HAVE_LONG_INT_64 => undef,
HAVE_LONG_LONG_INT_64 => 1,
HAVE_MBARRIER_H => undef,
- HAVE_MBSTOWCS_L => 1,
+ HAVE_MBSTOWCS_L => undef, # defined in win32_port.h
HAVE_MEMORY_H => 1,
HAVE_MEMSET_S => undef,
HAVE_MKDTEMP => undef,
@@ -369,7 +368,7 @@ sub GenerateFiles
HAVE_UUID_OSSP => undef,
HAVE_UUID_H => undef,
HAVE_UUID_UUID_H => undef,
- HAVE_WCSTOMBS_L => 1,
+ HAVE_WCSTOMBS_L => undef, # defined in win32_port.h
HAVE_VISIBILITY_ATTRIBUTE => undef,
HAVE_X509_GET_SIGNATURE_INFO => undef,
HAVE_X86_64_POPCNTQ => undef,
--
2.39.2
On Wed, Jul 5, 2023 at 10:15 AM Thomas Munro <thomas.munro@gmail.com> wrote:
That'll teach me to be impatient. I only waited for compiling to
finish after triggering the optional extra MinGW task before sending
the above email, figuring that the only risk was there, but then the
pg_upgrade task failed due to mismatched locales. Apparently there is
something I don't understand yet about locale_t support under MinGW.
On 05.07.23 00:15, Thomas Munro wrote:
On Tue, Jul 4, 2023 at 2:52 AM Tristan Partin <tristan@neon.tech> wrote:
The patch looks good to me as well. Happy to rebase my other patch on
this one.Thanks. Here is a slightly tidier version. It passes on CI[1]
including the optional extra MinGW64/Meson task, and the
MinGW64/autoconf configure+build that is in the SanityCheck task.
There are two questions I'm hoping to get feedback on: (1) I believe
that defining HAVE_MBSTOWCS_L etc in win32_port.h is the best idea
because that is also where we define mbstowcs_l() etc. Does that make
sense? (2) IIRC, ye olde Solution.pm system might break if I were to
completely remove HAVE_MBSTOWCS_L and HAVE_WCSTOMBS_L from Solution.pm
(there must be a check somewhere that compares it with pg_config.h.in
or something like that), but it would also break if I defined them as
1 there (macro redefinition).
I think the correct solution is to set HAVE_MBSTOWCS_L in Solution.pm.
Compare HAVE_FSEEKO, which is set in Solution.pm with fseeko being
defined in win32_port.h.
On Fri, Jul 7, 2023 at 4:20 AM Peter Eisentraut <peter@eisentraut.org> wrote:
I think the correct solution is to set HAVE_MBSTOWCS_L in Solution.pm.
Compare HAVE_FSEEKO, which is set in Solution.pm with fseeko being
defined in win32_port.h.
In this version I have it in there, but set it to undef. This way
configure (MinGW), meson (MinGW or MSVC), and Solution.pm all agree.
This version passes on CI (not quite sure what I screwed up before).
To restate the problem I am solving with this Solution.pm change +
associated changes in the C: configure+MinGW disabled the libc
provider completely before. With this patch that is fixed, and that
forced me to address the inconsistency (because if you have the libc
provider but no _l functions, you hit uselocale() code paths that
Windows can't do). It was a bug, really, but I don't plan to
back-patch anything (nobody really uses configure+MinGW builds AFAIK,
they exist purely as a developer [in]convenience). But that explains
why I needed to make a change.
Thinking about how to bring this all into "normal" form -- where
HAVE_XXX means "system defines XXX", not "system defines XXX || we
define a replacement" -- leads to the attached. Do you like this one?
Attachments:
v3-0001-All-supported-systems-have-locale_t.patchtext/x-patch; charset=US-ASCII; name=v3-0001-All-supported-systems-have-locale_t.patchDownload
From a5decf23c6198eb469835b23e9f4ea7778df869c Mon Sep 17 00:00:00 2001
From: Thomas Munro <tmunro@postgresql.org>
Date: Tue, 4 Jul 2023 16:32:35 +1200
Subject: [PATCH v3] All supported systems have locale_t.
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems. For Windows, win32_port.h redirects to a partial
implementation called _locale_t. It's time to remove a lot of
compile-time tests for HAVE_LOCALE_T, and a few associated comments and
dead code branches.
Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed. With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
---
config/c-library.m4 | 10 +---
configure | 5 --
meson.build | 22 ++-----
src/backend/commands/collationcmds.c | 2 +-
src/backend/regex/regc_pg_locale.c | 52 +----------------
src/backend/utils/adt/formatting.c | 18 ------
src/backend/utils/adt/like.c | 2 -
src/backend/utils/adt/like_support.c | 2 -
src/backend/utils/adt/pg_locale.c | 86 +++++++++++-----------------
src/include/pg_config.h.in | 3 -
src/include/utils/pg_locale.h | 7 +--
src/tools/msvc/Solution.pm | 5 +-
12 files changed, 45 insertions(+), 169 deletions(-)
diff --git a/config/c-library.m4 b/config/c-library.m4
index c1dd804679..aa8223d2ef 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -86,9 +86,9 @@ AC_DEFUN([PGAC_STRUCT_SOCKADDR_SA_LEN],
# PGAC_TYPE_LOCALE_T
# ------------------
# Check for the locale_t type and find the right header file. macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use.
-#
+# needs xlocale.h; standard is locale.h, but glibc <= 2.25 also had an
+# xlocale.h file that we should not use, so we check the standard
+# header first.
AC_DEFUN([PGAC_TYPE_LOCALE_T],
[AC_CACHE_CHECK([for locale_t], pgac_cv_type_locale_t,
[AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
@@ -102,10 +102,6 @@ locale_t x;],
[])],
[pgac_cv_type_locale_t='yes (in xlocale.h)'],
[pgac_cv_type_locale_t=no])])])
-if test "$pgac_cv_type_locale_t" != no; then
- AC_DEFINE(HAVE_LOCALE_T, 1,
- [Define to 1 if the system has the type `locale_t'.])
-fi
if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
AC_DEFINE(LOCALE_T_IN_XLOCALE, 1,
[Define to 1 if `locale_t' requires <xlocale.h>.])
diff --git a/configure b/configure
index c4463cb17a..842ef855fc 100755
--- a/configure
+++ b/configure
@@ -15120,11 +15120,6 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_type_locale_t" >&5
$as_echo "$pgac_cv_type_locale_t" >&6; }
-if test "$pgac_cv_type_locale_t" != no; then
-
-$as_echo "#define HAVE_LOCALE_T 1" >>confdefs.h
-
-fi
if test "$pgac_cv_type_locale_t" = 'yes (in xlocale.h)'; then
$as_echo "#define LOCALE_T_IN_XLOCALE 1" >>confdefs.h
diff --git a/meson.build b/meson.build
index 62942ead09..3d4a85f8c6 100644
--- a/meson.build
+++ b/meson.build
@@ -2283,17 +2283,12 @@ else
cdata.set('STRERROR_R_INT', false)
endif
-# Check for the locale_t type and find the right header file. macOS
-# needs xlocale.h; standard is locale.h, but glibc also has an
-# xlocale.h file that we should not use. MSVC has a replacement
-# defined in src/include/port/win32_port.h.
-if cc.has_type('locale_t', prefix: '#include <locale.h>')
- cdata.set('HAVE_LOCALE_T', 1)
-elif cc.has_type('locale_t', prefix: '#include <xlocale.h>')
- cdata.set('HAVE_LOCALE_T', 1)
+# Find the right header file for the locale_t type. macOS needs xlocale.h;
+# standard is locale.h, but glibc <= 2.25 also had an xlocale.h file that
+# we should not use so we check the standard header first. MSVC has a
+# replacement defined in src/include/port/win32_port.h.
+if not cc.has_type('locale_t', prefix: '#include <locale.h>') and cc.has_type('locale_t', prefix: '#include <xlocale.h>')
cdata.set('LOCALE_T_IN_XLOCALE', 1)
-elif cc.get_id() == 'msvc'
- cdata.set('HAVE_LOCALE_T', 1)
endif
# Check if the C compiler understands typeof or a variant. Define
@@ -2489,13 +2484,6 @@ if cc.has_function('syslog', args: test_c_args) and \
endif
-# MSVC has replacements defined in src/include/port/win32_port.h.
-if cc.get_id() == 'msvc'
- cdata.set('HAVE_WCSTOMBS_L', 1)
- cdata.set('HAVE_MBSTOWCS_L', 1)
-endif
-
-
# if prerequisites for unnamed posix semas aren't fulfilled, fall back to sysv
# semaphores
if sema_kind == 'unnamed_posix' and \
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index efb8b4d289..cc239b4d14 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -534,7 +534,7 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
/* will we use "locale -a" in pg_import_system_collations? */
-#if defined(HAVE_LOCALE_T) && !defined(WIN32)
+#if !defined(WIN32)
#define READ_LOCALE_A_OUTPUT
#endif
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 00ce735fdd..31e6300b5d 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -44,8 +44,7 @@
* the platform's wchar_t representation matches what we do in pg_wchar
* conversions.
*
- * 3. Other collations are only supported on platforms that HAVE_LOCALE_T.
- * Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
+ * 3. Here, we use the locale_t-extended forms of the <wctype.h> and <ctype.h>
* functions, under exactly the same cases as #2.
*
* There is one notable difference between cases 2 and 3: in the "default"
@@ -252,11 +251,6 @@ pg_set_regex_collation(Oid collation)
}
else
{
- /*
- * NB: pg_newlocale_from_collation will fail if not HAVE_LOCALE_T; the
- * case of pg_regex_locale != 0 but not HAVE_LOCALE_T does not have to
- * be considered below.
- */
pg_regex_locale = pg_newlocale_from_collation(collation);
if (!pg_locale_deterministic(pg_regex_locale))
@@ -304,16 +298,12 @@ pg_wc_isdigit(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isdigit((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswdigit_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isdigit_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -340,16 +330,12 @@ pg_wc_isalpha(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isalpha((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswalpha_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isalpha_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -376,16 +362,12 @@ pg_wc_isalnum(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isalnum((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswalnum_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isalnum_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -421,16 +403,12 @@ pg_wc_isupper(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isupper((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswupper_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isupper_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -457,16 +435,12 @@ pg_wc_islower(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
islower((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswlower_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
islower_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -493,16 +467,12 @@ pg_wc_isgraph(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isgraph((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswgraph_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isgraph_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -529,16 +499,12 @@ pg_wc_isprint(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isprint((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswprint_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isprint_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -565,16 +531,12 @@ pg_wc_ispunct(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
ispunct((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswpunct_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
ispunct_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -601,16 +563,12 @@ pg_wc_isspace(pg_wchar c)
return (c <= (pg_wchar) UCHAR_MAX &&
isspace((unsigned char) c));
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return iswspace_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
return (c <= (pg_wchar) UCHAR_MAX &&
isspace_l((unsigned char) c, pg_regex_locale->info.lt));
-#endif
break;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -645,16 +603,12 @@ pg_wc_toupper(pg_wchar c)
return toupper((unsigned char) c);
return c;
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return towupper_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
if (c <= (pg_wchar) UCHAR_MAX)
return toupper_l((unsigned char) c, pg_regex_locale->info.lt);
-#endif
return c;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
@@ -689,16 +643,12 @@ pg_wc_tolower(pg_wchar c)
return tolower((unsigned char) c);
return c;
case PG_REGEX_LOCALE_WIDE_L:
-#ifdef HAVE_LOCALE_T
if (sizeof(wchar_t) >= 4 || c <= (pg_wchar) 0xFFFF)
return towlower_l((wint_t) c, pg_regex_locale->info.lt);
-#endif
/* FALL THRU */
case PG_REGEX_LOCALE_1BYTE_L:
-#ifdef HAVE_LOCALE_T
if (c <= (pg_wchar) UCHAR_MAX)
return tolower_l((unsigned char) c, pg_regex_locale->info.lt);
-#endif
return c;
case PG_REGEX_LOCALE_ICU:
#ifdef USE_ICU
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index e6246dc44b..e27ea8ef97 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1613,12 +1613,6 @@ u_strToTitle_default_BI(UChar *dest, int32_t destCapacity,
* in multibyte character sets. Note that in either case we are effectively
* assuming that the database character encoding matches the encoding implied
* by LC_CTYPE.
- *
- * If the system provides locale_t and associated functions (which are
- * standardized by Open Group's XBD), we can support collations that are
- * neither default nor C. The code is written to handle both combinations
- * of have-wide-characters and have-locale_t, though it's rather unlikely
- * a platform would have the latter without the former.
*/
/*
@@ -1696,11 +1690,9 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
workspace[curr_char] = towlower_l(workspace[curr_char], mylocale->info.lt);
else
-#endif
workspace[curr_char] = towlower(workspace[curr_char]);
}
@@ -1729,11 +1721,9 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
*p = tolower_l((unsigned char) *p, mylocale->info.lt);
else
-#endif
*p = pg_tolower((unsigned char) *p);
}
}
@@ -1818,11 +1808,9 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
workspace[curr_char] = towupper_l(workspace[curr_char], mylocale->info.lt);
else
-#endif
workspace[curr_char] = towupper(workspace[curr_char]);
}
@@ -1851,11 +1839,9 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
*p = toupper_l((unsigned char) *p, mylocale->info.lt);
else
-#endif
*p = pg_toupper((unsigned char) *p);
}
}
@@ -1941,7 +1927,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
for (curr_char = 0; workspace[curr_char] != 0; curr_char++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
{
if (wasalnum)
@@ -1951,7 +1936,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wasalnum = iswalnum_l(workspace[curr_char], mylocale->info.lt);
}
else
-#endif
{
if (wasalnum)
workspace[curr_char] = towlower(workspace[curr_char]);
@@ -1986,7 +1970,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
*/
for (p = result; *p; p++)
{
-#ifdef HAVE_LOCALE_T
if (mylocale)
{
if (wasalnum)
@@ -1996,7 +1979,6 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
wasalnum = isalnum_l((unsigned char) *p, mylocale->info.lt);
}
else
-#endif
{
if (wasalnum)
*p = pg_tolower((unsigned char) *p);
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 33a2f46aab..62902caefa 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -95,10 +95,8 @@ SB_lower_char(unsigned char c, pg_locale_t locale, bool locale_is_c)
{
if (locale_is_c)
return pg_ascii_tolower(c);
-#ifdef HAVE_LOCALE_T
else if (locale)
return tolower_l(c, locale->info.lt);
-#endif
else
return pg_tolower(c);
}
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 9b603d42f3..34e1b709ae 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1509,10 +1509,8 @@ pattern_char_isalpha(char c, bool is_multibyte,
else if (locale && locale->provider == COLLPROVIDER_ICU)
return IS_HIGHBIT_SET(c) ||
(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-#ifdef HAVE_LOCALE_T
else if (locale && locale->provider == COLLPROVIDER_LIBC)
return isalpha_l((unsigned char) c, locale->info.lt);
-#endif
else
return isalpha((unsigned char) c);
}
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c8b36f3af2..0eb764e897 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -154,6 +154,38 @@ static void icu_set_collation_attributes(UCollator *collator, const char *loc,
UErrorCode *status);
#endif
+#ifndef WIN32
+/*
+ * POSIX doesn't define _l-variants of these functions, but several systems
+ * have them. We provide our own replacements here. For Windows, we have
+ * macros in win32_port.h.
+ */
+#ifndef HAVE_MBSTOWCS_L
+static size_t
+mbstowcs_l(wchar_t *dest, const char *src, size_t n, locale_t loc)
+{
+ size_t result;
+ locale_t save_locale = uselocale(loc);
+
+ result = mbstowcs(dest, src, n);
+ uselocale(save_locale);
+ return result;
+}
+#endif
+#ifndef HAVE_WCSTOMBS_L
+static size_t
+wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
+{
+ size_t result;
+ locale_t save_locale = uselocale(loc);
+
+ result = wcstombs(dest, src, n);
+ uselocale(save_locale);
+ return result;
+}
+#endif
+#endif
+
/*
* pg_perm_setlocale
*
@@ -1420,7 +1452,6 @@ make_icu_collator(const char *iculocstr,
/* simple subroutine for reporting errors from newlocale() */
-#ifdef HAVE_LOCALE_T
static void
report_newlocale_failure(const char *localename)
{
@@ -1449,7 +1480,6 @@ report_newlocale_failure(const char *localename)
errdetail("The operating system could not find any locale data for the locale name \"%s\".",
localename) : 0)));
}
-#endif /* HAVE_LOCALE_T */
bool
pg_locale_deterministic(pg_locale_t locale)
@@ -1466,10 +1496,6 @@ pg_locale_deterministic(pg_locale_t locale)
* lifetime of the backend. Thus, do not free the result with freelocale().
*
* As a special optimization, the default/database collation returns 0.
- * Callers should then revert to the non-locale_t-enabled code path.
- * Also, callers should avoid calling this before going down a C/POSIX
- * fastpath, because such a fastpath should work even on platforms without
- * locale_t support in the C library.
*
* For simplicity, we always generate COLLATE + CTYPE even though we
* might only need one of them. Since this is called only once per session,
@@ -1515,7 +1541,6 @@ pg_newlocale_from_collation(Oid collid)
if (collform->collprovider == COLLPROVIDER_LIBC)
{
-#ifdef HAVE_LOCALE_T
const char *collcollate;
const char *collctype pg_attribute_unused();
locale_t loc;
@@ -1566,12 +1591,6 @@ pg_newlocale_from_collation(Oid collid)
}
result.info.lt = loc;
-#else /* not HAVE_LOCALE_T */
- /* platform that doesn't support locale_t */
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("collation provider LIBC is not supported on this platform")));
-#endif /* not HAVE_LOCALE_T */
}
else if (collform->collprovider == COLLPROVIDER_ICU)
{
@@ -1788,11 +1807,9 @@ pg_strncoll_libc_win32_utf8(const char *arg1, size_t len1, const char *arg2,
((LPWSTR) a2p)[r] = 0;
errno = 0;
-#ifdef HAVE_LOCALE_T
if (locale)
result = wcscoll_l((LPWSTR) a1p, (LPWSTR) a2p, locale->info.lt);
else
-#endif
result = wcscoll((LPWSTR) a1p, (LPWSTR) a2p);
if (result == 2147483647) /* _NLSCMPERROR; missing from mingw headers */
ereport(ERROR,
@@ -1831,14 +1848,7 @@ pg_strcoll_libc(const char *arg1, const char *arg2, pg_locale_t locale)
else
#endif /* WIN32 */
if (locale)
- {
-#ifdef HAVE_LOCALE_T
result = strcoll_l(arg1, arg2, locale->info.lt);
-#else
- /* shouldn't happen */
- elog(ERROR, "unsupported collprovider: %c", locale->provider);
-#endif
- }
else
result = strcoll(arg1, arg2);
@@ -2065,11 +2075,9 @@ pg_strxfrm_libc(char *dest, const char *src, size_t destsize,
Assert(!locale || locale->provider == COLLPROVIDER_LIBC);
#ifdef TRUST_STRXFRM
-#ifdef HAVE_LOCALE_T
if (locale)
return strxfrm_l(dest, src, destsize, locale->info.lt);
else
-#endif
return strxfrm(dest, src, destsize);
#else
/* shouldn't happen */
@@ -2955,23 +2963,8 @@ wchar2char(char *to, const wchar_t *from, size_t tolen, pg_locale_t locale)
}
else
{
-#ifdef HAVE_LOCALE_T
-#ifdef HAVE_WCSTOMBS_L
/* Use wcstombs_l for nondefault locales */
result = wcstombs_l(to, from, tolen, locale->info.lt);
-#else /* !HAVE_WCSTOMBS_L */
- /* We have to temporarily set the locale as current ... ugh */
- locale_t save_locale = uselocale(locale->info.lt);
-
- result = wcstombs(to, from, tolen);
-
- uselocale(save_locale);
-#endif /* HAVE_WCSTOMBS_L */
-#else /* !HAVE_LOCALE_T */
- /* Can't have locale != 0 without HAVE_LOCALE_T */
- elog(ERROR, "wcstombs_l is not available");
- result = 0; /* keep compiler quiet */
-#endif /* HAVE_LOCALE_T */
}
return result;
@@ -3032,23 +3025,8 @@ char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen,
}
else
{
-#ifdef HAVE_LOCALE_T
-#ifdef HAVE_MBSTOWCS_L
/* Use mbstowcs_l for nondefault locales */
result = mbstowcs_l(to, str, tolen, locale->info.lt);
-#else /* !HAVE_MBSTOWCS_L */
- /* We have to temporarily set the locale as current ... ugh */
- locale_t save_locale = uselocale(locale->info.lt);
-
- result = mbstowcs(to, str, tolen);
-
- uselocale(save_locale);
-#endif /* HAVE_MBSTOWCS_L */
-#else /* !HAVE_LOCALE_T */
- /* Can't have locale != 0 without HAVE_LOCALE_T */
- elog(ERROR, "mbstowcs_l is not available");
- result = 0; /* keep compiler quiet */
-#endif /* HAVE_LOCALE_T */
}
pfree(str);
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index ca3a49c552..d03f6e8de8 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -298,9 +298,6 @@
/* Define to 1 if you have the `zstd' library (-lzstd). */
#undef HAVE_LIBZSTD
-/* Define to 1 if the system has the type `locale_t'. */
-#undef HAVE_LOCALE_T
-
/* Define to 1 if `long int' works and is 64 bits. */
#undef HAVE_LONG_INT_64
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index e2a7243542..6447bea8e0 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -67,9 +67,7 @@ extern void cache_locale_time(void);
/*
- * We define our own wrapper around locale_t so we can keep the same
- * function signatures for all builds, while not having to create a
- * fake version of the standard type locale_t in the global namespace.
+ * We use a discriminated union to hold either a locale_t or an ICU collator.
* pg_locale_t is occasionally checked for truth, so make it a pointer.
*/
struct pg_locale_struct
@@ -78,9 +76,7 @@ struct pg_locale_struct
bool deterministic;
union
{
-#ifdef HAVE_LOCALE_T
locale_t lt;
-#endif
#ifdef USE_ICU
struct
{
@@ -88,7 +84,6 @@ struct pg_locale_struct
UCollator *ucol;
} icu;
#endif
- int dummy; /* in case we have neither LOCALE_T nor ICU */
} info;
};
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 133acc09f2..f09b3826ff 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -296,11 +296,10 @@ sub GenerateFiles
HAVE_LIBXSLT => undef,
HAVE_LIBZ => $self->{options}->{zlib} ? 1 : undef,
HAVE_LIBZSTD => undef,
- HAVE_LOCALE_T => 1,
HAVE_LONG_INT_64 => undef,
HAVE_LONG_LONG_INT_64 => 1,
HAVE_MBARRIER_H => undef,
- HAVE_MBSTOWCS_L => 1,
+ HAVE_MBSTOWCS_L => undef,
HAVE_MEMORY_H => 1,
HAVE_MEMSET_S => undef,
HAVE_MKDTEMP => undef,
@@ -369,7 +368,7 @@ sub GenerateFiles
HAVE_UUID_OSSP => undef,
HAVE_UUID_H => undef,
HAVE_UUID_UUID_H => undef,
- HAVE_WCSTOMBS_L => 1,
+ HAVE_WCSTOMBS_L => undef,
HAVE_VISIBILITY_ATTRIBUTE => undef,
HAVE_X509_GET_SIGNATURE_INFO => undef,
HAVE_X86_64_POPCNTQ => undef,
--
2.39.2
On Fri Jul 7, 2023 at 3:30 PM CDT, Thomas Munro wrote:
On Fri, Jul 7, 2023 at 4:20 AM Peter Eisentraut <peter@eisentraut.org> wrote:
I think the correct solution is to set HAVE_MBSTOWCS_L in Solution.pm.
Compare HAVE_FSEEKO, which is set in Solution.pm with fseeko being
defined in win32_port.h.In this version I have it in there, but set it to undef. This way
configure (MinGW), meson (MinGW or MSVC), and Solution.pm all agree.
This version passes on CI (not quite sure what I screwed up before).To restate the problem I am solving with this Solution.pm change +
associated changes in the C: configure+MinGW disabled the libc
provider completely before. With this patch that is fixed, and that
forced me to address the inconsistency (because if you have the libc
provider but no _l functions, you hit uselocale() code paths that
Windows can't do). It was a bug, really, but I don't plan to
back-patch anything (nobody really uses configure+MinGW builds AFAIK,
they exist purely as a developer [in]convenience). But that explains
why I needed to make a change.Thinking about how to bring this all into "normal" form -- where
HAVE_XXX means "system defines XXX", not "system defines XXX || we
define a replacement" -- leads to the attached. Do you like this one?
Should you wrap the two _l function replacements in HAVE_USELOCALE
instead of WIN32?
+if not cc.has_type('locale_t', prefix: '#include <locale.h>') and cc.has_type('locale_t', prefix: '#include <xlocale.h>')
I wouldn't mind a line break after the 'and'.
Other than these comments, the patch looks fine to me.
--
Tristan Partin
Neon (https://neon.tech)
On Sat, Jul 8, 2023 at 8:52 AM Tristan Partin <tristan@neon.tech> wrote:
Should you wrap the two _l function replacements in HAVE_USELOCALE
instead of WIN32?
I find that more confusing, and I'm also not sure if HAVE_USELOCALE is
even going to survive (based on your nearby thread). I mean, by the
usual criteria that we applied to a lot of other system library
functions in the 16 cycle, I'd drop it. It's in the standard and all
relevant systems have it except Windows which we have to handle with
special pain-in-the-neck logic anyway.
+if not cc.has_type('locale_t', prefix: '#include <locale.h>') and cc.has_type('locale_t', prefix: '#include <xlocale.h>')
I wouldn't mind a line break after the 'and'.
Ah, right, I am still learning what is allowed in this syntax... will do.
Other than these comments, the patch looks fine to me.
Thanks. I will wait a bit to see if Peter has any other comments and
then push this. I haven't actually tested with Solution.pm due to
lack of CI for that, but fingers crossed, since the build systems will
now agree, reducing the screw-up surface.
On Sat, Jul 8, 2023 at 10:13 AM Thomas Munro <thomas.munro@gmail.com> wrote:
Thanks. I will wait a bit to see if Peter has any other comments and
then push this. I haven't actually tested with Solution.pm due to
lack of CI for that, but fingers crossed, since the build systems will
now agree, reducing the screw-up surface.
Done. Let's see what the build farm thinks...
On 07.07.23 22:30, Thomas Munro wrote:
Thinking about how to bring this all into "normal" form -- where
HAVE_XXX means "system defines XXX", not "system defines XXX || we
define a replacement"
HAVE_XXX means "code can use XXX", doesn't matter how it got there (it
could also be a libpgport replacement).
So I don't think this code is correct. AFAICT, there is nothing right
now that can possibly define HAVE_MBSTOWCS_L on Windows/MSVC. Was that
the intention?
I think these changes would need to be reverted to make this correct:
-# MSVC has replacements defined in src/include/port/win32_port.h.
-if cc.get_id() == 'msvc'
- cdata.set('HAVE_WCSTOMBS_L', 1)
- cdata.set('HAVE_MBSTOWCS_L', 1)
-endif
- HAVE_MBSTOWCS_L => 1,
+ HAVE_MBSTOWCS_L => undef,
- HAVE_WCSTOMBS_L => 1,
+ HAVE_WCSTOMBS_L => undef,
On Sun, Jul 9, 2023 at 6:20 PM Peter Eisentraut <peter@eisentraut.org> wrote:
So I don't think this code is correct. AFAICT, there is nothing right
now that can possibly define HAVE_MBSTOWCS_L on Windows/MSVC. Was that
the intention?
Yes, that was my intention. Windows actually doesn't have them. The
autoconf/MinGW test result was telling the truth. Somehow I had to
make the three build systems agree on this. Either by strong-arming
all three of them to emit a hard-coded claim that it does, or by
removing the test that produces a different answer in different build
systems. I will happily do it the other way if you insist, which
would involve restoring the meson.build and Solultion.pm kludges you
quoted, but I'd also have to add a compatible kludge to configure.ac.
It doesn't seem like an improvement to me but I don't feel strongly
about it. In the end, Solution.pm and configure.ac will be vaporised
by lasers, so we'll be left with 0 or 1 special cases. I don't care
much, but I like 0, it's nice and round.
On Sun, Jul 9, 2023 at 6:35 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sun, Jul 9, 2023 at 6:20 PM Peter Eisentraut <peter@eisentraut.org> wrote:
So I don't think this code is correct. AFAICT, there is nothing right
now that can possibly define HAVE_MBSTOWCS_L on Windows/MSVC. Was that
the intention?Yes, that was my intention. Windows actually doesn't have them.
Thinking about that some more... Its _XXX implementations don't deal
with UTF-8 the way Unix-based developers would expect, and are
therefore just portability hazards, aren't they? What would we gain
by restoring the advertisement that they are available? Perhaps we
should go the other way completely and remove the relevant #defines
from win32_port.h, and fully confine knowledge of them to pg_locale.c?
It knows how to deal with that. Here is a patch trying this idea
out, with as slightly longer explanation.
Attachments:
0001-Don-t-expose-Windows-mbstowcs_l-and-wcstombs_l.patchtext/x-patch; charset=US-ASCII; name=0001-Don-t-expose-Windows-mbstowcs_l-and-wcstombs_l.patchDownload
From 1c04d781195266b6bc88d8a5a584a64aac07f613 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 10 Jul 2023 13:41:27 +1200
Subject: [PATCH] Don't expose Windows' mbstowcs_l() and wcstombs_l().
Windows has similar functions with leading underscores, and we provided
the rename via a macro in win32_port.h. In fact its functions are not
always good replacements for the Unix functions, since they can't deal
with UTF-8. They are only currently used by pg_locale.c, which is
careful to redirect to other Windows routines for UTF-8. Given that
portability hazard, it seem unlikely to be a good idea to encourage any
other code to think of these functions as being available outside
pg_locale.c. Any code that thinks it wants these functions probably
wants our wchar2char() or char2wchar() routines instead, or it won't
actually work on Windows in UTF-8 databases.
Furthermore, some major libc implementations including glibc don't have
them (they only have the standard variants without _l), so external code
is very unlikely to require them to exist.
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 0eb764e897..61daa8190b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -154,36 +154,41 @@ static void icu_set_collation_attributes(UCollator *collator, const char *loc,
UErrorCode *status);
#endif
-#ifndef WIN32
/*
* POSIX doesn't define _l-variants of these functions, but several systems
- * have them. We provide our own replacements here. For Windows, we have
- * macros in win32_port.h.
+ * have them. We provide our own replacements here.
*/
#ifndef HAVE_MBSTOWCS_L
static size_t
mbstowcs_l(wchar_t *dest, const char *src, size_t n, locale_t loc)
{
+#ifdef WIN32
+ return _mbstowcs_l(dest, src, n, loc);
+#else
size_t result;
locale_t save_locale = uselocale(loc);
result = mbstowcs(dest, src, n);
uselocale(save_locale);
return result;
+#endif
}
#endif
#ifndef HAVE_WCSTOMBS_L
static size_t
wcstombs_l(char *dest, const wchar_t *src, size_t n, locale_t loc)
{
+#ifdef WIN32
+ return _wcstombs_l(dest, src, n, loc);
+#else
size_t result;
locale_t save_locale = uselocale(loc);
result = wcstombs(dest, src, n);
uselocale(save_locale);
return result;
-}
#endif
+}
#endif
/*
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index b957d5c598..0e6d608330 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -455,8 +455,6 @@ extern int _pglstat64(const char *name, struct stat *buf);
#define strcoll_l _strcoll_l
#define strxfrm_l _strxfrm_l
#define wcscoll_l _wcscoll_l
-#define wcstombs_l _wcstombs_l
-#define mbstowcs_l _mbstowcs_l
/*
* Versions of libintl >= 0.18? try to replace setlocale() with a macro
--
2.41.0
On 10.07.23 04:51, Thomas Munro wrote:
On Sun, Jul 9, 2023 at 6:35 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sun, Jul 9, 2023 at 6:20 PM Peter Eisentraut <peter@eisentraut.org> wrote:
So I don't think this code is correct. AFAICT, there is nothing right
now that can possibly define HAVE_MBSTOWCS_L on Windows/MSVC. Was that
the intention?Yes, that was my intention. Windows actually doesn't have them.
Thinking about that some more... Its _XXX implementations don't deal
with UTF-8 the way Unix-based developers would expect, and are
therefore just portability hazards, aren't they? What would we gain
by restoring the advertisement that they are available? Perhaps we
should go the other way completely and remove the relevant #defines
from win32_port.h, and fully confine knowledge of them to pg_locale.c?
It knows how to deal with that. Here is a patch trying this idea
out, with as slightly longer explanation.
This looks sensible to me.
If we ever need mbstowcs_l() etc. outside of pg_locale.c, then the
proper way would be to make a mbstowcs_l.c file in src/port/.
But I like your approach for now because it moves us more firmly into
the direction of having it contained in pg_locale.c, instead of having
some knowledge global and some local.
On Tue, Jul 11, 2023 at 2:28 AM Peter Eisentraut <peter@eisentraut.org> wrote:
This looks sensible to me.
If we ever need mbstowcs_l() etc. outside of pg_locale.c, then the
proper way would be to make a mbstowcs_l.c file in src/port/.But I like your approach for now because it moves us more firmly into
the direction of having it contained in pg_locale.c, instead of having
some knowledge global and some local.
Thanks. Pushed.
That leaves only one further cleanup opportunity from discussion this
thread, for future work: a TRUST_STRXFRM-ectomy.