check_strxfrm_bug()
Hi
While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support. If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking. In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it). So I think it's time to remove that function; please see
attached.
Just by the way, if you like slow motion domino runs, check this out:
* Original pgsql-bugs investigation into strxfrm() inconsistencies
/messages/by-id/111D0E27-A8F3-4A84-A4E0-B0FB703863DF@s24.com
* I happened to test that on bleeding-edge FreeBSD 11 (wasn't released
yet), because at that time FreeBSD was in the process of adopting
illumos's new collation code, and reported teething problems:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208266
* FreeBSD, DragonFly and illumos's trees were then partially fixed by
the authors, but our strcolltest.c still showed some remaining
problems in some locales (and it still does on my local FreeBSD
battlestation):
https://github.com/freebsd/freebsd-src/commit/c48dc2a193b9befceda8dfc6f894d73251cc00a4
https://www.illumos.org/rb/r/402/
* The authors traced the remaining problem to flaws in the Unicode
project's CLDR/POSIX data, and the report was accepted:
https://www.illumos.org/issues/7962
https://unicode-org.atlassian.net/browse/CLDR-10394
Eventually that'll be fixed, and (I guess) trigger at least a CLDR
minor version bump affecting all downstream consumers (ICU, ...).
Then... maybe... at least FreeBSD will finally pass that test. I do
wonder whether other consumer libraries are also confused by that
problem source data, and if not, why not; are glibc's problems related
or just random code or data quality problems in different areas? (I
also don't know why a problem in that data should affect strxfrm() and
strcoll() differently, but I don't plan to find out owing to an acute
shortage of round tuits).
But in the meantime, I still can't recommend turning on TRUST_STRXFRM
on any OS that I know of! The strcolltest.c program certainly still
finds fault with glibc 2.36 despite the last update on that redhat
bugzilla ticket that suggested that the big resync back in 2.28 was
going to fix it.
To be fair, macOS does actually pass that test for all locales, but
the strxfrm() result is too narrow to be useful, according to comments
in our tree. I would guess that a couple of other OSes with the old
Berkeley locale code are similar.
Attachments:
0001-Remove-obsolete-defense-against-strxfrm-bugs.patchtext/x-patch; charset=US-ASCII; name=0001-Remove-obsolete-defense-against-strxfrm-bugs.patchDownload+0-64
On Thu, Dec 15, 2022 at 3:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:
... If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking.
With my garbage collector hat on, that made me wonder if there was
some more potential cleanup here: could we require locale_t yet? The
last straggler systems on our target OS list to add the POSIX locale_t
stuff were Solaris 11.4 (2018) and OpenBSD 6.2 (2018). Apparently
it's still too soon: we have two EOL'd OSes in the farm that are older
than that. But here's an interesting fact about wrasse, assuming its
host is gcc211: it looks like it can't even apply further OS updates
because the hardware[1]https://cfarm.tetaneutral.net/machines/list/ is so old that Solaris doesn't support it
anymore[2]https://support.oracle.com/knowledge/Sun%20Microsystems/2382427_1.html.
[1]: https://cfarm.tetaneutral.net/machines/list/
[2]: https://support.oracle.com/knowledge/Sun%20Microsystems/2382427_1.html
On Sun, Dec 18, 2022 at 10:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
With my garbage collector hat on, that made me wonder if there was
some more potential cleanup here: could we require locale_t yet? The
last straggler systems on our target OS list to add the POSIX locale_t
stuff were Solaris 11.4 (2018) and OpenBSD 6.2 (2018). Apparently
it's still too soon: we have two EOL'd OSes in the farm that are older
than that. But here's an interesting fact about wrasse, assuming its
host is gcc211: it looks like it can't even apply further OS updates
because the hardware[1] is so old that Solaris doesn't support it
anymore[2].
For the record, now the OpenBSD machines have been upgraded, so now
"wrasse" is the last relevant computer on earth with no POSIX
locale_t. Unfortunately there is no reason to think it's going to go
away soon, so I'm just noting this fact here as a reminder for when it
eventually does...
On Thu, Dec 15, 2022 at 03:22:59PM +1300, Thomas Munro wrote:
While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support. If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking. In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it). So I think it's time to remove that function; please see
attached.
Seems reasonable to me.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
On Thu, Dec 15, 2022 at 03:22:59PM +1300, Thomas Munro wrote:
While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support. If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking. In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it). So I think it's time to remove that function; please see
attached.
Seems reasonable to me.
+1. I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc). Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.
regards, tom lane
On Mon, Apr 17, 2023 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1. I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc). Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.
+1 for getting rid of TRUST_STRXFRM.
ICU-based collations (which aren't affected by TRUST_STRXFRM) are
becoming the de facto standard (possibly even the de jure standard).
So even if we thought that the situation with strxfrm() had improved,
we'd still have little motivation to do anything about it.
--
Peter Geoghegan
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
On Mon, Apr 17, 2023 at 2:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
+1. I wonder if we should go further and get rid of TRUST_STRXFRM
and the not-so-trivial amount of code around it (pg_strxfrm_enabled
etc). Carrying that indefinitely in the probably-vain hope that
the libraries will become trustworthy seems rather pointless.
Besides, if such a miracle does occur, we can dig the code out
of our git history.+1 for getting rid of TRUST_STRXFRM.
ICU-based collations (which aren't affected by TRUST_STRXFRM) are
becoming the de facto standard (possibly even the de jure standard).
So even if we thought that the situation with strxfrm() had improved,
we'd still have little motivation to do anything about it.
Makes sense to do some cleanup now as this is new in the tree.
Perhaps somebody from the RMT would like to comment?
FYI, Jeff has also posted patches to replace this CFLAGS with a GUC:
/messages/by-id/6ec4ad7f93f255dbb885da0a664d9c77ed4872c4.camel@j-davis.com
--
Michael
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
+1 for getting rid of TRUST_STRXFRM.
+1
The situation is not improving fast, and requires hard work to follow
on each OS. Clearly, mainstreaming ICU is the way to go. libc
support will always have niche uses, to be compatible with other
software on the box, but trusting strxfrm doesn't seem to be on the
cards any time soon.
On 4/18/23 9:19 PM, Thomas Munro wrote:
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
+1 for getting rid of TRUST_STRXFRM.
+1
The situation is not improving fast, and requires hard work to follow
on each OS. Clearly, mainstreaming ICU is the way to go. libc
support will always have niche uses, to be compatible with other
software on the box, but trusting strxfrm doesn't seem to be on the
cards any time soon.
[RMT hat, personal opinion on RMT]
To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?
Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.
Thanks,
Jonathan
On 4/18/23 21:19, Thomas Munro wrote:
On Tue, Apr 18, 2023 at 11:52 AM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 17, 2023 at 03:40:14PM -0700, Peter Geoghegan wrote:
+1 for getting rid of TRUST_STRXFRM.
+1
The situation is not improving fast, and requires hard work to follow
on each OS. Clearly, mainstreaming ICU is the way to go. libc
support will always have niche uses, to be compatible with other
software on the box, but trusting strxfrm doesn't seem to be on the
cards any time soon.
I have wondered a few times, given the known issues with strxfrm, how is
the use in selfuncs.c still ok. Has anyone taken a hard look at that?
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Joe Conway <mail@joeconway.com> writes:
I have wondered a few times, given the known issues with strxfrm, how is
the use in selfuncs.c still ok. Has anyone taken a hard look at that?
On the one hand, we only need approximately-correct results in that
code. On the other, the result is fed to convert_string_to_scalar(),
which has a rather naive idea that it's dealing with ASCII text.
I've seen at least some strxfrm output that isn't even vaguely
textual-looking.
regards, tom lane
On Wed, Apr 19, 2023 at 2:31 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.
Thanks all. I went ahead and removed check_strxfrm_bug().
I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?
On Thu, 2023-04-20 at 13:34 +1200, Thomas Munro wrote:
I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?
+1 to removing it.
As far as how it's removed, we could directly check:
if (!collate_c && !(locale && locale->provider == COLLPROVIDER_ICU))
abbreviate = false;
as it was before, or we could still try to hide it as a detail behind a
function. I don't have a strong opinion there, though I thought it
might be good for varlena.c to not know those internal details.
Regards,
Jeff Davis
On 4/19/23 9:34 PM, Thomas Munro wrote:
On Wed, Apr 19, 2023 at 2:31 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
To be clear, is the proposal to remove both "check_strxfrm_bug" and
"TRUST_STRXFRM"?Given a bunch of folks who have expertise in this area of code all agree
with removing the above as part of the collation cleanups targeted for
v16, I'm inclined to agree. I don't really see the need for an explicit
RMT action, but based on the consensus this seems OK to add as an open item.Thanks all. I went ahead and removed check_strxfrm_bug().
Thanks! For housekeeping, I put this into "Open Items" and marked it as
resolved.
I could write a patch to remove the libc strxfrm support, but since
Jeff recently wrote new code in 16 to abstract that stuff, he might
prefer to look at it?
I believe we'd be qualifying this as an open item too? If so, let's add it.
Thanks,
Jonathan
On Mon, Apr 17, 2023 at 8:00 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Sun, Dec 18, 2022 at 10:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
With my garbage collector hat on, that made me wonder if there was
some more potential cleanup here: could we require locale_t yet? The
last straggler systems on our target OS list to add the POSIX locale_t
stuff were Solaris 11.4 (2018) and OpenBSD 6.2 (2018). Apparently
it's still too soon: we have two EOL'd OSes in the farm that are older
than that. But here's an interesting fact about wrasse, assuming its
host is gcc211: it looks like it can't even apply further OS updates
because the hardware[1] is so old that Solaris doesn't support it
anymore[2].For the record, now the OpenBSD machines have been upgraded, so now
"wrasse" is the last relevant computer on earth with no POSIX
locale_t. Unfortunately there is no reason to think it's going to go
away soon, so I'm just noting this fact here as a reminder for when it
eventually does...
Since talk of server threads erupted again, I just wanted to note that
this system locale API stuff would be on the long list of
micro-obstacles. You'd *have* to use the locale_t-based interfaces
(or come up with replacements using a big ugly lock to serialise all
access to the process-global locale, or allow only ICU locale support
in that build, but those seem like strange lengths to go to just to
support a dead version of Solaris). There are still at least a couple
of functions that lack XXX_l variants in the standard: mbstowcs() and
wcstombs() (though we use the non-standard _l variants if we find them
in <xlocale.h>), but that's OK because we use uselocale() and not
setlocale(), because uselocale() is required to be thread-local. The
use of setlocale() to set up the per-backend/per-database default
locale would have to be replaced with uselocale(). In other words, I
think wrasse would not be coming with us on this hypothetical quest.
On 12/06/2023 01:15, Thomas Munro wrote:
There are still at least a couple
of functions that lack XXX_l variants in the standard: mbstowcs() and
wcstombs() (though we use the non-standard _l variants if we find them
in <xlocale.h>), but that's OK because we use uselocale() and not
setlocale(), because uselocale() is required to be thread-local.
Right, mbstowcs() and wcstombs() are already thread-safe, that's why
there are no _l variants of them.
The use of setlocale() to set up the per-backend/per-database
default locale would have to be replaced with uselocale().
This recent bug report is also related to that:
/messages/by-id/17946-3e84cb577e9551c3@postgresql.org.
In a nutshell, libperl calls uselocale(), which overrides the setting we
try set with setlocale().
--
Heikki Linnakangas
Neon (https://neon.tech)
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.
It's slightly annoying that Windows has locale_t but doesn't have
uselocale(). It does have thread-local locales via another API,
though. I wonder how hard it would be to get to a point where all
systems have uselocale() too, by supplying a replacement. I noticed
that some other projects eg older versions of LLVM libcxx do that. I
see from one of their discussions[1]https://reviews.llvm.org/D40181 that it worked, except that
thread-local locales are only available with one of the MinGW C
runtimes and not another. We'd have to get to the bottom of that.
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.
That would look something like this.
Attachments:
0001-All-supported-computers-have-locale_t.patchtext/x-patch; charset=US-ASCII; name=0001-All-supported-computers-have-locale_t.patchDownload+12-148
On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.That would look something like this.
This removes thirty-eight ifdefs, most of them located in the middle of
function bodies. That's far more beneficial than most proposals to raise
minimum requirements. +1 for revoking support for wrasse's OS version.
(wrasse wouldn't move, but it would stop testing v17+.)
On Sun, Jul 2, 2023 at 4:25 AM Noah Misch <noah@leadboat.com> wrote:
On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro <thomas.munro@gmail.com> wrote:
The GCC build farm has just received some SPARC hardware new enough to
run modern Solaris (hostname gcc106), so if wrasse were moved over
there we could finally assume all systems have POSIX 2008 (AKA
SUSv4)'s locale_t.That would look something like this.
This removes thirty-eight ifdefs, most of them located in the middle of
function bodies. That's far more beneficial than most proposals to raise
minimum requirements. +1 for revoking support for wrasse's OS version.
(wrasse wouldn't move, but it would stop testing v17+.)
Great. It sounds like I should wait a few days for any other feedback
and then push this patch. Thanks for looking.