Collation version tracking for macOS

Started by Peter Eisentrautabout 4 years ago148 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

During development, I have been using the attached patch to simulate
libc collation versions on macOS. It just uses the internal major OS
version number. I don't know to what the extend the libc locales on
macOS are maintained or updated at all, so I don't know what practical
effect this would have. Again, it's mainly for development. If there
is interest from others, I think we could add this, maybe disabled by
default, or we just keep it in the mailing list archives for interested
parties.

Attachments:

0001-Collation-version-tracking-for-macOS.patchtext/plain; charset=UTF-8; name=0001-Collation-version-tracking-for-macOS.patchDownload+26-1
#2Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#1)
Re: Collation version tracking for macOS

On Mon, Feb 14, 2022 at 10:00 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

During development, I have been using the attached patch to simulate
libc collation versions on macOS. It just uses the internal major OS
version number. I don't know to what the extend the libc locales on
macOS are maintained or updated at all, so I don't know what practical
effect this would have. Again, it's mainly for development. If there
is interest from others, I think we could add this, maybe disabled by
default, or we just keep it in the mailing list archives for interested
parties.

Last time I looked into this it seemed like macOS's strcoll() gave
sensible answers in the traditional single-byte encodings, but didn't
understand UTF-8 at all so you get C/strcmp() order. In other words
there was effectively nothing to version. I remember that other old
Unixes used to be like that, and I suspect that they might be using
old pre-UTF-8 FreeBSD code for locales based on a quick peek at [1]https://github.com/apple-open-source-mirror/Libc/tree/master/locale
(though FreeBSD itself has since learned to do CLDR-based UTF-8
sorting with a completely new implementation shared with other OSes).
This makes me wonder if Apple is hiding another collation
implementation somewhere up its sleeve -- surely that libc support is
not good enough for the world's shiny globalised macOS/iOS apps?
Maybe UCCompareText() and friends (UnicodeUtilitiesCoreLib) and the
various Obj-C NSString comparison stuff, all of which probably
predates Unixoid macOS (google tells me that UnicodeUtilities.h was
present in macOS 9). It wouldn't be surprising if it shares nothing
with the modern OS's C runtime stuff that came via NeXT. Just
mentioning this as a curiosity, because I was trying to figure out how
that could be left non-working without anyone complaining...

[1]: https://github.com/apple-open-source-mirror/Libc/tree/master/locale

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#2)
Re: Collation version tracking for macOS

On 07.05.22 02:31, Thomas Munro wrote:

During development, I have been using the attached patch to simulate
libc collation versions on macOS. It just uses the internal major OS
version number. I don't know to what the extend the libc locales on
macOS are maintained or updated at all, so I don't know what practical
effect this would have. Again, it's mainly for development. If there
is interest from others, I think we could add this, maybe disabled by
default, or we just keep it in the mailing list archives for interested
parties.

Last time I looked into this it seemed like macOS's strcoll() gave
sensible answers in the traditional single-byte encodings, but didn't
understand UTF-8 at all so you get C/strcmp() order. In other words
there was effectively nothing to version.

Someone recently told me that collations in macOS have actually changed
recently and that this is a live problem. See explanation here:

https://github.com/PostgresApp/PostgresApp/blob/master/docs/documentation/reindex-warning.md?plain=1#L66

So I think we should reconsider this patch, even for PG15.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#3)
Re: Collation version tracking for macOS

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 07.05.22 02:31, Thomas Munro wrote:

Last time I looked into this it seemed like macOS's strcoll() gave
sensible answers in the traditional single-byte encodings, but didn't
understand UTF-8 at all so you get C/strcmp() order. In other words
there was effectively nothing to version.

Someone recently told me that collations in macOS have actually changed
recently and that this is a live problem. See explanation here:

https://github.com/PostgresApp/PostgresApp/blob/master/docs/documentation/reindex-warning.md?plain=1#L66

So I think we should reconsider this patch, even for PG15.

According to that document, they changed it in macOS 11, which came out
a year and a half ago. Given the lack of complaints, it doesn't seem
like this is urgent enough to mandate a post-beta change that would
have lots of downside (namely, false-positive warnings for every other
macOS update).

regards, tom lane

#5Jeremy Schneider
schneider@ardentperf.com
In reply to: Tom Lane (#4)
Re: Collation version tracking for macOS

On 6/3/22 9:21 AM, Tom Lane wrote:

According to that document, they changed it in macOS 11, which came out
a year and a half ago. Given the lack of complaints, it doesn't seem
like this is urgent enough to mandate a post-beta change that would
have lots of downside (namely, false-positive warnings for every other
macOS update).

Sorry, I'm going to rant for a minute... it is my very strong opinion
that using language like "false positive" here is misguided and dangerous.

If new version of sort order is released, for example when they recently
updated backwards-secondary sorting in french [CLDR-2905] or matching of
v and w in swedish and finnish [CLDR-7088], it is very dangerous to use
language like “false positive” to describe a database where there just
didn't happen to be any rows with accented french characters at the
point in time where PostgreSQL magically changed which version of sort
order it was using from the 2010 french version to the 2020 french version.

No other piece of software that calls itself a database would do what
PostgreSQL is doing: just give users a "warning" after suddenly changing
the sort order algorithm (most users won't even read warnings in their
logs). Oracle, DB2, SQL Server and even MySQL carefully version
collation data, hardcode a pseudo-linguistic collation into the DB (like
PG does for timezones), and if they provide updates to linguistic sort
order (from Unicode CLDR) then they allow the user to explicitly specify
which version of french or german ICU sorting they are want to use.
Different versions are treated as different sort orders; they are not
conflated.

I have personally seen PostgreSQL databases where an update to an old
version of glibc was applied (I'm not even talking 2.28 here) and it
resulted in data loss b/c crash recovery couldn't replay WAL records and
the user had to do a PITR. That's aside from the more common issues of
segfaults or duplicate records that violate unique constraints or wrong
query results like missing data. And it's not just updates - people can
set up a hot standby on a different version and see many of these
problems too.

Collation versioning absolutely must be first class and directly
controlled by users, and it's very dangerous to allow users - at all -
to take an index and then use a different version than what the index
was built with.

Not to mention all the other places in the DB where collation is used...
partitioning, constraints, and any other place where persisted data can
make an assumption about any sort of string comparison.

It feels to me like we're still not really thinking clearly about this
within the PG community, and that the seriousness of this issue is not
fully understood.

-Jeremy Schneider

--
http://about.me/jeremy_schneider

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#3)
Re: Collation version tracking for macOS

On Sat, Jun 4, 2022 at 12:17 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 07.05.22 02:31, Thomas Munro wrote:

Last time I looked into this it seemed like macOS's strcoll() gave
sensible answers in the traditional single-byte encodings, but didn't
understand UTF-8 at all so you get C/strcmp() order. In other words
there was effectively nothing to version.

Someone recently told me that collations in macOS have actually changed
recently and that this is a live problem. See explanation here:

https://github.com/PostgresApp/PostgresApp/blob/master/docs/documentation/reindex-warning.md?plain=1#L66

How can I see evidence of this? I'm comparing Debian, FreeBSD and
macOS 12.4 and when I run "LC_COLLATE=en_US.UTF-8 sort
/usr/share/dict/words" I get upper and lower case mixed together on
the other OSes, but on the Mac the upper case comes first, which is my
usual smoke test for "am I looking at binary sort order?"

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Jeremy Schneider (#5)
Re: Collation version tracking for macOS

On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
<schneider@ardentperf.com> wrote:

No other piece of software that calls itself a database would do what
PostgreSQL is doing: just give users a "warning" after suddenly changing
the sort order algorithm (most users won't even read warnings in their
logs). Oracle, DB2, SQL Server and even MySQL carefully version
collation data, hardcode a pseudo-linguistic collation into the DB (like
PG does for timezones), and if they provide updates to linguistic sort
order (from Unicode CLDR) then they allow the user to explicitly specify
which version of french or german ICU sorting they are want to use.
Different versions are treated as different sort orders; they are not
conflated.

I guess you know this but for the record, there have been discussions
before about supporting multiple versions of ICU concurrently, like
DB2. For example, one idea was that each ICU version could be a
separate "provider" in PostgreSQL, so you can concurrently use
multiple versions.

One of several places this came up:
/messages/by-id/CADE5jYJTnYaTNXMFKOK-0p44+Dm5LMcRcJ5kVi1MVHomb2QTkQ@mail.gmail.com

It feels to me like we're still not really thinking clearly about this
within the PG community, and that the seriousness of this issue is not
fully understood.

FWIW A couple of us tried quite hard to make smarter warnings, and
that thread and others discussed a lot of those topics, like the
relevance to constraints and so forth.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#7)
Re: Collation version tracking for macOS

Thomas Munro <thomas.munro@gmail.com> writes:

On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
<schneider@ardentperf.com> wrote:

It feels to me like we're still not really thinking clearly about this
within the PG community, and that the seriousness of this issue is not
fully understood.

FWIW A couple of us tried quite hard to make smarter warnings, and
that thread and others discussed a lot of those topics, like the
relevance to constraints and so forth.

I think the real problem here is that the underlying software mostly
doesn't take this issue seriously. Unfortunately, that leads one to
the conclusion that we need to maintain our own collation code and
data (e.g., our own fork of ICU), and that isn't happening. Unlike
say Oracle, we do not have the manpower; nor do we want to bloat our
code base that much.

Short of maintaining our own fork, ranting about the imperfections
of the situation is a waste of time.

regards, tom lane

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#8)
Re: Collation version tracking for macOS

On 6/3/22 3:58 PM, Tom Lane wrote

Thomas Munro <thomas.munro@gmail.com> writes:

On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
<schneider@ardentperf.com> wrote:

It feels to me like we're still not really thinking clearly about this
within the PG community, and that the seriousness of this issue is not
fully understood.

FWIW A couple of us tried quite hard to make smarter warnings, and
that thread and others discussed a lot of those topics, like the
relevance to constraints and so forth.

I think the real problem here is that the underlying software mostly
doesn't take this issue seriously. Unfortunately, that leads one to
the conclusion that we need to maintain our own collation code and
data (e.g., our own fork of ICU), and that isn't happening. Unlike
say Oracle, we do not have the manpower; nor do we want to bloat our
code base that much.

Short of maintaining our own fork, ranting about the imperfections
of the situation is a waste of time.

The first step to a solution is admitting that the problem exists.
Ignoring broken backups, segfaults and data corruption as a "rant"
implies that we simply throw in the towel and tell users to suck it up
or switch engines. There are other ways to address this short of the
community doing all the work itself. One simple example would be to
refuse to start if the collation provider has changed since initdb
(which we'd need to allow users to override). A more sophisticated
option would be to provide the machinery for supporting multiple
collation libraries. Both of those at least ensure that users are aware
any time there's a problem, which IMO is *enormously* better than
letting core functionality silently stop working.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#9)
Re: Collation version tracking for macOS

Jim Nasby <nasbyj@amazon.com> writes:

I think the real problem here is that the underlying software mostly
doesn't take this issue seriously.

The first step to a solution is admitting that the problem exists.
Ignoring broken backups, segfaults and data corruption as a "rant"
implies that we simply throw in the towel and tell users to suck it up
or switch engines. There are other ways to address this short of the
community doing all the work itself. One simple example would be to
refuse to start if the collation provider has changed since initdb
(which we'd need to allow users to override).

You're conveniently skipping over the hard part, which is to tell
whether the collation provider has changed behavior (which we'd better
do with pretty darn high accuracy, if we're going to refuse to start
on the basis of thinking it has). Unfortunately, giving a reliable
indication of collation behavioral changes is *exactly* the thing
that the providers aren't taking seriously.

regards, tom lane

#11Jeremy Schneider
schneider@ardentperf.com
In reply to: Jim Nasby (#9)
Re: Collation version tracking for macOS

On Jun 6, 2022, at 17:10, Jim Nasby <nasbyj@amazon.com> wrote:
Ignoring broken backups, segfaults and data corruption as a "rant" implies that we simply throw in the towel and tell users to suck it up or switch engines.

Well now, let’s be clear, I was the one who called my email a “rant”. 🙂

And I do apologize for that - it was grumpy and impulsive and Tom isn’t wrong that rants don’t usually help move things forward.

Thomas - thanks for the link back to one of the threads. I spent some time reading through that and it’s a lot of material; I haven’t read the whole thread yet. If you have some others that would also be particularly good background, let me know. I’m doing a chunk of this in my spare time at the moment, but I do want to keep getting more up to speed. I was pulled into a bunch of various things related to PostgreSQL and ICU and collation and OS’s over the past couple years, so I learned a lot from on-the-ground experience and I am interested in trying to get a little more involved in the conversation here.

Personally, I really do think there should at least be an *option* to tell the DB to fully error rather than just warn on version mismatch. Correctness matters to many users, and being able to *trust* string comparisons are correct is pretty damn fundamental all throughout a database. It really doesn’t get any more basic and the potential for bad things to happen is pretty astronomical, if you can’t trust those. I understand the consternation about dealing with upgrades of large & busy databases, but I’m still surprised that the community consensus arrived at the present behavior, and I have a lot of reading to do, to really understand how that happened and where the dialogue is today.

Multiple versions of ICU sounds nice for users who need real linguistic collation (like what Oracle and DB2 offer), but I still feel like there needs to be a super simple basic “pseudo-linguistic” collation baked in, that’s “good enough” for 99% of users and that is guaranteed to be the same everywhere on every platform and just won’t ever change. I think glibc needs to be phased out somehow. At a minimum, not the default for new users… to stop the bleeding. If MySQL wasn’t GPL then I’d say to just copy their collations. I’d be reluctant to spend too much time on a POC now though, it feels like my idea is the outlier and the general PG hacker consensus would be to reject this idea. (But maybe I’m wrong?)

Anyway, again, apologies for my pants-on-fire email last week. I hope I can enjoy a few beers someday - or coffee for the non-drinkers - with a few other PG collation nerds (which I never set out to be, but it may have befallen me <g>).

-Jeremy

Sent from my TI-83

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Jim Nasby (#9)
Re: Collation version tracking for macOS

On Tue, Jun 7, 2022 at 12:10 PM Jim Nasby <nasbyj@amazon.com> wrote:

On 6/3/22 3:58 PM, Tom Lane wrote

Thomas Munro <thomas.munro@gmail.com> writes:

On Sat, Jun 4, 2022 at 7:13 AM Jeremy Schneider
<schneider@ardentperf.com> wrote:

It feels to me like we're still not really thinking clearly about this
within the PG community, and that the seriousness of this issue is not
fully understood.

FWIW A couple of us tried quite hard to make smarter warnings, and
that thread and others discussed a lot of those topics, like the
relevance to constraints and so forth.

I think the real problem here is that the underlying software mostly
doesn't take this issue seriously. Unfortunately, that leads one to
the conclusion that we need to maintain our own collation code and
data (e.g., our own fork of ICU), and that isn't happening. Unlike
say Oracle, we do not have the manpower; nor do we want to bloat our
code base that much.

Short of maintaining our own fork, ranting about the imperfections
of the situation is a waste of time.

The first step to a solution is admitting that the problem exists.

We've been discussing this topic for years and I don't think anyone
thinks the case is closed...

Ignoring broken backups, segfaults and data corruption as a "rant"
implies that we simply throw in the towel and tell users to suck it up
or switch engines. There are other ways to address this short of the
community doing all the work itself. One simple example would be to
refuse to start if the collation provider has changed since initdb
(which we'd need to allow users to override).

Yeah, it's been discussed, but never proposed. The problem is that
you need to start up to fix the problem. Another option is not to use
affected indexes, but that doesn't help with other forms of the
problem (partition constraints, etc).

A more sophisticated
option would be to provide the machinery for supporting multiple
collation libraries.

Earlier I mentioned distinct "providers" but I take that back, that's
too complicated. Reprising an old idea that comes up each time we
talk about this, this time with some more straw-man detail: what about
teaching our ICU support to understand "libicu18n.so.71:en" to mean
that it should dlopen() that library and use its functions? Or some
cleverer, shorter notation. Then it's the user's problem to make sure
the right libraries are installed, and it'll fail if they're not. For
example, on Debian bookworm right now you can install libicu63,
libicu67, libicu71, though only the "current" -dev package, but which
I'm sure we can cope with. You're at the mercy of the distro or
add-on package repos to keep a lot of versions around, but that seems
OK. Maintaining our own fork(s) of ICU would seem like massive
overkill and I don't think anyone has suggested that; the question on
my mind is whether we could rely on existing packages. Then you'd be
exposed only to changes that happen within (say) the ICU 63 package's
lifetime... I recall looking into whether that can happen but ... I
don't recall the answer.

#13Rod Taylor
rbt@rbt.ca
In reply to: Tom Lane (#10)
Re: Collation version tracking for macOS

On Mon, Jun 6, 2022 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jim Nasby <nasbyj@amazon.com> writes:

I think the real problem here is that the underlying software mostly
doesn't take this issue seriously.

The first step to a solution is admitting that the problem exists.
Ignoring broken backups, segfaults and data corruption as a "rant"
implies that we simply throw in the towel and tell users to suck it up
or switch engines. There are other ways to address this short of the
community doing all the work itself. One simple example would be to
refuse to start if the collation provider has changed since initdb
(which we'd need to allow users to override).

You're conveniently skipping over the hard part, which is to tell
whether the collation provider has changed behavior (which we'd better
do with pretty darn high accuracy, if we're going to refuse to start
on the basis of thinking it has). Unfortunately, giving a reliable
indication of collation behavioral changes is *exactly* the thing
that the providers aren't taking seriously.

Is this more involved than creating a list of all valid Unicode characters
(~144 thousand), sorting them, then running crc32 over the sorted order to
create the "version" for the library/collation pair? Far from free but few
databases use more than a couple different collations.

--
Rod Taylor

#14Thomas Munro
thomas.munro@gmail.com
In reply to: Rod Taylor (#13)
Re: Collation version tracking for macOS

On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor <rbt@rbt.ca> wrote:

Is this more involved than creating a list of all valid Unicode characters (~144 thousand), sorting them, then running crc32 over the sorted order to create the "version" for the library/collation pair? Far from free but few databases use more than a couple different collations.

Collation rules have multiple levels and all kinds of quirks, so that
won't work.

#15Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: Collation version tracking for macOS

On Fri, Jun 3, 2022 at 4:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think the real problem here is that the underlying software mostly
doesn't take this issue seriously. Unfortunately, that leads one to
the conclusion that we need to maintain our own collation code and
data (e.g., our own fork of ICU), and that isn't happening. Unlike
say Oracle, we do not have the manpower; nor do we want to bloat our
code base that much.

You don't, but that opinion isn't universally held, or at least not
with the same vigor that you hold it. See e.g.
/messages/by-id/a4019c5e570d4dbb5e3f816c080fb57c76ab604a.camel@cybertec.at
and subsequent discussion, for example.

In fact, I'd go so far as to argue that you're basically sticking your
head in the sand here. You wrote:

"Given the lack of complaints, it doesn't seem
like this is urgent enough to mandate a post-beta change that would
have lots of downside (namely, false-positive warnings for every other
macOS update)."

But you wrote that to Peter, who was essentially complaining that we
hadn't done anything, and linked to another source, which was also
complaining about the problem, and then Jeremy Schneider replied to
your email and complained some more.

Complaining about "false positives" doesn't really make sense to me.
It's true that we don't have any false positives right now, but we
also have no true positives. Even a stopped clock is right twice a
day, but not in a useful way. People want to be notified when a
problem might exist, even if sometimes it doesn't actually. The
alternative is having no idea at all that things might be broken,
which is not better.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#14)
Re: Collation version tracking for macOS

Thomas Munro <thomas.munro@gmail.com> writes:

On Wed, Jun 8, 2022 at 3:58 AM Rod Taylor <rbt@rbt.ca> wrote:

Is this more involved than creating a list of all valid Unicode characters (~144 thousand), sorting them, then running crc32 over the sorted order to create the "version" for the library/collation pair? Far from free but few databases use more than a couple different collations.

Collation rules have multiple levels and all kinds of quirks, so that
won't work.

Yeah, and it's exactly at the level of quirks that things are likely
to change. Nobody's going to suddenly start sorting B before A.
They might, say, change their minds about where the digram "cz"
sorts relative to single letters, in languages where special rules
for that are a thing.

The idea of fingerprinting a collation's behavior is interesting,
but I've got doubts about whether we can make a sufficiently thorough
fingerprint.

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: Collation version tracking for macOS

Robert Haas <robertmhaas@gmail.com> writes:

In fact, I'd go so far as to argue that you're basically sticking your
head in the sand here. You wrote:

No, I quite agree that we have a problem. What I don't agree is that
issuing a lot of false-positive warnings is a solution. That will
just condition people to ignore the warnings, and then when their
platform really does change behavior, they're still screwed. If we
could *accurately* report collation behavioral changes, I'd be all
for that.

Rod's idea upthread is certainly way too simplistic, but could we
build a set of test cases that do detect known changes in collation
behaviors? We'd be shooting at a moving target; but even if we're
late in noticing that platform X changed the behavior of collation Y,
we could help users who run in the problem afterwards.

regards, tom lane

In reply to: Robert Haas (#15)
Re: Collation version tracking for macOS

On Tue, Jun 7, 2022 at 12:37 PM Robert Haas <robertmhaas@gmail.com> wrote:

It's true that we don't have any false positives right now, but we
also have no true positives. Even a stopped clock is right twice a
day, but not in a useful way. People want to be notified when a
problem might exist, even if sometimes it doesn't actually.

Collations by their very nature are unlikely to change all that much.
Obviously they can and do change, but the details are presumably
pretty insignificant to a native speaker. Stands to reason that the
issue (which is fundamentally a problem for natural language experts)
would have been resolved far sooner if there really was a significant
controversy about something that tends to come up often.

It's pretty clear that glibc as a project doesn't take the issue very
seriously, because they see it as a problem of the GUI sorting a table
in a way that seems slightly suboptimal to scholars of a natural
language. Clearly that isn't actually a big deal. But the latent
possibility of wrong answers to queries is a very big deal. Both are
true. It's just a matter of priorities in each case.

I agree that "false positive" is not a valid way of describing a
breaking change in a Postgres collation that happens to not affect one
index in particular, due to the current phase of the moon. It's
probably very likely that most individual indexes that we warn about
will be so-called false positives. I bet Postgres that there are many
near-misses that we never get to hear about already. That's rather
beside the point. The index must be assumed to be corrupt.

--
Peter Geoghegan

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#18)
Re: Collation version tracking for macOS

Peter Geoghegan <pg@bowt.ie> writes:

I agree that "false positive" is not a valid way of describing a
breaking change in a Postgres collation that happens to not affect one
index in particular, due to the current phase of the moon. It's
probably very likely that most individual indexes that we warn about
will be so-called false positives.

This is not the concern that I have. I agree that if we tell a user
that collation X changed behavior and he'd better reindex his indexes
that use collation X, but none of them actually contain any cases that
changed behavior, that's not a "false positive" --- that's "it's cheaper
to reindex than to try to identify whether there's a problem". What
I mean by "false positive" is telling every macOS user that they'd better
reindex everything every year, when in point of fact Apple changes those
collations almost never. We will soon lose those users' attention ---
see fable about boy crying wolf --- and then when Apple actually does
change something, we've got a problem. So if we give collation-change
warnings, they'd better have some measurable connection to reality.

regards, tom lane

In reply to: Thomas Munro (#12)
Re: Collation version tracking for macOS

On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Earlier I mentioned distinct "providers" but I take that back, that's
too complicated. Reprising an old idea that comes up each time we
talk about this, this time with some more straw-man detail: what about
teaching our ICU support to understand "libicu18n.so.71:en" to mean
that it should dlopen() that library and use its functions? Or some
cleverer, shorter notation. Then it's the user's problem to make sure
the right libraries are installed, and it'll fail if they're not. For
example, on Debian bookworm right now you can install libicu63,
libicu67, libicu71, though only the "current" -dev package, but which
I'm sure we can cope with. You're at the mercy of the distro or
add-on package repos to keep a lot of versions around, but that seems
OK.

Right. Postgres could link to multiple versions of ICU at the same
time. Right now it doesn't, and right now the ICU C symbol names that
we use are actually versioned (this isn't immediately apparent because
the C preprocessor makes it appear that ICU symbol names are generic).

We could perhaps invent a new indirection that knows about
multiple ICU versions, each of which is an independent collation
provider, or maybe a related collation provider that gets used by
default on REINDEX. ICU is designed for this kind of thing. That
approach more or less puts packagers on the hook for managing
collation stability. But now long term collation stability is at least
feasible -- we at least have a coherent strategy. In the worst case
the community .deb and .rpm repos might continue to support an older
ICU version, or lobby for its continued support by the distro (while
actively discouraging its use in new databases). This isn't the same
thing as forking ICU. It's a compromise between that extreme, and
the current situation.

--
Peter Geoghegan

#21Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#16)
#22Jeremy Schneider
schneider@ardentperf.com
In reply to: Peter Geoghegan (#18)
In reply to: Tom Lane (#19)
In reply to: Jeremy Schneider (#22)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Jeremy Schneider (#22)
#26Jeremy Schneider
schneider@ardentperf.com
In reply to: Peter Geoghegan (#24)
#27Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#16)
In reply to: Jeremy Schneider (#26)
#29Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#20)
In reply to: Thomas Munro (#29)
#31Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#30)
In reply to: Thomas Munro (#31)
#33Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#32)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#17)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#34)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
#38Daniel Verite
daniel@manitou-mail.org
In reply to: Tom Lane (#16)
In reply to: Tom Lane (#37)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Verite (#38)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#40)
#42Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#19)
#43Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#37)
#44Jeremy Schneider
schneider@ardentperf.com
In reply to: Thomas Munro (#43)
#45Jeremy Schneider
schneider@ardentperf.com
In reply to: Thomas Munro (#33)
In reply to: Jeremy Schneider (#45)
In reply to: Peter Geoghegan (#46)
#48Jeremy Schneider
schneider@ardentperf.com
In reply to: Peter Geoghegan (#46)
In reply to: Jeremy Schneider (#48)
In reply to: Jeremy Schneider (#48)
#51Jim Finnerty
jfinnert@amazon.com
In reply to: Thomas Munro (#33)
In reply to: Jim Finnerty (#51)
#53Thomas Munro
thomas.munro@gmail.com
In reply to: Jim Finnerty (#51)
In reply to: Thomas Munro (#53)
#55Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#52)
In reply to: Thomas Munro (#55)
#57Tobias Bussmann
t.bussmann@gmx.net
In reply to: Thomas Munro (#6)
#58Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#56)
In reply to: Thomas Munro (#58)
#60Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#59)
#61Tobias Bussmann
t.bussmann@gmx.net
In reply to: Tom Lane (#36)
In reply to: Thomas Munro (#60)
#63Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#62)
#64Thomas Munro
thomas.munro@gmail.com
In reply to: Tobias Bussmann (#57)
#65Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#63)
In reply to: Thomas Munro (#65)
In reply to: Thomas Munro (#63)
#68Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#66)
#69Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#67)
In reply to: Thomas Munro (#68)
In reply to: Thomas Munro (#69)
#72Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Geoghegan (#71)
#73Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#72)
#74Thomas Munro
thomas.munro@gmail.com
In reply to: Jeremy Schneider (#11)
#75Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#43)
In reply to: Thomas Munro (#75)
#77Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Geoghegan (#67)
#78Jeremy Schneider
schneider@ardentperf.com
In reply to: Peter Eisentraut (#77)
#79Thomas Munro
thomas.munro@gmail.com
In reply to: Jeremy Schneider (#78)
#80Jeremy Schneider
schneider@ardentperf.com
In reply to: Thomas Munro (#79)
#81Peter Eisentraut
peter_e@gmx.net
In reply to: Jeremy Schneider (#78)
#82Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#81)
#83Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#82)
#84Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#83)
#85Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#84)
#86Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#85)
#87Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#86)
#88Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#87)
#89Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#83)
#90Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#83)
#91Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#90)
#92Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#91)
#93Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#91)
#94Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#93)
#95Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#94)
#96Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#95)
#97Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#95)
#98Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#97)
#99Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#98)
#100Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#94)
#101Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#98)
#102Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#101)
#103Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#102)
#104Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#101)
#105Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#100)
#106Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#98)
#107Joe Conway
mail@joeconway.com
In reply to: Robert Haas (#100)
#108Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#105)
#109Jeremy Schneider
schneider@ardentperf.com
In reply to: Jeff Davis (#101)
#110Thomas Munro
thomas.munro@gmail.com
In reply to: Jeremy Schneider (#109)
#111Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#108)
#112Jeff Davis
pgsql@j-davis.com
In reply to: Joe Conway (#107)
#113Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#110)
#114Joe Conway
mail@joeconway.com
In reply to: Jeff Davis (#112)
#115Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#103)
#116Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#111)
#117Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#113)
#118Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#112)
#119Thomas Munro
thomas.munro@gmail.com
In reply to: Robert Haas (#118)
#120Jeff Davis
pgsql@j-davis.com
In reply to: Joe Conway (#114)
#121Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#117)
#122Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#121)
#123Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#119)
#124Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#116)
#125Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#124)
#126Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#122)
#127Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#126)
#128Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#125)
#129Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#127)
In reply to: Jeff Davis (#115)
#131Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#106)
#132Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#131)
#133Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#131)
#134Joe Conway
mail@joeconway.com
In reply to: Jeff Davis (#133)
#135Thomas Munro
thomas.munro@gmail.com
In reply to: Joe Conway (#134)
#136Jeff Davis
pgsql@j-davis.com
In reply to: Thomas Munro (#135)
#137Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#135)
#138Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#136)
#139Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#138)
#140vignesh C
vignesh21@gmail.com
In reply to: Jeff Davis (#139)
#141Jeff Davis
pgsql@j-davis.com
In reply to: vignesh C (#140)
#142vignesh C
vignesh21@gmail.com
In reply to: Jeff Davis (#141)
#143Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#141)
#144Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#143)
#145Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#144)
#146Jeff Davis
pgsql@j-davis.com
In reply to: Robert Haas (#145)
#147Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Davis (#146)
#148Thomas Munro
thomas.munro@gmail.com
In reply to: Jeff Davis (#146)