proposal: change behavior on collation version mismatch

Started by Jeremy Schneiderabout 2 years ago11 messages
#1Jeremy Schneider
schnjere@amazon.com

I had some interesting conversations with a couple PostgreSQL community
members at PASS Data Summit the week before last about the collation
problem, and then - just in this last week - I saw two more people on
public channels hitting corruption problems. One person on the public
PostgreSQL Slack, we eventually figured out they had upgraded from
Ubuntu 18.04 to 22.04 which hits glibc 2.28; a second person here on the
pgsql-general list reported by Daniel Westermann and I assume
representing a client of dbi services [1]. Everyone who's been tracking
this over the past few years has seen the steady stream of quiet
complaints in public from people at almost every major PG company,
largely around the glibc 2.28 debacle.

I've been tracking the discussions around collation here on the lists
and I've had a number of conversations with folks working deeply in this
area inside and outside of AWS, and I was part of the effort to address
it at AWS since we first became aware of it many years ago.

It seems to me the general perspective on the mailing lists is that:

1) "collation changes are uncommon" (which is relatively correct)
2) "most users would rather have ease-of-use than 100% safety, since
it's uncommon"

And I think this led to the current behavior of issuing a warning rather
than an error, and providing a SQL command "alter ... refresh collation"
which simply instructs the database to permanently forget the warning
without changing anything. I agree that some users might prefer this
behavior, but I think businesses like banks or healthcare companies
would be appalled, and would prefer to do the extra work and have
certainty of avoiding small but known probabilities of silent data
corruption.

As I said on the pgsql-general thread: glibc 2.28 has certainly been the
most obvious and impactful case, so the focus is understandable, but
there's a bit of a myth in the general public that the problem is only
with glibc 2.28 (and not ICU or other glibc versions or data structures
other than indexes). Unfortunately, contrary to current popular belief,
the only truly safe way to update an operating system under PosgreSQL is
with logical dump/load or logical replication, or continuing to compile
and use the exact older version of ICU from the old OS (if you use ICU).
I think the ICU folks are generally careful enough that it'll be far
less likely for compiler changes and new compiler optimizations to
inadvertently change collation on newer operating systems and newer
build toolchains (eg. for strings with don't have linguistically defined
collation, like mixing characters from multiple languages and classes).

It's been two years now since I published the collation torture test
over on github, which directly compares 10 years of both glibc and ICU
changes and demonstrates clearly that both ICU and glibc libraries have
regular small changes, and both libraries have had at least one release
with a massive number of changes. [2]

I also published a blog post this past March with a step-by-step
reproducible demonstration of silent corruption without any indexes
involved by using ICU (not glibc) with an OS upgrade from Ubuntu 20.04
to 22.04. [3] The warning was not even displayed to the user, because
it happened at connection time rather than query time.

That blog also listed many reasons that glibc & ICU regularly include
small changes and linked to real examples from ICU: new characters
(which use existing code points), fixing incorrect rules, governments or
universities clarifying rules, general improvements, and unintentional
changes from code changes or refactors (like glibc in Ubuntu 15.04
changing sort order for 22 thousand CJK characters, many years prior to
2.28).

My own personal opinion here is that PostgreSQL is on a clear trajectory
to soon be the dominant database of businesses like banks and healthcare
companies, and that the PostgreSQL default configuration with regard to
safety and durability should be bank defaults rather than "easy" defaults.

For this reason, I'd like to revisit two specific current behaviors of
PostgreSQL and get a sense of how strongly everyone feels about them.

First: I'd suggest that a collation version mismatch should cause an
ERROR rather than a WARNING by default. If we want to have a GUC that
allows warning behavior, I think that's OK but I think it should be
superuser-only and documented as a "developer" setting similar to
zero_damaged_pages.

Second: I'd suggest that all of the "alter ... refresh collation"
commands should be strictly superuser-only rather than
owner-of-collation-privs, and that they should be similarly documented
as something that is generally advised against and exists for
extraordinary circumstances.

I know these things have been discussed before, and I realize the
implications are important and inconvenient for many users, and also I
realize that I'm not often involved in discussions here on the hackers
email list. (I usually catch up on hackers from archives irregularly,
for areas I'm interested in, and I'm involved more regularly with public
slack and user groups.) But I'm a bit unsatisfied with the current state
of things and want to bring up the topic here and see what happens.

Respectfully,
Jeremy

1:
/messages/by-id/CA+fnDAZufFS-4-6=O3L+qG9iFT8tm6BvtZXNnSm1dkJ8GciCkA@mail.gmail.com

2: https://github.com/ardentperf/glibc-unicode-sorting/

3: https://ardentperf.com/2023/03/26/did-postgres-lose-my-data/

--
Jeremy Schneider
Database and Performance Engineer
Amazon Web Services

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Jeremy Schneider (#1)
Re: proposal: change behavior on collation version mismatch

On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote:

First: I'd suggest that a collation version mismatch should cause an
ERROR rather than a WARNING by default. If we want to have a GUC that
allows warning behavior, I think that's OK but I think it should be
superuser-only and documented as a "developer" setting similar to
zero_damaged_pages.

Second: I'd suggest that all of the "alter ... refresh collation"
commands should be strictly superuser-only rather than
owner-of-collation-privs, and that they should be similarly documented
as something that is generally advised against and exists for
extraordinary circumstances.

Thanks for spending thought on this painful subject.

I can get behind changing the collation version mismatch warning into
an error. It would cause more annoyance, but might avert bigger pain
later on.

But I don't think that ALTER DATABASE ... REFRESH COLLATION VERSION
need be superuser-only. Whoever creates an object may alter it in
PostgreSQL, and system collations are owned by the bootstrap superuser
anyway. The point of the warning (or proposed error) is that the user
knows "here is a potential problem, have a closer look".

Yours,
Laurenz Albe

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#2)
Re: proposal: change behavior on collation version mismatch

I forgot to add that the problem will remain a problem until the
day we start keeping our own copy of the ICU library in the source
tree...

Yours,
Laurenz Albe

#4Jeff Davis
pgsql@j-davis.com
In reply to: Jeremy Schneider (#1)
Re: proposal: change behavior on collation version mismatch

On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote:

I've been tracking the discussions around collation here on the lists
and I've had a number of conversations with folks working deeply in
this
area inside and outside of AWS, and I was part of the effort to
address
it at AWS since we first became aware of it many years ago.

For the record, I don't have a strong opinion on your specific
proposals. Not because I don't care, but because the available options
all seem pretty bad -- including the status quo.

My general opinion (not tied specifically to your proposals) is that we
need to pursue a lot of different approaches and hope to mitigate the
problem. With that in mind, I think your proposals have merit but we of
course need to consider the downsides.

2) "most users would rather have ease-of-use than 100% safety, since
it's uncommon"

And I think this led to the current behavior of issuing a warning
rather
than an error

The elevel trade-off is *availability* vs safety, not ease-of-use vs
safety. It's harder to reason about what most users might want in that
situation.

First: I'd suggest that a collation version mismatch should cause an
ERROR rather than a WARNING by default.

Is this proposal based on our current notion of collation version?
There's been a lot of reasonable skepticism that what's stored in
datcollversion is a good indicator.

If we want to have a GUC that
allows warning behavior, I think that's OK but I think it should be
superuser-only and documented as a "developer" setting similar to
zero_damaged_pages.

A GUC seems sensible to express the availability-vs-safety trade-off. I
suspect we can get a GUC that defaults to "warning" committed, but
anything else will be controversial.

Regards,
Jeff Davis

#5Jeff Davis
pgsql@j-davis.com
In reply to: Laurenz Albe (#3)
Re: proposal: change behavior on collation version mismatch

On Mon, 2023-11-27 at 20:19 +0100, Laurenz Albe wrote:

I forgot to add that the problem will remain a problem until the
day we start keeping our own copy of the ICU library in the source
tree...

Another option is for packagers to keep specific ICU versions around
for an extended time, and make it possible for Postgres to link to the
right one more flexibly (e.g. tie at initdb time, or some kind of
multi-lib system).

Even if ICU is available, we still have the problem of defaults. initdb
defaults to libc, and so does CREATE COLLATION (even if the database
collation is ICU). So it will be a long time before it's used widely
enough to consider the problem solved.

And even after all of that, ICU is not perfect, and our support for it
still has various rough edges.

Regards,
Jeff Davis

#6Magnus Hagander
magnus@hagander.net
In reply to: Jeff Davis (#4)
Re: proposal: change behavior on collation version mismatch

On Mon, Nov 27, 2023 at 9:30 PM Jeff Davis <pgsql@j-davis.com> wrote:

On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote:

If we want to have a GUC that
allows warning behavior, I think that's OK but I think it should be
superuser-only and documented as a "developer" setting similar to
zero_damaged_pages.

A GUC seems sensible to express the availability-vs-safety trade-off. I
suspect we can get a GUC that defaults to "warning" committed, but
anything else will be controversial.

A guc like this would bring a set of problems similar to what we have
e.g. with fsync.

That is, set it to "warnings only", insert a single row into the table
with an "unlucky" key, set it back to "errors always" and you now have
a corrupt database, but your setting reflects that it shouldn't be
corrupt. Sure, people shouldn't do that - but people will, and it will
make things harder to debug.

There's been talk before about adding a "tainted" flag or similar to
pg_control that gets set if you ever start the system with fsync=off.
Similar things could be done here of course, but I'd worry a bit about
adding another flag like this which can lead to
hard-to-determine-state without resolving that.

(The fact that we have "fsync" under WAL config and not developer
options is an indication that we can't really use the classification
of the config parameters are a good indicator of what's safe and not
safe to set)

I could get behind turning it into an error though :)

//Magnus

#7Jeff Davis
pgsql@j-davis.com
In reply to: Magnus Hagander (#6)
Re: proposal: change behavior on collation version mismatch

On Mon, 2023-11-27 at 22:37 +0100, Magnus Hagander wrote:

That is, set it to "warnings only", insert a single row into the
table
with an "unlucky" key, set it back to "errors always" and you now
have
a corrupt database, but your setting reflects that it shouldn't be
corrupt.

You would be giving the setting too much credit if you assume that
consistently keeping it on "error" is a guarantee against corruption.

It only affects what we do when we detect potential corruption, but our
detection is subject to both false positives and false negatives.

We'd need to document the setting so that users understand the
consequences and limitations.

I won't push strongly for such a setting to exist because I know that
it's far from a complete solution. But I believe it would be sensible
considering that this problem is going to take a while to resolve.

Regards,
Jeff Davis

#8Jeremy Schneider
schnjere@amazon.com
In reply to: Jeff Davis (#4)
Re: proposal: change behavior on collation version mismatch

On 11/27/23 12:29 PM, Jeff Davis wrote:

2) "most users would rather have ease-of-use than 100% safety, since
it's uncommon"

And I think this led to the current behavior of issuing a warning
rather
than an error

The elevel trade-off is *availability* vs safety, not ease-of-use vs
safety. It's harder to reason about what most users might want in that
situation.

I'm not in agreement with the idea that this is hard to reason about;
I've always thought durability & correctness is generally supposed to be
prioritized over availability in databases. For many enterprise
customers, if they ask why their database wouldn't accept connections
after an OS upgrade and we explained that durability & correctness is
prioritized over availability, I think they would agree we're doing the
right thing.

In practice this always happens after a major operating system update of
some kind (it would be an unintentional bug in a minor OS upgrade).  In
most cases, I hope the error will happen immediately because users
ideally won't even be able to connect (for DB-level glibc and for ICU
default setting).  Giving a hard error quickly after an OS upgrade is
actually pretty easy for most people to deal with. For most users,
they'll immediately understand that something went wrong related to the
OS upgrade.  And basic testing would turn up connection errors before
the production upgrade as long as a connection was attempted as part of
the test.

It seems to me that much of the hand-wringing is around taking a hard
line on not allowing in-place OS upgrades. We're all aware that when
you're talking about tens of terrabytes, in-place upgrade is just a lot
more convenient and easy than the alternatives. And we're aware that
some other relational databases support this (and also bundle collation
libs directly in the DB rather than using external libraries).

I myself wouldn't frame this as an availability issue, I think it's more
about ease-of-use in the sense of allowing low-downtime major OS
upgrades without the complexity of logical replication (but perhaps with
a risk of data loss, because with unicode nobody can actually be 100%
sure there's no risky characters stored in the DB, and even those of us
with extensive expert knowledge struggle to accurately characterize the
risk level).

The hand-wringing often comes down to the argument "but MAYBE en_US
didn't change in those 3 major version releases of ICU that you jumped
across to land a new Ubuntu LTS release" ~~ however I believe it's one
thing to make this argument with ISO 8859 but in the unicode world en_US
has default sort rules for japanese, chinese, arabic, cyrilic, nepalese,
and all kinds of strings with nonsensical combinations of all these
characters.  After some years of ICU and PG, I'm just coming to a
conclusion that the right thing to do is stay safe and don't change ICU
versions (or glibc versions) for existing databases in-place.

-Jeremy

--
Jeremy Schneider
Performance Engineer
Amazon Web Services

#9Jeff Davis
pgsql@j-davis.com
In reply to: Jeremy Schneider (#8)
Re: proposal: change behavior on collation version mismatch

On Mon, 2023-11-27 at 15:35 -0800, Jeremy Schneider wrote:

For many enterprise customers, if they ask why their database
wouldn't accept connections after an OS upgrade and we explained that
durability & correctness is prioritized over availability, I think
they would agree we're doing the right thing.

They may agree, but their database is still down, and they'll be
looking for a clear process to get it going again, ideally within their
maintenance window.

It would be really nice to agree on such a process, or even better, to
implement it in code.

After some years of ICU and PG, I'm just coming to a conclusion that
the right thing to do is stay safe and don't change ICU versions (or
glibc versions) for existing databases in-place.

I don't disagree, but for a lot of users that depend on their operating
system and packaging infrastructure, that is not very practical.

Regards,
Jeff Davis

#10Daniel Verite
daniel@manitou-mail.org
In reply to: Jeremy Schneider (#1)
Re: proposal: change behavior on collation version mismatch

Jeremy Schneider wrote:

1) "collation changes are uncommon" (which is relatively correct)
2) "most users would rather have ease-of-use than 100% safety, since
it's uncommon"

And I think this led to the current behavior of issuing a warning rather
than an error,

There's a technical reason for this being a warning.
If it was an error, any attempt to do anything with the collation
would fail, which includes REINDEX on indexes using
that collation.
And yet that's precisely what you're supposed to do in that
situation.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#11Jeremy Schneider
schnjere@amazon.com
In reply to: Daniel Verite (#10)
Re: proposal: change behavior on collation version mismatch

On 11/28/23 2:12 AM, Daniel Verite wrote:

Jeremy Schneider wrote:

1) "collation changes are uncommon" (which is relatively correct)
2) "most users would rather have ease-of-use than 100% safety, since
it's uncommon"

And I think this led to the current behavior of issuing a warning rather
than an error,

There's a technical reason for this being a warning.
If it was an error, any attempt to do anything with the collation
would fail, which includes REINDEX on indexes using
that collation.
And yet that's precisely what you're supposed to do in that
situation.

Indexes are the most obvious and impactful corruption, so the focus is
understandable, but there's a bit of a myth in the general public that
REINDEX means you fixed your database.  I'm concerned that too many
people believe this falsehood, and don't realize that things like
constraints and partitions can also be affected by a major OS update
when leaving PG data files in place.  Also there's a tendancy to use
amcheck and validate btree indexes, but skip other index types.  And of
course none of this is possible when people mistakenly use a different
major OS for the hot standby (but Postgres willingly sends incorrect
query results to users).

This is why my original proposal included an update to the ALTER ...
REFRESH/COLLATION docs.  Today's conventional wisdom suggests this is a
safe command.  It's really not, if you're using unicode (which everyone
is). Fifteen years ago, you needed to buy a french keyboard to type
french accented characters.  Today it's a quick tap on your phone to get
chinese, russian, tibetan, emojis, and any other character you can dream
of.  All of those surprising characters eventually get stored in Postres
databases, often to the surprise of devs and admins, after they discover
corruption from an OS upgrade.

And to recap some data about historical ICU versions from the torture test:

ICU Version | OS Version | en-US characters changed collation |
zh-Hans-CN characters changed collation | fr-FR characters changed collation
55.1-7ubuntu0.5 | Ubuntu 16.04.7 LTS | 286,654 | 286,654 | 286,654
60.2-3ubuntu3.1 | Ubuntu 18.04.6 LTS | 23,741 | 24,415 | 23,741
63.1-6 | Ubuntu 19.04 | 688 | 688 | 688
66.1-2ubuntu2 | Ubuntu 20.04.3 LTS | 6,497 | 6,531 | 6,497
70.1-2 | Ubuntu 22.04 LTS | 879 | 887 | 879

The very clear trend here is that most changes are made in the root
collation rules, affecting all locales.  This means that worrying about
specific collation versions of different locales is really focusing on
an irrelevant edge case.  In ICU development, all the locales tend to
change.

If anyone thinks the Collation Apocalypse is bad now, I predict the
Kubernetes wave will be mayhem.  Fifteen years ago it was rare to
physically move PG datafiles to a new major OS.  Most people would dump
and load their databases, sized in GBs.  Today's multi-TB Postgres
databases have meant an increase of in-place OS upgrades in recent
years.  People started to either detach/attach their storage, or they
used a hot standby. Kubernetes will make these moves across major OS's a
daily, effortless occurrence.

ICU doesn't fix anything directly.  We do need ICU - only because it
finally enables us to compile that old version of ICU forever on every
new OS we move to going forward. This was simply impossible with glibc.
Over the past couple decades, not even Oracle or IBM has managed to
deprecate a single version of ICU from a relational database, and not
for lack of desire.

-Jeremy

--
Jeremy Schneider
Performance Engineer
Amazon Web Services