Collation versioning

Started by Thomas Munroover 7 years ago192 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hello,

While reviewing the ICU versioning work a while back, I mentioned the
idea of using a user-supplied command to get a collversion string for
the libc collation provider. I was reminded about that by recent news
about an upcoming glibc/CLDR resync that is likely to affect
PostgreSQL users (though, I guess, probably only when they do a major
OS upgrade). Here's an experimental patch to try that idea out. For
example, you might set it like this:

libc_collation_version_command = 'md5
/usr/share/locale/@LC_COLLATE@/LC_COLLATE | sed "s/.* = //"'

... or, on a Debian system using the locales package, like this:

libc_collation_version_command = 'dpkg -s locales | grep Version: |
sed "s/Version: //"'

Using the checksum approach, it works like this:

postgres=# alter collation "xx_XX" refresh version;
NOTICE: changing version from b88d621596b7e61337e832f7841066a9 to
7b008442fbaf5dfe7a10fb3d82a634ab
ALTER COLLATION
postgres=# select * from pg_collation where collname = 'xx_XX';
-[ RECORD 1 ]-+---------------------------------
collname | xx_XX
collnamespace | 2200
collowner | 10
collprovider | c
collencoding | 6
collcollate | en_US.UTF-8
collctype | UTF-8
collversion | 7b008442fbaf5dfe7a10fb3d82a634ab

When the collation definition changes you get the desired scary
warning on next attempt to use it in a fresh backend:

postgres=# select * from t order by v;
WARNING: collation "xx_XX" has version mismatch
DETAIL: The collation in the database was created using version
b88d621596b7e61337e832f7841066a9, but the operating system provides
version 7b008442fbaf5dfe7a10fb3d82a634ab.
HINT: Rebuild all objects affected by this collation and run ALTER
COLLATION public."xx_XX" REFRESH VERSION, or build PostgreSQL with the
right library version.

The problem is that it isn't in effect at initdb time so if you add
that later it only affects new locales. You'd need a way to do that
during init to capture the imported system locale versions, and that's
a really ugly string to have to pass into some initdb option. Ugh.

Another approach would be to decide that we're willing to put
non-portable version extracting magic in pg_locale.c. On a long
flight I hacked my libc to store a version string (based on CLDR
version or whatever) in its binary locale definitions and provide a
proper interface to ask for it, modelled on querylocale(3):

const char *querylocaleversion(int mask, locale_t locale);

Then the patch for pg_locale.c is trivial, see attached. While I
could conceivably try to convince my local friendly OS to take such a
patch, the real question is how to deal with glibc. Does anyone know
of a way to extract a version string from glibc using existing
interfaces? I heard there was an undocumented way but I haven't been
able to find it -- probably because I was, erm, looking in the
documentation.

Or maybe this isn't worth bothering with, and we should just build out
the ICU support and then make it the default and be done with it.

In passing, here's a patch to add tab completion for ALTER COLLATION
... REFRESH VERSION.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Auto-complete-ALTER-COLLATION-.-REFRESH-VERSION.patchapplication/octet-stream; name=0001-Auto-complete-ALTER-COLLATION-.-REFRESH-VERSION.patchDownload+2-2
0001-Add-libc_collation_version_command-GUC.patchapplication/octet-stream; name=0001-Add-libc_collation_version_command-GUC.patchDownload+64-1
0001-Add-libc_collation_version_command-GUC.patchapplication/octet-stream; name=0001-Add-libc_collation_version_command-GUC.patchDownload+64-1
#2Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#1)
Re: Collation versioning

On Tue, Sep 4, 2018 at 10:02 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

const char *querylocaleversion(int mask, locale_t locale);

Then the patch for pg_locale.c is trivial, see attached.

Oops, here's that one, FWIW.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Use-querylocaleversion-3-if-available.patchapplication/octet-stream; name=0001-Use-querylocaleversion-3-if-available.patchDownload+20-3
#3Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#1)
Re: Collation versioning

Re: Thomas Munro 2018-09-04 <CAEepm=0uEQCpfq_+LYFBdArCe4Ot98t1aR4eYiYTe=yavQygiQ@mail.gmail.com>

I was reminded about that by recent news
about an upcoming glibc/CLDR resync that is likely to affect
PostgreSQL users (though, I guess, probably only when they do a major
OS upgrade).

Or replicating/restoring a database to a newer host.

... or, on a Debian system using the locales package, like this:

libc_collation_version_command = 'dpkg -s locales | grep Version: |
sed "s/Version: //"'

Ugh. This sounds horribly easy to get wrong on the user side. I could
of course put that preconfigured into the Debian packages, but that
would leave everyone not using any of the standard distro packagings
in the rain.

Does anyone know
of a way to extract a version string from glibc using existing
interfaces? I heard there was an undocumented way but I haven't been
able to find it -- probably because I was, erm, looking in the
documentation.

That sounds more robust. Googling around:

https://www.linuxquestions.org/questions/linux-software-2/how-to-check-glibc-version-263103/

#include <stdio.h>
#include <gnu/libc-version.h>
int main (void) { puts (gnu_get_libc_version ()); return 0; }

$ ./a.out
2.27

Hopefully that version info is fine-grained enough.

Christoph

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Christoph Berg (#3)
Re: Collation versioning

On Wed, Sep 5, 2018 at 3:35 AM Christoph Berg <myon@debian.org> wrote:

Re: Thomas Munro 2018-09-04 <CAEepm=0uEQCpfq_+LYFBdArCe4Ot98t1aR4eYiYTe=yavQygiQ@mail.gmail.com>

I was reminded about that by recent news
about an upcoming glibc/CLDR resync that is likely to affect
PostgreSQL users (though, I guess, probably only when they do a major
OS upgrade).

Or replicating/restoring a database to a newer host.

Yeah.

Does anyone know
of a way to extract a version string from glibc using existing
interfaces? I heard there was an undocumented way but I haven't been
able to find it -- probably because I was, erm, looking in the
documentation.

That sounds more robust. Googling around:

https://www.linuxquestions.org/questions/linux-software-2/how-to-check-glibc-version-263103/

#include <stdio.h>
#include <gnu/libc-version.h>
int main (void) { puts (gnu_get_libc_version ()); return 0; }

$ ./a.out
2.27

Hopefully that version info is fine-grained enough.

Hmm. I was looking for locale data version, not libc.so itself. I
realise they come ultimately from the same source package, but are the
locale definitions and libc6 guaranteed to be updated at the same
time? I see that the locales package depends on libc-bin >> 2.27 (no
upper bound), which in turn depends on libc6 >> 2.27, << 2.28. So
perhaps you can have a system with locales 2.27 and libc6 2.28? I
also wonder if there are some configurations where they could get out
of sync because of manual use of locale-gen or something like that.
Clearly most systems would have them in sync through apt-get upgrade
though, so maybe gnu_get_libc_version() would work well in practice?

--
Thomas Munro
http://www.enterprisedb.com

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#4)
Re: Collation versioning

On Wed, Sep 5, 2018 at 8:20 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Sep 5, 2018 at 3:35 AM Christoph Berg <myon@debian.org> wrote:

int main (void) { puts (gnu_get_libc_version ()); return 0; }

$ ./a.out
2.27

Hmm. I was looking for locale data version, not libc.so itself. I
realise they come ultimately from the same source package, but are the
locale definitions and libc6 guaranteed to be updated at the same
time?

And even if they are, what if your cluster is still running and still
has the older libc.so.6 mapped in? Newly forked backends will see new
locale data but gnu_get_libc_version() will return the old string.
(Pointed out off-list by Andres.) Eventually you restart your cluster
and start seeing the error.

So, it's not ideal but perhaps worth considering on the grounds that
it's better than nothing?

--
Thomas Munro
http://www.enterprisedb.com

#6Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#5)
Re: Collation versioning

Re: Thomas Munro 2018-09-05 <CAEepm=3a5BC7CwsXZo3V4fw6YuAMT2nJ1krwtqOatb=vDqRWEA@mail.gmail.com>

Hopefully that version info is fine-grained enough.

Hmm. I was looking for locale data version, not libc.so itself. I
realise they come ultimately from the same source package, but are the
locale definitions and libc6 guaranteed to be updated at the same
time? I see that the locales package depends on libc-bin >> 2.27 (no
upper bound), which in turn depends on libc6 >> 2.27, << 2.28. So
perhaps you can have a system with locales 2.27 and libc6 2.28?

No because libc6.deb "breaks" locales << 2.27:

Package: libc6
Source: glibc
Version: 2.27-5
Breaks: [...] locales (<< 2.27), locales-all (<< 2.27),

(I can't tell off-hand why this isn't just a stricter dependency in
locales.deb, but it's probably because this variant works better for
upgrades.)

I also wonder if there are some configurations where they could get out
of sync because of manual use of locale-gen or something like that.
Clearly most systems would have them in sync through apt-get upgrade
though, so maybe gnu_get_libc_version() would work well in practice?

I'd hope so. I'm more worried about breakage because of fixes applied
within one glibc version (2.27-5 vs 2.27-6), but I guess Debian's
glibc maintainers are clueful enough not to do that.

Re: Thomas Munro 2018-09-05 <CAEepm=0hoACQLFn8ro7jCO9-wTth2mXXS3K=s09gxKqN2jy8pA@mail.gmail.com>

And even if they are, what if your cluster is still running and still
has the older libc.so.6 mapped in? Newly forked backends will see new
locale data but gnu_get_libc_version() will return the old string.
(Pointed out off-list by Andres.) Eventually you restart your cluster
and start seeing the error.

That problem isn't protected against by PG itself. I've seen clusters
that were upgraded on disk but not restarted yet where every plpgsql
invocation was throwing symbol errors. So I guess we don't have to try
harder for libc.

So, it's not ideal but perhaps worth considering on the grounds that
it's better than nothing?

Ack.

Christoph

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Christoph Berg (#6)
Re: Collation versioning

On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg <myon@debian.org> wrote:

So, it's not ideal but perhaps worth considering on the grounds that
it's better than nothing?

Ack.

Ok, here's a little patch like that.

postgres=# select collname, collcollate, collversion from pg_collation
where collname = 'en_NZ';
collname | collcollate | collversion
----------+-------------+-------------
en_NZ | en_NZ.utf8 | 2.24
(1 row)

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Set-collversion-for-collations-that-come-from-glibc.patchapplication/octet-stream; name=0001-Set-collversion-for-collations-that-come-from-glibc.patchDownload+18-3
#8Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#7)
Re: Collation versioning

On 05/09/2018 23:18, Thomas Munro wrote:

On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg <myon@debian.org> wrote:

So, it's not ideal but perhaps worth considering on the grounds that
it's better than nothing?

Ack.

Ok, here's a little patch like that.

postgres=# select collname, collcollate, collversion from pg_collation
where collname = 'en_NZ';
collname | collcollate | collversion
----------+-------------+-------------
en_NZ | en_NZ.utf8 | 2.24
(1 row)

But wouldn't that also have the effect that glibc updates that don't
change anything about the locales would trigger the version
incompatibility warning?

Also, note that this mechanism only applies to collation objects, not to
database-global locales. So most users wouldn't be helped by this approach.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#8)
Re: Collation versioning

On Thu, Sep 6, 2018 at 12:01 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 05/09/2018 23:18, Thomas Munro wrote:

On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg <myon@debian.org> wrote:

So, it's not ideal but perhaps worth considering on the grounds that
it's better than nothing?

Ack.

Ok, here's a little patch like that.

postgres=# select collname, collcollate, collversion from pg_collation
where collname = 'en_NZ';
collname | collcollate | collversion
----------+-------------+-------------
en_NZ | en_NZ.utf8 | 2.24
(1 row)

But wouldn't that also have the effect that glibc updates that don't
change anything about the locales would trigger the version
incompatibility warning?

Right. And likewise, a glibc update that does change some locales but
not the locales that you are actually using will trigger false alarm
warnings. The same goes for the ICU provider, which appears to return
the same collversion for every collation, even though presumably some
don't change from one ICU version to the next.

I wonder if someone here knows how many "locales" packages have been
released over the lifetime of (say) the current Debian stable distro,
whether any LC_COLLATE files changed over those releases, and whether
libc6 had the same MAJOR.MINOR for the whole lifetime. That is, even
though they might have been through 2.19-17+blah, 2.19-18+blah, ...
did they all report "2.19" and were the collations actually stable?
If that's the case, I think it'd be quite good: we'd only raise the
alarm after a big dist-upgrade Debian 8->9, or when doing streaming
replication from a Debian 8 box to a Debian 9 box.

Also, note that this mechanism only applies to collation objects, not to
database-global locales. So most users wouldn't be helped by this approach.

Yeah, right, that would have to work for this to be useful. I will
look into that.

--
Thomas Munro
http://www.enterprisedb.com

#10Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#9)
Re: Collation versioning

Fwiw, I was doing some tests with LC_COLLATE last year:

https://github.com/ChristophBerg/lc_collate_testsuite

Iirc the outcome was that everything except de_DE.UTF-8 was stable.

Christoph

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#9)
Re: Collation versioning

On Thu, Sep 6, 2018 at 5:36 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 6, 2018 at 12:01 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

Also, note that this mechanism only applies to collation objects, not to
database-global locales. So most users wouldn't be helped by this approach.

Yeah, right, that would have to work for this to be useful. I will
look into that.

We could perform a check up front in (say) CheckMyDatabase(), or maybe
defer until the first string comparison. The tricky question is where
to store it.

1. We could add datcollversion to pg_database.

2. We could remove datcollate and datctype and instead store a
collation OID. I'm not sure what problems would come up, but for
starters it seems a bit weird to have a shared catalog pointing to
rows in a non-shared catalog.

The same question comes up if we want to support ICU as a database
level default. Add datcollprovider, or point to a pg_collation row?

--
Thomas Munro
http://www.enterprisedb.com

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#11)
Re: Collation versioning

On 07/09/2018 23:34, Thomas Munro wrote:

On Thu, Sep 6, 2018 at 5:36 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 6, 2018 at 12:01 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

Also, note that this mechanism only applies to collation objects, not to
database-global locales. So most users wouldn't be helped by this approach.

Yeah, right, that would have to work for this to be useful. I will
look into that.

We could perform a check up front in (say) CheckMyDatabase(), or maybe
defer until the first string comparison. The tricky question is where
to store it.

1. We could add datcollversion to pg_database.

2. We could remove datcollate and datctype and instead store a
collation OID. I'm not sure what problems would come up, but for
starters it seems a bit weird to have a shared catalog pointing to
rows in a non-shared catalog.

The same question comes up if we want to support ICU as a database
level default. Add datcollprovider, or point to a pg_collation row?

This was previously discussed here:
/messages/by-id/f689322a-4fc5-10cc-4a60-81f1ff0166c9@2ndquadrant.com
-- without a conclusion.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#11)
Re: Collation versioning

Re: Thomas Munro 2018-09-07 <CAEepm=1xGTsLDx63UEdcJ8MdG63CNJ-tsDWHbH9djtvxRH5ZWw@mail.gmail.com>

2. We could remove datcollate and datctype and instead store a
collation OID. I'm not sure what problems would come up, but for
starters it seems a bit weird to have a shared catalog pointing to
rows in a non-shared catalog.

Naive idea: make that catalog shared? Collations are system-wide after
all.

Christoph

#14Peter Eisentraut
peter_e@gmx.net
In reply to: Christoph Berg (#13)
Re: Collation versioning

On 12/09/2018 10:15, Christoph Berg wrote:

Re: Thomas Munro 2018-09-07 <CAEepm=1xGTsLDx63UEdcJ8MdG63CNJ-tsDWHbH9djtvxRH5ZWw@mail.gmail.com>

2. We could remove datcollate and datctype and instead store a
collation OID. I'm not sure what problems would come up, but for
starters it seems a bit weird to have a shared catalog pointing to
rows in a non-shared catalog.

Naive idea: make that catalog shared? Collations are system-wide after
all.

By the same argument, extensions should be shared, but they are not.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Christoph Berg
myon@debian.org
In reply to: Peter Eisentraut (#14)
Re: Collation versioning

Re: Peter Eisentraut 2018-09-12 <0447ec7b-cdb6-7252-7943-88a4664e7bb7@2ndquadrant.com>

Naive idea: make that catalog shared? Collations are system-wide after
all.

By the same argument, extensions should be shared, but they are not.

But extensions put a lot of visible stuff into a database, whereas a
collation is just a line in some table that doesn't get into the way.

Christoph

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Christoph Berg (#15)
Re: Collation versioning

On 12/09/2018 13:25, Christoph Berg wrote:

Re: Peter Eisentraut 2018-09-12 <0447ec7b-cdb6-7252-7943-88a4664e7bb7@2ndquadrant.com>

Naive idea: make that catalog shared? Collations are system-wide after
all.

By the same argument, extensions should be shared, but they are not.

But extensions put a lot of visible stuff into a database, whereas a
collation is just a line in some table that doesn't get into the way.

How about C functions? They are just a system catalog representation of
something that exists on the OS.

Anyway, we also want to support application-specific collation
definitions, so that users can CREATE COLLATION
"my_specific_requirements" and use that that in their application, so
global collations wouldn't be appropriate for that.

Moreover, the fix for a collation version mismatch is, in the simplest
case, to go around and REINDEX everything. Making the collation or
collation version global doesn't fix that. It would actually make it
harder because you couldn't run ALTER COLLATION REFRESH VERSION until
after you have rebuilt all affected objects *in all databases*.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Christoph Berg
myon@debian.org
In reply to: Peter Eisentraut (#16)
Re: Collation versioning

Re: Peter Eisentraut 2018-09-13 <4f60612c-a7b5-092d-1532-21ff7a106bd5@2ndquadrant.com>

Moreover, the fix for a collation version mismatch is, in the simplest
case, to go around and REINDEX everything. Making the collation or
collation version global doesn't fix that. It would actually make it
harder because you couldn't run ALTER COLLATION REFRESH VERSION until
after you have rebuilt all affected objects *in all databases*.

Btw, I think a "reindexdb --all --collation" (and the SQL per-database
equivalent) that only rebuilds indexes that are affected by collations
would be immensely useful to have.

Christoph

#18Stephen Frost
sfrost@snowman.net
In reply to: Christoph Berg (#17)
Re: Collation versioning

Greetings,

* Christoph Berg (myon@debian.org) wrote:

Re: Peter Eisentraut 2018-09-13 <4f60612c-a7b5-092d-1532-21ff7a106bd5@2ndquadrant.com>

Moreover, the fix for a collation version mismatch is, in the simplest
case, to go around and REINDEX everything. Making the collation or
collation version global doesn't fix that. It would actually make it
harder because you couldn't run ALTER COLLATION REFRESH VERSION until
after you have rebuilt all affected objects *in all databases*.

Btw, I think a "reindexdb --all --collation" (and the SQL per-database
equivalent) that only rebuilds indexes that are affected by collations
would be immensely useful to have.

As I was discussing w/ Peter G during PostgresOpen, we'd have to wait
until that reindexdb is complete before actually using anything in the
system and that's pretty painful. While it sounds like it'd be a good
bit of work, it seems like we really need to have a way to support
multiple collation versions concurrently and to do that we'll need to
have the library underneath actually providing that to us. Once we have
that, we can build new indexes concurrently and swap to them (or,
ideally, just issue REINDEX CONCURRENTLY once we support that..).

Until then, it seems like we really need to have a way to realize that a
given upgrade is going to require a big reindex, before actually doing
the reindex and suddenly discovering that we can't use a bunch of
indexes because they're out of date and extending the downtime for the
upgrade to be however long it takes to rebuild those indexes...

Thanks!

Stephen

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#16)
Re: Collation versioning

On Thu, Sep 13, 2018 at 7:03 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 12/09/2018 13:25, Christoph Berg wrote:

Re: Peter Eisentraut 2018-09-12 <0447ec7b-cdb6-7252-7943-88a4664e7bb7@2ndquadrant.com>

Naive idea: make that catalog shared? Collations are system-wide after
all.

By the same argument, extensions should be shared, but they are not.

But extensions put a lot of visible stuff into a database, whereas a
collation is just a line in some table that doesn't get into the way.

How about C functions? They are just a system catalog representation of
something that exists on the OS.

Anyway, we also want to support application-specific collation
definitions, so that users can CREATE COLLATION
"my_specific_requirements" and use that that in their application, so
global collations wouldn't be appropriate for that.

Moreover, the fix for a collation version mismatch is, in the simplest
case, to go around and REINDEX everything. Making the collation or
collation version global doesn't fix that. It would actually make it
harder because you couldn't run ALTER COLLATION REFRESH VERSION until
after you have rebuilt all affected objects *in all databases*.

Here's one idea I came up with. It involves a new kind of magic. The
goals are:

1. Support versioning for the libc provider, including for the
default collation.
2. Support ICU for the default collation.
3. Fix the tracking of when reindexes need to be rebuilt, so that you
can't get it wrong (as you're alluding to above).

Changes:

1. Drop the datcollate and datctype columns from pg_database.
2. In CheckMyDatabase() or elsewhere in backend initialisation, get
that information instead by loading the pg_collation row with OID =
DEFAULT_COLLATION_OID.
3. Don't put COLLPROVIDER_DEFAULT into the default collation
collprovider column, instead give it a concrete provider value, ie
COLLPROVIDER_LIBC.
4. After creating a new database, update that row as appropriate in
the new database (!). Or find some other way to write a new table out
and switch it around, or something like that. That is, if you say
CREATE DATABASE foo LC_COLLATE = 'xx_XX', COLLATION_PROVIDER = libc
then those values somehow get written into the default pg_collation
row in the *new* database (so at that point it's not a simple copy of
the template database).
5. Drop the collversion column from pg_collation. Get rid of the
REFRESH VERSION command. Instead, add a new column indcollversion to
pg_index. It needs to be an array of text (not sure if that is a
problem in a catalog), with elements that correspond to the elements
of indcollation.
6. Do the check and log warnings when we first open each index.
7. Update indcollversion at index creation and whenever we REINDEX.

I haven't actually tried any of this so I'm not sure if I'm missing
something other than the inherent difficulty of updating a row in a
table in a database you're not connected to...

--
Thomas Munro
http://www.enterprisedb.com

#20Douglas Doole
dougdoole@gmail.com
In reply to: Thomas Munro (#19)
Re: Collation versioning

On Sun, Sep 16, 2018 at 1:20 AM Thomas Munro <thomas.munro@enterprisedb.com>
wrote:

3. Fix the tracking of when reindexes need to be rebuilt, so that you
can't get it wrong (as you're alluding to above).

I've mentioned this in the past, but didn't seem to get any traction, so
I'll try it again ;-)

The focus on indexes when a collation changes is, in my opinion, the least
of the problems. You definitely have to worry about indexes, but they can
be easily rebuilt. What about other places where collation is hardened into
the system, such as constraints?

For example, in ICU 4.6 the handling of accents changed for French.
Previously accents were considered right-to-left but ICU 4.6 reversed this.
So consider a constraint like CHECK COL < 'coté' (last letter is U+00E9,
small letter e with acute). Prior to ICU 4.6 the value 'côte' (second
letter is U+00F4, small letter o with circumflex) would have passed this
constraint. With 4.6 or later it would be rejected because of the accent
ordering change. As soon as the collation changes, this table becomes
inconsistent and a reindex isn't going to help it. This becomes a data
cleansing problem at this point (which sucks for the user because their
data was clean immediately prior to the "upgrade").

There have similarly been cases where ICU changes have caused equal
characters to become unequal and vice versa. (Unfortunately all my ICU
notes with examples are at my previous employer.) Consider the effect on RI
constraints. The primary key can be fixed with a reindex (although dealing
with two existing values becoming equal is a pain). But then the user also
has to deal with the foreign keys since they may now have foreign keys
which have no match in the primary key.

And constraints problems are even easier than triggers. Consider a database
with complex BI rules that are implemented through triggers that fire when
values are/are not equal. If the equality of strings change, there could be
bad data throughout the tables. (At least with constraints the inter-column
dependencies are explicit in the catalogs. With triggers anything goes.)

All this collation stuff is great, and I know users want it, but it feels
like were pushing them out of an airplane with a ripped parachute every
time the collation libraries change. Maybe they'll land safely or maybe
things will get very messy.

- Doug
Salesforce

#21Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Douglas Doole (#20)
#22Douglas Doole
dougdoole@gmail.com
In reply to: Andrew Gierth (#21)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Douglas Doole (#20)
#24Stephen Frost
sfrost@snowman.net
In reply to: Thomas Munro (#23)
#25Thomas Munro
thomas.munro@gmail.com
In reply to: Stephen Frost (#24)
#26Stephen Frost
sfrost@snowman.net
In reply to: Thomas Munro (#25)
#27Douglas Doole
dougdoole@gmail.com
In reply to: Thomas Munro (#23)
#28Bruce Momjian
bruce@momjian.us
In reply to: Douglas Doole (#27)
#29Douglas Doole
dougdoole@gmail.com
In reply to: Bruce Momjian (#28)
#30Thomas Munro
thomas.munro@gmail.com
In reply to: Stephen Frost (#24)
#31Stephen Frost
sfrost@snowman.net
In reply to: Thomas Munro (#30)
#32Thomas Munro
thomas.munro@gmail.com
In reply to: Stephen Frost (#31)
#33Stephen Frost
sfrost@snowman.net
In reply to: Thomas Munro (#32)
#34Douglas Doole
dougdoole@gmail.com
In reply to: Stephen Frost (#33)
#35Stephen Frost
sfrost@snowman.net
In reply to: Douglas Doole (#34)
#36Thomas Munro
thomas.munro@gmail.com
In reply to: Stephen Frost (#35)
#37Stephen Frost
sfrost@snowman.net
In reply to: Thomas Munro (#36)
#38Thomas Munro
thomas.munro@gmail.com
In reply to: Stephen Frost (#37)
#39Thomas Munro
thomas.munro@gmail.com
In reply to: Douglas Doole (#29)
#40Douglas Doole
dougdoole@gmail.com
In reply to: Thomas Munro (#39)
#41Thomas Munro
thomas.munro@gmail.com
In reply to: Douglas Doole (#40)
In reply to: Thomas Munro (#41)
#43Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#41)
#44Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#19)
#45Peter Eisentraut
peter_e@gmx.net
In reply to: Douglas Doole (#20)
#46Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#44)
#47Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#46)
#48Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#45)
#49Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#7)
#50Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#49)
#51Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#50)
#52Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#51)
#53Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#52)
#54Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#53)
#55Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#53)
#56Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#55)
#57Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#56)
#58Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#48)
#59Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#58)
#60Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#59)
#61Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#60)
#62Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#61)
#63Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#60)
#64Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#60)
#65Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#59)
#66Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#62)
#67Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#60)
#68Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#67)
#69Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#65)
#70Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#68)
#71Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#70)
#72Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Thomas Munro (#71)
#73Christoph Berg
myon@debian.org
In reply to: Laurenz Albe (#72)
#74Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#68)
#75Julien Rouhaud
rjuju123@gmail.com
In reply to: Christoph Berg (#73)
#76Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#75)
#77Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#76)
#78Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#77)
#79Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#78)
#80Thomas Munro
thomas.munro@gmail.com
In reply to: Laurenz Albe (#72)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#80)
#82Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#81)
#83Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#82)
#84Robert Haas
robertmhaas@gmail.com
In reply to: Julien Rouhaud (#83)
#85Julien Rouhaud
rjuju123@gmail.com
In reply to: Robert Haas (#84)
#86Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#85)
#87Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#68)
#88Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#86)
#89Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#88)
#90Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#89)
#91Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#90)
#92Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#91)
#93Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#89)
#94Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#93)
#95Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#94)
#96Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#95)
#97Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#96)
#98Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#97)
#99Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Julien Rouhaud (#98)
#100Julien Rouhaud
rjuju123@gmail.com
In reply to: Laurenz Albe (#99)
#101Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#100)
#102Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#101)
#103Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#98)
#104Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#103)
#105Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#104)
#106Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#105)
#107Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#106)
#108Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#107)
#109Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#108)
#110Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#109)
#111Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#110)
#112Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#111)
#113Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#112)
#114Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#113)
#115Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#114)
#116Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#115)
#117opolofdez
opolofdez@gmail.com
In reply to: Julien Rouhaud (#113)
#118Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#116)
#119Christoph Berg
myon@debian.org
In reply to: Peter Eisentraut (#118)
#120Julien Rouhaud
rjuju123@gmail.com
In reply to: Christoph Berg (#119)
#121Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#120)
#122Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#116)
#123Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#122)
#124Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#123)
#125Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#120)
#126Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#125)
#127Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#125)
#128Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#124)
#129Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#128)
#130Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#129)
#131Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#130)
#132Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#131)
#133Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#132)
#134Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#133)
#135Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#134)
#136Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#135)
#137Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#136)
#138Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#137)
#139Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#138)
#140Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#139)
#141Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#140)
#142Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#141)
#143Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#142)
#144Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#143)
#145Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#144)
#146Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#145)
#147Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#146)
#148Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#147)
#149Christoph Berg
myon@debian.org
In reply to: Julien Rouhaud (#148)
#150Julien Rouhaud
rjuju123@gmail.com
In reply to: Christoph Berg (#149)
#151Peter Eisentraut
peter_e@gmx.net
In reply to: Christoph Berg (#149)
#152Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#151)
#153Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#148)
#154Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#153)
#155Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#154)
#156Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#155)
#157Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#156)
#158Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#157)
#159Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#158)
#160Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#159)
#161Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#160)
#162Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#161)
#163Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#162)
#164Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#163)
#165Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#164)
#166Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#165)
#167Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#166)
#168Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#167)
#169Julien Rouhaud
rjuju123@gmail.com
In reply to: Thomas Munro (#168)
#170Thomas Munro
thomas.munro@gmail.com
In reply to: Julien Rouhaud (#169)
#171Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#170)
#172Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#171)
#173David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#172)
#174David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#173)
#175Thomas Munro
thomas.munro@gmail.com
In reply to: David Rowley (#174)
#176Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#175)
#177Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#175)
#178Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#177)
#179Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#178)
#180Christoph Berg
myon@debian.org
In reply to: Thomas Munro (#177)
#181Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#179)
#182Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#181)
#183David Rowley
dgrowleyml@gmail.com
In reply to: Thomas Munro (#181)
#184Thomas Munro
thomas.munro@gmail.com
In reply to: David Rowley (#183)
#185Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Thomas Munro (#177)
#186Michael Paquier
michael@paquier.xyz
In reply to: Juan José Santamaría Flecha (#185)
#187Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#186)
#188Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Christoph Berg (#180)
#189Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#186)
#190Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#50)
#191Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#189)
#192Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#191)