Q: error on updating collation version information

Started by Karsten Hilbertover 3 years ago12 messagesgeneral
Jump to latest
#1Karsten Hilbert
Karsten.Hilbert@gmx.net

Dear all,

following an ICU upgrade, collations in a stock Debian PG 15.1
cluster now have divergent version information in pg_collations.

Now

gnumed_v22=> ALTER COLLATION pg_catalog."br_FR@euro" REFRESH VERSION;
ERROR: collation "pg_catalog.br_FR@euro" for encoding "UTF8" does not exist

despite

gnumed_v22=> select *, pg_encoding_to_char(collencoding) from pg_collation where collname = 'br_FR@euro';
-[ RECORD 1 ]-------+-----------
oid | 12413
collname | br_FR@euro
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 16
collcollate | br_FR@euro
collctype | br_FR@euro
colliculocale |
collversion | 2.35
pg_encoding_to_char | LATIN9

However, note the UTF8 vs LATIN9.

The manual sayeth:

Some (less frequently used) encodings are not supported
by ICU. When the database encoding is one of these, ICU
collation entries in pg_collation are ignored. Attempting
to use one will draw an error along the lines of
“collation "de-x-icu" for encoding "WIN874" does not
exist”.

which sounds somewhat related.

The database encoding is UTF8. That br_FR@euro.LATIN9 had
_not_ been added manually. It is also not actively used in my
database(s).

What is the canonical advice on the way forward here ? Is
the _suggested_ solution to delete the collation or am I
missing to see the "proper" approach to fixing it ?

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#2Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Karsten Hilbert (#1)
Re: Q: error on updating collation version information

Am Sun, Dec 04, 2022 at 01:22:02PM +0100 schrieb Karsten Hilbert:

gnumed_v22=> ALTER COLLATION pg_catalog."br_FR@euro" REFRESH VERSION;
ERROR: collation "pg_catalog.br_FR@euro" for encoding "UTF8" does not exist

The OS (libc) does seem to know that collation:

@hermes:~$ locale -a | grep br_FR
br_FR
br_FR@euro
br_FR.utf8

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#3Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Karsten Hilbert (#1)
Re: Q: error on updating collation version information

Am Sun, Dec 04, 2022 at 01:22:02PM +0100 schrieb Karsten Hilbert:

following an ICU upgrade, collations in a stock Debian PG 15.1
cluster now have divergent version information in pg_collations.

Correction: this is following a libc upgrade 2.35 -> 2.36

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Karsten Hilbert (#3)
Re: Q: error on updating collation version information

On 12/4/22 04:35, Karsten Hilbert wrote:

Am Sun, Dec 04, 2022 at 01:22:02PM +0100 schrieb Karsten Hilbert:

following an ICU upgrade, collations in a stock Debian PG 15.1
cluster now have divergent version information in pg_collations.

Correction: this is following a libc upgrade 2.35 -> 2.36

So to be clear this database is not using ICU, but collations from libc?

How was the database installed?

In first post you had:

gnumed_v22=> select *, pg_encoding_to_char(collencoding) from
pg_collation where collname = 'br_FR@euro';
-[ RECORD 1 ]-------+-----------
oid | 12413
collname | br_FR@euro
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 16
collcollate | br_FR@euro
collctype | br_FR@euro
colliculocale |
collversion | 2.35
pg_encoding_to_char | LATIN9

where collprovider c means libc and collversion 2.35.

Not exactly sure how that interacts with from here:

https://www.postgresql.org/docs/current/catalog-pg-collation.html

collversion text

Provider-specific version of the collation. This is recorded when the
collation is created and then checked when it is used, to detect changes
in the collation definition that could lead to data corruption.

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Adrian Klaver (#4)
Re: Q: error on updating collation version information

Am Sun, Dec 04, 2022 at 10:09:47AM -0800 schrieb Adrian Klaver:

following an ICU upgrade, collations in a stock Debian PG 15.1
cluster now have divergent version information in pg_collations.

Correction: this is following a libc upgrade 2.35 -> 2.36

So to be clear this database is not using ICU, but collations from libc?

Sorry for the confusion.

This database carries collations from _both_ libc and ICU in
pg_collations.

The collation in question (br_FR@euro) is _not_ in use (as in
being depended on by any in-database object).

How was the database installed?

stock Debian

apt-get install postgresql-15 (which gives 15.1)

followed by

CREATE DATABASE "gnumed_v22" with owner = "redacted :-)" template = "template1" encoding = 'unicode';

as "postgres".

In first post you had:

gnumed_v22=> select *, pg_encoding_to_char(collencoding) from pg_collation where
collname = 'br_FR@euro';
-[ RECORD 1 ]-------+-----------
oid | 12413
collname | br_FR@euro
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 16
collcollate | br_FR@euro
collctype | br_FR@euro
colliculocale |
collversion | 2.35
pg_encoding_to_char | LATIN9

where collprovider c means libc and collversion 2.35.

Yeah, that's when I figured that I misspoke about the ICU upgrade.

Yes, there was an ICU upgrade, and yes, it did affect
collations. Those I was able to fix up (the "reindex /
revalidate constraint / refresh collation version" dance).

There also was a libc upgrade which also affected locales.
Most of them were fixable by that dance but some popped up
(such as br_FR@euro) to not be "correctable" showing the
"does not exist for encoding" error.

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#6Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Karsten Hilbert (#5)
Re: Q: error on updating collation version information

Is this to be expected ?

PG 15.1 on Debian:

gnumed_v22=# select *, pg_collation_actual_version(oid), pg_encoding_to_char(collencoding) from pg_collation where collname = 'zh_TW';
-[ RECORD 1 ]---------------+------------
oid | 12985
collname | zh_TW
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 4
collcollate | zh_TW.euctw
collctype | zh_TW.euctw
colliculocale |
collversion | 2.35
pg_collation_actual_version | 2.36
pg_encoding_to_char | EUC_TW
-[ RECORD 2 ]---------------+------------
oid | 12986
collname | zh_TW
collnamespace | 11
collowner | 10
collprovider | c
collisdeterministic | t
collencoding | 6
collcollate | zh_TW.utf8
collctype | zh_TW.utf8
colliculocale |
collversion | 2.36
pg_collation_actual_version | 2.36
pg_encoding_to_char | UTF8

gnumed_v22=# begin;
BEGIN
gnumed_v22=*# alter collation pg_catalog."zh_TW" refresh version ;
NOTICE: version has not changed
ALTER COLLATION
gnumed_v22=*# alter collation pg_catalog."zh_TW.utf8" refresh version ;
NOTICE: version has not changed
ALTER COLLATION
gnumed_v22=*# alter collation pg_catalog."zh_TW.euctw" refresh version ;
ERROR: collation "pg_catalog.zh_TW.euctw" for encoding "UTF8" does not exist
gnumed_v22=!#

As far as I can tell the documentation asserts that since the
database encoding is UTF8 the pg_catalog."zh_TW.euctw" will
be ignored by the server for all practical purposes.

Does this mean it is impossible to "correct" its version
information ?

And if so, that is expected to be non-harmful and is not
expected to trigger nag messages ?

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#7Daniel Verite
daniel@manitou-mail.org
In reply to: Karsten Hilbert (#1)
Re: Q: error on updating collation version information

Karsten Hilbert wrote:

The database encoding is UTF8. That br_FR@euro.LATIN9 had
_not_ been added manually. It is also not actively used in my
database(s).

br_FR@euro.LATIN9 cannot be used actively in an UTF-8 database
because it's for a different encoding than the database.

It was probably available in the OS at initdb time. Every available
locale (see locale -a) gets imported into the template databases,
and then into the other databases, since CREATE DATABASE copies a
template without filtering out the locales that are incompatible
with the target database.

There's no need to upgrade anything in the OS to get the
ALTER COLLATION... REFRESH error you mention.
It's sufficient to have a mix of collations for different encodings
and try to refresh collations whose encoding are not compatible
with the current database.

What is the canonical advice on the way forward here ? Is
the _suggested_ solution to delete the collation or am I
missing to see the "proper" approach to fixing it ?

ISTM that dropping that collation from any non-LATIN9 database
is the more sensible action.

Maybe it could be discussed as a possible improvement to have
ALTER COLLATION... REFRESH ignore the database encoding and still
refresh the version number.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#8Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Daniel Verite (#7)
dropped default locale

Dear all,

I managed to drop the "special" collations default, C, and
POSIX with OIDs 100, 950, 951.

Is there a way to recreate them (short of restoring a backup)
? Naive attempts with create collation do not seem to work
out.

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Karsten Hilbert (#8)
Re: dropped default locale

On Wed, 2022-12-21 at 15:24 +0100, Karsten Hilbert wrote:

I managed to drop the "special" collations default, C, and
POSIX with OIDs 100, 950, 951.

Is there a way to recreate them (short of restoring a backup)
? Naive attempts with create collation do not seem to work
out.

I would definitely go for the backup, but here is how you can
create these three rows in PostgreSQL v15:

INSERT INTO pg_collation
(oid, collname, collnamespace, collowner, collprovider,
collisdeterministic, collencoding, collcollate, collctype)
VALUES
(100, 'default', 11, 10, 'd', TRUE, -1, NULL, NULL),
(950, 'C', 11, 10, 'c', TRUE, -1, 'C', 'C'),
(951, 'POSIX', 11, 10, 'c', TRUE, -1, 'POSIX', 'POSIX');

Yours,
Laurenz Albe

#10Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Laurenz Albe (#9)
Re: dropped default locale

Am Wed, Dec 21, 2022 at 03:41:24PM +0100 schrieb Laurenz Albe:

I would definitely go for the backup, but here is how you can
create these three rows in PostgreSQL v15:

INSERT INTO pg_collation
(oid, collname, collnamespace, collowner, collprovider,
collisdeterministic, collencoding, collcollate, collctype)
VALUES
(100, 'default', 11, 10, 'd', TRUE, -1, NULL, NULL),
(950, 'C', 11, 10, 'c', TRUE, -1, 'C', 'C'),
(951, 'POSIX', 11, 10, 'c', TRUE, -1, 'POSIX', 'POSIX');

Many thanks ! I wasn't so sure whether inserting appropriate
rows would be equivalent to create collation... (pg_collation
might have been a view projecting inner workings of the
server engine).

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#11Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Karsten Hilbert (#10)
Re: dropped default locale

Am Wed, Dec 21, 2022 at 03:46:48PM +0100 schrieb Karsten Hilbert:

I wasn't so sure whether inserting appropriate
rows would be equivalent to create collation...

For that matter, is DELETE FROM pg_collation ... equivalent
to DROP COLLATION ?

Thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Karsten Hilbert (#11)
Re: dropped default locale

Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:

For that matter, is DELETE FROM pg_collation ... equivalent
to DROP COLLATION ?

There's also entries in pg_depend and pg_shdepend to worry
about.

For these built-in collations, as of v15 there are no such
entries, but prior versions had explicit "pin" entries.

regards, tom lane