Strange collation names ("hu_HU.UTF-8")

Started by Durumdaraalmost 4 years ago3 messagesgeneral

Jump to latest

Durumdara

durumdara@gmail.com

almost 4 years ago

Dear Members!

Today we found strange database collation names in a server (V11).

select -- datname,
distinct datcollate
from pg_database
order by datcollate --, datname;

"hu_HU.UTF-8"
"hu_HU.UTF8"
"hu_HU.utf8"

The PGAdmin also gives us these possible collations in the dialog.
Some of the databases were migrated from 9.6 by Python script, and we used
"hu_HU.UTF-8"
to create the empty databases before restoring them.

What I don't understand, that if I query for collations, I got only this:

SELECT *
FROM pg_collation
where upper(collname) like '%HU%' -- and upper(collname) like '%UTF%'
order by collname

hu_HU.utf8

This collation ("hu_HU.UTF-8") doesn't cause any problems, and PGSQL V11
accepts it.

The whole problem appeared when we wanted to copy a database to a new (with
defining the old as template).

*Error: new collation (hu_HU.utf8) is incompatible with the collation of
the template database (hu_HU.UTF-8)*

So maybe the UTF-8 isn't valid but the PG accepted that???
Or is it valid and inherited from 9.6?

Please help me a little bit! Thank you!

Best regards
dd

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Durumdara (#1)

Re: Strange collation names ("hu_HU.UTF-8")

Durumdara <durumdara@gmail.com> writes:

Today we found strange database collation names in a server (V11).

"hu_HU.UTF-8"
"hu_HU.UTF8"
"hu_HU.utf8"

Yeah, these are all the same so far as the operating system is
concerned. I believe most if not all variants of Unix are
permissive about the spelling of the encoding part.

What I don't understand, that if I query for collations, I got only this:
hu_HU.utf8

pg_collation generally contains only "canonical" spellings of the locale
names, because initdb builds it from what "locale -a" prints. However,
different OS releases may have different ideas about which encoding name
is canonical.

The whole problem appeared when we wanted to copy a database to a new (with
defining the old as template).
*Error: new collation (hu_HU.utf8) is incompatible with the collation of
the template database (hu_HU.UTF-8)*

The code that checks that isn't as permissive as libc. You can spell
it exactly the same, or if you wanted to live dangerously you could
manually update the template database's pg_database entry to use the
currently-canonical spelling. (I'd try that in a scratch installation
first ...)

There was some discussion not long ago about relaxing the check
for "same collation name" [1]/messages/by-id/fedc0205-c15b-e400-aa3f-e1d2a1285ddb@sourcepole.ch, but no one has written a patch.

regards, tom lane

[1]: /messages/by-id/fedc0205-c15b-e400-aa3f-e1d2a1285ddb@sourcepole.ch

Thomas Munro

thomas.munro@gmail.com

almost 4 years ago

In reply to: Tom Lane (#2)

Re: Strange collation names ("hu_HU.UTF-8")

On Wed, Aug 3, 2022 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe most if not all variants of Unix are
permissive about the spelling of the encoding part.

I've only seen glibc doing that downcase-and-strip-hyphens thing to
the codeset part of a locale name when looking for locale definition
files. Other systems like FreeBSD expect to be able to open
/usr/share/locale/$LC_COLLATE/LC_COLLATE directly without any kind of
munging. On a Mac it's probably a little fuzzy because the filenames
are case insensitive...