BUG #17170: Invalid collation created with provider icu and initdb' locale C

Started by PG Bug reporting formover 4 years ago3 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17170
Logged by: Andrew Bille
Email address: andrewbille@gmail.com
PostgreSQL version: 14beta3
Operating system: centos-7
Description:

Hi!
When using postgres built "--with-icu" I get an invalid collation with the
following script:
initdb -D db --locale=C
createdb testdb
psql testdb -c "CREATE COLLATION test (provider=icu,
locale='en_US.utf8');"
psql testdb -c "SELECT 'a' < 'b' COLLATE test;"
ERROR: collation "test" for encoding "SQL_ASCII" does not exist
LINE 1: SELECT 'a' < 'b' COLLATE test;
^
or
psql testdb -c "DROP COLLATION test;"
ERROR: collation "test" for encoding "SQL_ASCII" does not exist

Reproduced on REL_10_STABLE..REL_14_STABLE.

Thanks!

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #17170: Invalid collation created with provider icu and initdb' locale C

PG Bug reporting form <noreply@postgresql.org> writes:

When using postgres built "--with-icu" I get an invalid collation with the
following script:
initdb -D db --locale=C
createdb testdb
psql testdb -c "CREATE COLLATION test (provider=icu,
locale='en_US.utf8');"
psql testdb -c "SELECT 'a' < 'b' COLLATE test;"
ERROR: collation "test" for encoding "SQL_ASCII" does not exist
LINE 1: SELECT 'a' < 'b' COLLATE test;
^

That error doesn't seem inappropriate; ICU collations don't work
in SQL_ASCII encoding.

psql testdb -c "DROP COLLATION test;"
ERROR: collation "test" for encoding "SQL_ASCII" does not exist

That, however, is a bit annoying. If you can create the collation
you should be able to drop it.

The immediate issue is that DROP COLLATION uses the same lookup
rule as other code paths, so it fails to see inapplicable
pg_collation entries at all. So one approach to a fix is to
relax its lookup rule. But I wonder if we should have prevented
the CREATE in the first place, instead.

Unfortunately, either approach is likely to still have some
edge-case odd behaviors :-(. This whole business of having the
database encoding be implicitly part of the lookup rule is just
prone to that.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: BUG #17170: Invalid collation created with provider icu and initdb' locale C

I wrote:

The immediate issue is that DROP COLLATION uses the same lookup
rule as other code paths, so it fails to see inapplicable
pg_collation entries at all. So one approach to a fix is to
relax its lookup rule. But I wonder if we should have prevented
the CREATE in the first place, instead.

After a bit of experimentation, I'm inclined to the idea that
we should have rejected the CREATE COLLATION in the first place.
It's not very useful to create a collation that you're not going
to be able to use, and I think it's more user-friendly to say so
up front than to leave the user guessing about why subsequent
references don't work.

There is arguably one use-case that this fix shuts off: manually
adding ICU-based collations to template0, in the expectation
that they could get cloned into other databases where they'd be
useful. But that seems like a mighty thin argument, especially
since you could initdb with an ICU-compatible encoding if you
want to do that.

regards, tom lane

Attachments:

prevent-creating-useless-ICU-collations.patchtext/x-diff; charset=us-ascii; name=prevent-creating-useless-ICU-collations.patchDownload+12-0