SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

Started by Jeff Lanzarottaover 6 years ago3 messagesgeneral
Jump to latest
#1Jeff Lanzarotta
jeff.lanzarotta@gmail.com

Hello,

Not sure if my original email was actually received or not. If it was,
forgive the repost...

I have a question about nondeterministic collations in PostgreSQL 12. I
have created a new collation that is nondeterministic and created several
columns which use this collation. Querying these columns works great until
I use LIKE. When I do, I get the following error:

SQL Error [0A000]: ERROR: nondeterministic collations are not supported for
LIKE

Is there any plan to allow this functionality?

Thanks,

#2Daniel Verite
daniel@manitou-mail.org
In reply to: Jeff Lanzarotta (#1)
Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

Jeff Lanzarotta wrote:

I have a question about nondeterministic collations in PostgreSQL 12. I
have created a new collation that is nondeterministic and created several
columns which use this collation. Querying these columns works great until
I use LIKE. When I do, I get the following error:

SQL Error [0A000]: ERROR: nondeterministic collations are not supported for
LIKE

Is there any plan to allow this functionality?

PostgreSQL development is conducted without a roadmap [1]https://www.postgresql.org/developer/roadmap/. Maybe
someone will submit a patch to enable LIKE with nondeterministic
collations, but so far it did not happen according to the current set
of patches at https://commitfest.postgresql.org

Such matches can be weirder than you might think (not to
mention much slower).
Consider for instance a collation that ignores punctuation:

CREATE COLLATION "nd3alt" (
provider = 'icu',
locale='und@colAlternate=shifted',
deterministic = false
);

In the icu_ext extension, icu_strpos [2]https://github.com/dverite/icu_ext#icu_strpos can match a substring with
a nondeterministic collation, which is one part of what LIKE
would need to do for such collations. The implementation uses
the string search facility of the ICU library.

With the above-defined collation, we can have for instance:

SELECT icu_strpos('abc. ...de', 'c,d' COLLATE nd3alt);
icu_strpos
------------
3

So even though 'c,d' is not a substring of 'abc. ...de' in the common
sense, it is recognized as such by this collation, by design.

A LIKE operator for nondeterministic collations should be able to
recognize this too, but with an arbitrary number of substrings to
match in the pattern, plus it should handle the underscore wildcard
in a way that hopefully makes sense.

With the example above,
'abc. ...de' LIKE '%c,d%' COLLATE nd3alt
should certainly be a match, but in the case of this variant:
'abc. ...de' LIKE '%c_d%' COLLATE nd3alt
it's not necessarily clear how (or even if) it should work.

[1]: https://www.postgresql.org/developer/roadmap/
[2]: https://github.com/dverite/icu_ext#icu_strpos

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#3Jeff Lanzarotta
jeff.lanzarotta@gmail.com
In reply to: Daniel Verite (#2)
Re: SQL Error [0A000]: ERROR: nondeterministic collations are not supported for LIKE

Daniel, thanks for the reply. I believe we will just remove the collation,
allow LIKE to function normally, and wait for a future patch is one is ever
provided.

On Thu, Oct 24, 2019 at 3:39 PM Daniel Verite <daniel@manitou-mail.org>
wrote:

Show quoted text

Jeff Lanzarotta wrote:

I have a question about nondeterministic collations in PostgreSQL 12. I
have created a new collation that is nondeterministic and created several
columns which use this collation. Querying these columns works great

until

I use LIKE. When I do, I get the following error:

SQL Error [0A000]: ERROR: nondeterministic collations are not supported

for

LIKE

Is there any plan to allow this functionality?

PostgreSQL development is conducted without a roadmap [1]. Maybe
someone will submit a patch to enable LIKE with nondeterministic
collations, but so far it did not happen according to the current set
of patches at https://commitfest.postgresql.org

Such matches can be weirder than you might think (not to
mention much slower).
Consider for instance a collation that ignores punctuation:

CREATE COLLATION "nd3alt" (
provider = 'icu',
locale='und@colAlternate=shifted',
deterministic = false
);

In the icu_ext extension, icu_strpos [2] can match a substring with
a nondeterministic collation, which is one part of what LIKE
would need to do for such collations. The implementation uses
the string search facility of the ICU library.

With the above-defined collation, we can have for instance:

SELECT icu_strpos('abc. ...de', 'c,d' COLLATE nd3alt);
icu_strpos
------------
3

So even though 'c,d' is not a substring of 'abc. ...de' in the common
sense, it is recognized as such by this collation, by design.

A LIKE operator for nondeterministic collations should be able to
recognize this too, but with an arbitrary number of substrings to
match in the pattern, plus it should handle the underscore wildcard
in a way that hopefully makes sense.

With the example above,
'abc. ...de' LIKE '%c,d%' COLLATE nd3alt
should certainly be a match, but in the case of this variant:
'abc. ...de' LIKE '%c_d%' COLLATE nd3alt
it's not necessarily clear how (or even if) it should work.

[1] https://www.postgresql.org/developer/roadmap/
[2] https://github.com/dverite/icu_ext#icu_strpos

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite