Wrong results with equality search using trigram index and non-deterministic collation
Using a trigram index with an non-deterministic collation can
lead to wrong query results:
CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r');
CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn);
INSERT INTO boom VALUES (1, 'right'), (2, 'light');
SELECT * FROM boom WHERE t = 'right';
id │ t
════╪═══════
1 │ right
2 │ light
(2 rows)
CREATE INDEX ON boom USING gin (t gin_trgm_ops);
SET enable_seqscan = off;
SELECT * FROM boom WHERE t = 'right';
id │ t
════╪═══════
1 │ right
(1 row)
I also see questionable results with the similarity operator (with and
without the index):
SELECT * FROM boom WHERE t % 'rigor';
id │ t
════╪═══════
1 │ right
(1 row)
But here you could argue that the operator ignores the collation, so
the result is correct. With equality, there is no such loophole.
I don't know what the correct fix would be. Perhaps just refusing to use
the index for equality comparisons with non-deterministic collations.
Yours,
Laurenz Albe
On Tue, 2024-09-17 at 08:00 +0200, Laurenz Albe wrote:
Using a trigram index with an non-deterministic collation can
lead to wrong query results:
[...]I don't know what the correct fix would be. Perhaps just refusing to use
the index for equality comparisons with non-deterministic collations.
Looking into fixing that, how can you tell the optimizer to consider
a certain index only for certain collations?
Yours,
Laurenz Albe