bug: LC_CTYPE=en_US.UTF-8 confuses query planner

Started by Dmitry Karasikalmost 21 years ago2 messageshackers
Jump to latest
#1Dmitry Karasik
dk@catpipe.net

I encountered a bug where the same query behaves differently
under different LC_CTYPE settings, "C" and "en_US.UTF-8".

The query is of type SELECT ... WHERE a like 'x' and b like 'y', where relevant
indexes exist for a and b, and 'x' and 'y' strings do not contain the %
character. When database is initdb'ed with LC_CTYPE=C, the query uses index
scan; when LC_CTYPE=en_US.UTF-8 it is the sequential scan. The table is large,
so it doesn't seem that planner selects seqscan out of performance reasons.
Also, I think this is a bug since when the query contains only one 'like'
statement, the query planner does use the index, no matter what $LC_CTYPE
value is.

Details:
pgsql 8.0.3

LC_CTYPE=C:
# explain select * from queues where username like 'a' and hostname like 'b';
QUERY PLAN
------------------------------------------------------------------------------------------
Index Scan using queues_idx_hostname_time on queues (cost=0.00..11.48 rows=1 width=161)
Index Cond: (hostname = 'b'::text)
Filter: ((username ~~ 'a'::text) AND (hostname ~~ 'b'::text))
(3 rows)

LC_CTYPE=en_US.UTF-8:
# explain select * from queues where username like 'a' and hostname like 'b';
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on queues (cost=100000000.00..100000016.15 rows=1 width=161)
Filter: ((username ~~ 'a'::text) AND (hostname ~~ 'b'::text))
(2 rows)

# \d queues
...
username | text | not null
hostname | text | not null

--
Sincerely,
Dmitry Karasik

---
catpipe Systems ApS
*BSD solutions, consulting, development
www.catpipe.net
+45 7021 0050

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Karasik (#1)
Re: bug: LC_CTYPE=en_US.UTF-8 confuses query planner

Dmitry Karasik <dk@catpipe.net> writes:

When database is initdb'ed with LC_CTYPE=C, the query uses index
scan; when LC_CTYPE=en_US.UTF-8 it is the sequential scan.

This is in the FAQ:

When using wild-card operators such as LIKE or ~, indexes can only be
used in certain circumstances:
...
* The default C locale must be used during initdb because it is not
possible to know the next-greatest character in a non-C locale.
You can create a special text_pattern_ops index for such cases
that work only for LIKE indexing.

regards, tom lane