BUG #2554: ILIKE operator works incorrectly
The following bug has been logged online:
Bug reference: 2554
Logged by: Jarosław Bojar
Email address: jarek.bojar@gmail.com
PostgreSQL version: 8.1.4
Operating system: i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 4.1.0
Description: ILIKE operator works incorrectly
Details:
ILIKE operator works incorrectly with UTF8 encoding and Polish characters.
Consider following SQL statements:
CREATE DATABASE test ENCODING='UTF8';
\c test
CREATE TABLE the_table (val VARCHAR(50));
INSERT INTO the_table (val) VALUES ('Świat');
INSERT INTO the_table (val) VALUES ('Łąka');
INSERT INTO the_table (val) VALUES ('Ćma');
INSERT INTO the_table (val) VALUES ('abc');
INSERT INTO the_table (val) VALUES ('ABC');
Without Polish characters ILIKE works correctly:
SELECT * FROM the_table WHERE val ilike 'abc';
val
-----
abc
ABC
(2 rows)
But with Polish characters it does not work correctly. Following queries
should give single row results, but they do not return any rows:
SELECT * FROM the_table WHERE val ilike 'świat';
val
-----
(0 rows)
SELECT * FROM the_table WHERE val ilike 'łąka';
val
-----
(0 rows)
SELECT * FROM the_table WHERE val ilike 'ćma';
val
-----
(0 rows)
On the contrary functions like UPPER work correctly with Polish characters
and following queries produce correct results:
SELECT * FROM the_table WHERE UPPER(val) like UPPER('świat');
val
-------
Świat
(1 row)
SELECT * FROM the_table WHERE UPPER(val) like UPPER('ćma');
val
-----
Ćma
(1 row)
SELECT * FROM the_table WHERE UPPER(val) like UPPER('łąka');
val
------
Łąka
(1 row)
The bug is also present in PostgreSQL 8.1.0 on Windows XP.
Jaros�aw Bojar wrote:
ILIKE operator works incorrectly with UTF8 encoding and Polish
characters.
What does SHOW lc_collate say?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
Jaros�aw Bojar wrote:
ILIKE operator works incorrectly with UTF8 encoding and Polish
characters.
What does SHOW lc_collate say?
Even if the locale is set right, iwchareq() is completely broken
for multibyte charsets: it's trying to apply the system's tolower()
to a pg_wchar, which is highly unlikely to work. This has been a
known problem for awhile, but no one's stepped up to fix it.
regards, tom lane