Different result when using '=' and 'like' in unicode mode

Started by Jiang Shengover 23 years ago4 messagesbugs
Jump to latest
#1Jiang Sheng
jiangsheng@sis.sh.cn

I used PostgreSQL 7.2.1 on redhat 7.3, installed using the rpm from redhat,.
When I createdb -E UNICODE, and import data using \copy command, after
converted all the imported text files to UTF-8 encoding. It looks well, I
can get
the data from several platforms in different locale. But, a strange problem,
when i using '=' in where clause, to match the varchar value, the result is
not correct,
for example, the table structure is

Table "tbl_test"
Column | Type | Modifiers
-------------+-----------------------+-----------
id | integer |
name | character varying(40) |

the sql is
select * from tbl_test where name = '石田';

but the result contains other data, such as 石町, 柳町, 柳田

and I changed the sql to
select * from tbl_test where name like '石田';
select * from tbl_test where name ilike '石田';

the results both correct.

I don't know the reason of this problem, and in my application it not
allowed
to use 'like' replaced '=', the data amount is too large.

Is this a bug of PostgreSQL? or only a problem in redhat environment? and
how to fix it?

Thx,
Jiang Sheng

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Jiang Sheng (#1)
Re: Different result when using '=' and 'like' in unicode

Jiang Sheng writes:

the sql is
select * from tbl_test where name = '石田';

but the result contains other data, such as 石町, 柳町, 柳田

and I changed the sql to
select * from tbl_test where name like '石田';
select * from tbl_test where name ilike '石田';

the results both correct.

The LIKE operator uses a character-by-character comparison (which could
quite possibly behave oddly if your strings contain combining characters,
but that is a different issue). The = operator uses the host operating
system's locale facilities to do a locale-aware comparison. So in general
LIKE and = are not interchangeable.

If the result you get for = is wrong even under some locale, then you
probably don't have the right locale set on your server. Recent glibc
systems require you to set your locale to xx_YY.utf8 (rather than just
xx_YY) if you want to use Unicode.

--
Peter Eisentraut peter_e@gmx.net

#3Jiang Sheng
jiangsheng@sis.sh.cn
In reply to: Peter Eisentraut (#2)
Re: Different result when using '=' and 'like' in unicode mode

Thanks a lot!
In your opinion, should I re-compile the PostgreSQL after set the $LANG
to ja_JP.utf8? or I need to set other variables' value..
I changed the $LANG value, but the result still not correct, and not found
anything about it in PostgrSQL's document.

Thx
Jiang Sheng

----- Original Message -----
送信者 : "Peter Eisentraut" <peter_e@gmx.net>
宛先 : "Jiang Sheng" <jiangsheng@sis.sh.cn>
Cc: <pgsql-bugs@postgresql.org>
送信日時 : 2002年11月18日 6:21
件名 : Re: [BUGS] Different result when using '=' and 'like' in unicode mode

Show quoted text

Jiang Sheng writes:

the sql is
select * from tbl_test where name = '石田';

but the result contains other data, such as 石町, 柳町, 柳田

and I changed the sql to
select * from tbl_test where name like '石田';
select * from tbl_test where name ilike '石田';

the results both correct.

The LIKE operator uses a character-by-character comparison (which could
quite possibly behave oddly if your strings contain combining characters,
but that is a different issue). The = operator uses the host operating
system's locale facilities to do a locale-aware comparison. So in general
LIKE and = are not interchangeable.

If the result you get for = is wrong even under some locale, then you
probably don't have the right locale set on your server. Recent glibc
systems require you to set your locale to xx_YY.utf8 (rather than just
xx_YY) if you want to use Unicode.

--
Peter Eisentraut peter_e@gmx.net

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Jiang Sheng (#3)
Re: Different result when using '=' and 'like' in unicode

Jiang Sheng writes:

In your opinion, should I re-compile the PostgreSQL after set the $LANG
to ja_JP.utf8? or I need to set other variables' value..

Recompiling has nothing to do with that. Please see the Administrator's
Guide chapter on localization for all the information.

--
Peter Eisentraut peter_e@gmx.net