Lower case

Started by Vladimir S. Petukhovabout 21 years ago6 messagesgeneral
Jump to latest
#1Vladimir S. Petukhov
vladimir@sycore.org

Hi!

Sorry for my English..

I want to do case-insensitivity search, like this
... WHERE lower (column_name) LIKE lower (%value%);
This work fine for English..
But i need search for Russian words, lower() operator does not work with
Russian (non-English) chars, but ORDER works fine...
???????????????
What's wrong?

--
It is better to travel hopefully than to fly Continental.

Now playing: 16 - ?????????.mp3
AutoGenerated by fortune & xmms...

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Vladimir S. Petukhov (#1)
Re: Lower case

Vladimir,

there is pgsql-ru-general mailing list for russian speaking people

Oleg
On Wed, 26 Jan 2005, Vladimir S. Petukhov wrote:

Hi!

Sorry for my English..

I want to do case-insensitivity search, like this
... WHERE lower (column_name) LIKE lower (%value%);
This work fine for English..
But i need search for Russian words, lower() operator does not work with
Russian (non-English) chars, but ORDER works fine...
???????????????
What's wrong?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vladimir S. Petukhov (#1)
Re: Lower case

"Vladimir S. Petukhov" <vladimir@sycore.org> writes:

pg_controldata /var/pgsql/data
...
LC_COLLATE: ru_RU
LC_CTYPE: ru_RU

bash-2.05b# psql -l
List of databases
Name | Owner | Encoding
-----------+----------+----------
testdb | postgres | UNICODE

And LIKE, ILIKE, ~ do not recognize upper/lower case..

What character encoding is implied by those LC_ settings on your machine?
If it's different from the database encoding (here utf8) these things
won't actually work right.

Also, before PG 8.0 upper/lower simply don't work on multibyte characters,
which means you'd have to use a single-byte encoding to make it work
(I think koi8 is single-byte but not sure).

regards, tom lane

#4Vladimir S. Petukhov
vladimir@sycore.org
In reply to: Tom Lane (#3)
Re: Lower case

On Wednesday 26 January 2005 20:01, you wrote:

"Vladimir S. Petukhov" <vladimir@sycore.org> writes:

pg_controldata /var/pgsql/data
...
LC_COLLATE: ru_RU
LC_CTYPE: ru_RU

bash-2.05b# psql -l
List of databases
Name | Owner | Encoding
-----------+----------+----------
testdb | postgres | UNICODE

And LIKE, ILIKE, ~ do not recognize upper/lower case..

What character encoding is implied by those LC_ settings on your machine?
If it's different from the database encoding (here utf8) these things
won't actually work right.

LANG=ru_RU.koi8r
LC_ALL=ru_RU.koi8r
But how it act on lower/upper cases? Client use utf-8 encoding...

Also, before PG 8.0 upper/lower simply don't work on multibyte characters,
which means you'd have to use a single-byte encoding to make it work
(I think koi8 is single-byte but not sure).

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
You feel a whole lot more like you do now than you did when you used to.

Now playing:
AutoGenerated by fortune & xmms...

#5Mihail Nasedkin
m.nasedkin.perm@mail.ru
In reply to: Oleg Bartunov (#2)
Re: [GENERAL] Lower case

пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ, пїЅпїЅпїЅ.

пїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ (пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅ 26 пїЅпїЅпїЅпїЅпїЅпїЅ 2005 пїЅ., 17:35:00)

OB> On Wed, 26 Jan 2005, Vladimir S. Petukhov wrote:

I want to do case-insensitivity search, like this
... WHERE lower (column_name) LIKE lower (%value%);
This work fine for English..
But i need search for Russian words, lower() operator does not work with
Russian (non-English) chars, but ORDER works fine...
???????????????
What's wrong?

пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ, пїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ PostgeSQL пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅ?

--
пїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ,
Mihail mailto:m.nasedkin.perm@mail.ru

#6Dawid Kuroczko
qnex42@gmail.com
In reply to: Vladimir S. Petukhov (#4)
Re: Lower case

On Thu, 27 Jan 2005 00:16:14 +0000, Vladimir S. Petukhov
<vladimir@sycore.org> wrote:

LC_COLLATE: ru_RU
LC_CTYPE: ru_RU
Name | Owner | Encoding
-----------+----------+----------
testdb | postgres | UNICODE
And LIKE, ILIKE, ~ do not recognize upper/lower case..

What character encoding is implied by those LC_ settings on your machine?
If it's different from the database encoding (here utf8) these things
won't actually work right.

LANG=ru_RU.koi8r
LC_ALL=ru_RU.koi8r
But how it act on lower/upper cases? Client use utf-8 encoding...

The client uses utf-8 encoding, so does server. Texts are stored
using UTF-8. However when you call a lower() function from
PostgreSQL it does more or less following:
-- it retrieves text row from database. This text is in UTF-8 encoding.
-- it calls strxfrm function upon this text.
-- strxfrm function sees that current locale is ru_RU.koi8r
-- strxfrm then takes utf-8 encoded text and treats it as koi8r
-- strxfrm "skips over" characters it does not recognize (utf-8 chars)
-- strxfrm returns transformed text
-- PostgreSQL takes the resulting text, believing it is still in utf-8.
In other words, probably only latin characters were subject to lower()
functions, any "unknown" Russian UTF-8 characters were at best
skipped.

Please note that PostgreSQL does not do implicit utf8->koi8r->utf8
conversion while calling function lower(). AFAIK it does not even
know (or care) if current locale setting ("ru_RU") is for different
encoding than current database's. It is DB Admin's duty to make
sure cluster locale (done in initdb) is compatible with database
encoding (done in createdb).

Regards,
Dawid