PostgreSQL

Started by Eugeny Balakhonov over 22 years ago | 7 messages | general
Hello, all!

I have a good question for the PostgreSQL FAQ.

How do I use string functions (like UPPER()/LOWER()) on non-Latin strings?
Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
How do I make a case-insensitive search on a text field that contains non-Latin characters?

Thanks for your answers!

Best regards
Eugeny

#2Alexander Litvinov
In reply to: Eugeny Balakhonov (#1)
Re: PostgreSQL

I confirm this behaviour: Cyrillic words are not changed by the lower()/upper()
functions, nor matched by ILIKE.

I am using:
=> SELECT version();
version
---------------------------------------------------------------
PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.2
(1 row)

Nothing special was done during database creation (no encoding selected).

#3Bruce Momjian
bruce@momjian.us
In reply to: Eugeny Balakhonov (#1)
Re: PostgreSQL

Not sure. I thought it would work.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#4Oleg Bartunov
oleg@sai.msu.su
In reply to: Bruce Momjian (#3)
Re: PostgreSQL

On Mon, 11 Aug 2003, Bruce Momjian wrote:

Not sure. I thought it would work.

No, it doesn't work. Several people have already complained about poor
Unicode support. I recall Tatsuo commented some piece of code.
I have a small page, http://www.sai.msu.su/~megera/postgres/utf8.html,
about my experience with UTF8 and Cyrillic.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#5Bruce Momjian
bruce@momjian.us
In reply to: Oleg Bartunov (#4)
Re: PostgreSQL

Well, I have no mention of this problem in the TODO list, so I would
like to get a good description of why it isn't working.

Looking at the code, I see upper() is defined in oracle_compat.c (you
would think it would be more standard), and it calls toupper(), so it
probably works on single-byte encodings, but not multi-byte ones. Is
this correct? Is there a way to do a multi-byte toupper? Perhaps
converting to wide characters and calling towupper()?
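
To make the idea concrete, here is a minimal sketch in C of the two approaches mentioned above: a byte-wise toupper() loop and a conversion to wide characters followed by towupper(). This is not the code in oracle_compat.c. The names upper_bytewise, upper_wide and use_utf8_locale are hypothetical helpers invented for illustration, and the locale names "C.UTF-8" and "en_US.UTF-8" are assumptions about what is installed on the system.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: try to switch LC_CTYPE to a UTF-8 locale.
 * The two locale names are assumptions; returns 1 on success, 0 otherwise. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Byte-wise upper-casing, in place. Each byte is passed to toupper() on
 * its own, so the bytes of a multi-byte character are never seen as one
 * character. */
void upper_bytewise(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}

/* Hypothetical helper: upper-case src into dst via wide characters.
 * Relies on the current LC_CTYPE locale to decode and encode the string.
 * Returns 0 on success, -1 if conversion fails or dst is too small. */
int upper_wide(const char *src, char *dst, size_t dstsize)
{
    assert(src != NULL && dst != NULL);

    /* POSIX: with a NULL destination, mbstowcs() only counts characters. */
    size_t n = mbstowcs(NULL, src, 0);
    if (n == (size_t) -1)
        return -1;              /* invalid sequence for this locale */

    wchar_t *w = malloc((n + 1) * sizeof(wchar_t));
    if (w == NULL)
        return -1;
    mbstowcs(w, src, n + 1);

    /* Case-map whole characters rather than single bytes. */
    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t) towupper((wint_t) w[i]);

    /* Convert back, checking that the result (plus terminator) fits. */
    size_t need = wcstombs(NULL, w, 0);
    int rc = -1;
    if (need != (size_t) -1 && need < dstsize) {
        wcstombs(dst, w, dstsize);
        rc = 0;
    }
    free(w);
    return rc;
}
```

A caller would invoke use_utf8_locale() once at startup and then pass UTF-8 text to upper_wide(). Whether Cyrillic letters are actually mapped depends on the C library and on the locale that ends up selected, so treat this as an illustration of the technique rather than a portable fix.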

#6Dennis Gearon
gearond@cvc.net
In reply to: Bruce Momjian (#5)
Re: PostgreSQL

I think if Postgres were completely UTF8 compatible, with that as the default configuration, we'd do a lot better against 'the others' and take more of Oracle's market.

#7Bruce Momjian
bruce@momjian.us
In reply to: Alexander Litvinov (#2)
Re: PostgreSQL

Added to TODO:

* Fix upper()/lower() to work for multibyte encodings
