Re: unicode

Started by Tatsuo Ishii | almost 24 years ago | 3 messages | hackers

ishii@postgresql.org

almost 24 years ago

The actual checking is done in INSERT/UPDATE/COPY. However, the
checking is currently very limited: every byte of a multibyte character
must be greater than 0x7f.
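
To make the limitation concrete, here is a minimal Python sketch of a check of this kind. It is not the backend's actual code: the helper names `utf8_char_len` and `passes_high_bit_check` are hypothetical, and deriving the character length from the UTF-8 lead byte is an assumption made for illustration. It suggests why KOI8-R bytes sent as if they were UTF-8 could pass such a check even though a strict UTF-8 decoder rejects them.

```python
def utf8_char_len(lead: int) -> int:
    """Length implied by a UTF-8 lead byte (hypothetical helper)."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # stray continuation or invalid lead byte: treated as one byte here


def passes_high_bit_check(data: bytes) -> bool:
    """Assumed form of the limited check: every byte of a multibyte
    character must be greater than 0x7f. Nothing else is verified."""
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])
        char = data[i:i + n]
        if n > 1 and (len(char) < n or any(b <= 0x7F for b in char)):
            return False
        i += n
    return True


def is_strict_utf8(data: bytes) -> bool:
    """Full validation, delegated to Python's UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# KOI8-R bytes for a Cyrillic word, presented as if they were UTF-8.
koi8_bytes = "бартунов".encode("koi8_r")
print(passes_high_bit_check(koi8_bytes))  # True: all bytes are above 0x7f
print(is_strict_utf8(koi8_bytes))         # False: 0xc1 is not a continuation byte
```

Under these assumptions the high-bit test accepts the data, so nothing is caught at INSERT time; the problem only shows up later when a conversion is attempted.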

Tatsuo,

do I understand correctly that there is no checking of the
conversion between the local charset and Unicode on insert, and
that checking is done only on select?

test=# create table qq (a text);
CREATE TABLE
test=# \encoding koi8
test=# insert into qq values('бартунов');
INSERT 24617 1
test=# \encoding unicode
test=# select * from qq;
a
----------
п�п�я��п�п�
(1 row)

test=# \encoding unicode
test=# insert into qq values('бартунов');
INSERT 24618 1
test=# select * from qq;
a
----------
п�п�я��п�п�

(2 rows)

test=# \encoding koi8
test=# select * from qq;
WARNING: UtfToLocal: could not convert UTF-8 (0xc2c1). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xd2d4). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xd5ce). Ignored
WARNING: UtfToLocal: could not convert UTF-8 (0xcfd7). Ignored
a
----------
бартунов

(2 rows)
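
The byte values in the warnings above can be reproduced outside the database. The Python sketch below assumes that the second insert sent KOI8-R bytes while the client encoding was set to unicode; that reading of the session is an assumption, not something stated in the thread. Grouping the KOI8-R bytes of the inserted word into pairs gives exactly the four values reported by UtfToLocal.

```python
text = "бартунов"

# KOI8-R encoding of the word, one byte per Cyrillic letter.
koi8 = text.encode("koi8_r")

# Read two bytes at a time, the way a decoder would after seeing a
# 2-byte UTF-8 lead byte (0xc0..0xdf).
pairs = [koi8[i:i + 2].hex() for i in range(0, len(koi8), 2)]
print(pairs)  # ['c2c1', 'd2d4', 'd5ce', 'cfd7']

# For comparison, the bytes a correct UTF-8 encoding would contain.
utf8_hex = text.encode("utf-8").hex()
print(utf8_hex)  # d0b1d0b0d180d182d183d0bdd0bed0b2
```

None of the four pairs is a valid UTF-8 sequence, because the second byte of each pair falls outside the continuation range 0x80..0xbf.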

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Import Notes

Reply to msg id not found: Pine.GSO.4.44.0209231710540.29085-100000@ra.sai.msu.su
Reference msg id not found: Pine.GSO.4.44.0209231710540.29085-100000@ra.sai.msu.su

Hannu Krosing

hannu@tm.ee

almost 24 years ago

In reply to: Tatsuo Ishii (#1)

Tatsuo Ishii wrote on Thu, 26.09.2002 at 03:37:

The actual checking is done in INSERT/UPDATE/COPY. However, the
checking is currently very limited: every byte of a multibyte character
must be greater than 0x7f.

Where can I read about basic tech details of Unicode / Charset
Conversion / ...

I'd like to find answers to the following (for a database created using
UNICODE)

1. Where exactly are conversions between national charsets done

2. What is converted (whole SQL statements or just data)

3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8,
UTF-16, UTF-32, ...)

4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ?

5. Are LIKE/SIMILAR aware of locale stuff ?

-------------
Hannu

Tatsuo Ishii

ishii@postgresql.org

almost 24 years ago

In reply to: Hannu Krosing (#2)

Where can I read about basic tech details of Unicode / Charset
Conversion / ...

I'd like to find answers to the following (for a database created using
UNICODE)

1. Where exactly are conversions between national charsets done

There is no "national charset" in PostgreSQL. I assume you want to know
where frontend/backend encoding conversion happens. It is handled
by pg_server_to_client (which converts BE to FE) and
pg_client_to_server (FE to BE). These functions are called by the
communication subsystem (backend/libpq) and by COPY. In summary, in most
cases the encoding conversion is done before the parser runs and after the
executor produces the final result.
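
As a rough illustration of where the conversion sits, the following Python sketch models the two directions as plain functions. The names `client_to_server_sketch` and `server_to_client_sketch` are hypothetical stand-ins, not the C functions named above, and the KOI8-R client and UTF-8 server encodings are assumptions taken from the session earlier in the thread. Note that the whole statement text is converted, not only the quoted literal.

```python
def client_to_server_sketch(statement: bytes, client_encoding: str,
                            server_encoding: str = "utf-8") -> bytes:
    """FE to BE: convert the whole incoming statement before parsing."""
    return statement.decode(client_encoding).encode(server_encoding)


def server_to_client_sketch(result: bytes, client_encoding: str,
                            server_encoding: str = "utf-8") -> bytes:
    """BE to FE: convert the final result before sending it back."""
    return result.decode(server_encoding).encode(client_encoding)


# A statement as a KOI8-R client would send it.
stmt_from_client = "insert into qq values('бартунов');".encode("koi8_r")

# What the parser would see in a UTF-8 (UNICODE) database.
stmt_for_parser = client_to_server_sketch(stmt_from_client, "koi8_r")

# A stored value going back out to the same client.
stored_value = "бартунов".encode("utf-8")
value_for_client = server_to_client_sketch(stored_value, "koi8_r")

print(stmt_for_parser.decode("utf-8"))
print(value_for_client.hex())
```

The ASCII parts of the statement come through unchanged in both encodings; only the bytes of the Cyrillic literal differ.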

2. What is converted (whole SQL statements or just data)

Whole statement.

3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8,
UTF-16, UTF-32, ...)

"format"? I assume you are talking about the encoding.

It is exactly the same as the database encoding. For a UNICODE database, we
use UTF-8, not UCS-2 or UCS-4.
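
To show what that choice means at the byte level, here is a small Python comparison of the same text in UTF-8 and in fixed-width forms. It uses UTF-16 and UTF-32 as stand-ins for UCS-2 and UCS-4, which is an approximation that holds for the characters used here; the sample strings are illustrative only.

```python
encodings = ("utf-8", "utf-16-be", "utf-32-be")

# Byte lengths for a Cyrillic word and for a plain ASCII identifier.
cyrillic_sizes = {enc: len("бартунов".encode(enc)) for enc in encodings}
ascii_sizes = {enc: len("qq".encode(enc)) for enc in encodings}

print(cyrillic_sizes)  # {'utf-8': 16, 'utf-16-be': 16, 'utf-32-be': 32}
print(ascii_sizes)     # {'utf-8': 2, 'utf-16-be': 4, 'utf-32-be': 8}

# In UTF-8, ASCII text keeps its single-byte form.
print("qq".encode("utf-8") == b"qq")  # True
```

UTF-8 is variable width: ASCII stays at one byte per character, while each Cyrillic letter here takes two.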

4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ?

Ditto.

5. Are LIKE/SIMILAR aware of locale stuff ?

I don't know about SIMILAR, but I believe LIKE is not locale aware, and
that is correct from the standard's point of view...
--
Tatsuo Ishii