Cyrillic to UNICODE conversion

Started by Victor Wagner | almost 25 years ago | 4 messages | hackers
#1 Victor Wagner
vitus@ice.ru

Despite the advertised support for converting Unicode to other charsets,
PostgreSQL 7.1 reports that Conversion of UNICODE to KOI8 is not
supported. The same goes for WIN, ALT and other charsets.

As I found out, these charsets were simply never added to the list
of 8-bit charsets which should be converted. Maybe because their maps
are stored in another directory on ftp.unicode.org (see VENDORS/MicroSoft
for the cp1251 and cp866 maps, and somewhere else for KOI8-R.TXT). At least
all those maps are included in the catdoc distribution.
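
To make the charset names above concrete, here is a small Python sketch. It
uses Python's built-in codecs, not anything from PostgreSQL. The PG_TO_CODEC
dictionary and both helper functions are hypothetical, and the sketch assumes
that KOI8 means KOI8-R, WIN means cp1251 and ALT means cp866.

```python
# Hypothetical mapping from PostgreSQL charset names to Python codec names.
# Assumption: KOI8 -> KOI8-R, WIN -> Windows cp1251, ALT -> cp866.
PG_TO_CODEC = {"KOI8": "koi8_r", "WIN": "cp1251", "ALT": "cp866"}


def from_unicode(text: str, pg_charset: str) -> bytes:
    """Encode a Unicode string into the given 8-bit Cyrillic charset."""
    return text.encode(PG_TO_CODEC[pg_charset])


def to_unicode(data: bytes, pg_charset: str) -> str:
    """Decode bytes in the given 8-bit Cyrillic charset back to Unicode."""
    return data.decode(PG_TO_CODEC[pg_charset])


word = "Привет"
for name in PG_TO_CODEC:
    raw = from_unicode(word, name)
    # The round trip must be lossless for text the charset can represent.
    assert to_unicode(raw, name) == word
    print(name, raw.hex())
```

The same six letters come out as different byte sequences in each charset,
which is why every charset needs its own map.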

The attached patch fixes this problem. It adds the script UCS_to_cyrillic.pl
to the src/backend/utils/mb/Unicode directory. The mapping of PostgreSQL
charset names to filenames (as they appear in the catdoc distribution, i.e.
lowercased) is hardcoded into the script. It is an almost exact copy of the
UCS_to_iso script, with only file and constant names changed.
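
As a rough illustration of what such a generator script has to do, the
following Python sketch parses lines in the unicode.org mapping-file format
(charset byte, Unicode code point, comment) and produces a table sorted by
code point. It is not the Perl script from the patch: the function name
parse_mapping, the inline SAMPLE data and the choice to skip bytes below 0x80
are all assumptions made for this example.

```python
def parse_mapping(lines):
    """Return (code point, charset byte) pairs sorted by code point."""
    pairs = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue                          # blank or comment-only line
        fields = line.split()
        if len(fields) < 2:
            continue                          # byte with no Unicode mapping
        code = int(fields[0], 16)             # byte value in the 8-bit charset
        ucs = int(fields[1], 16)              # Unicode code point
        if code >= 0x80:                      # assumption: ASCII needs no entry
            pairs.append((ucs, code))
    return sorted(pairs)


# Hypothetical three-line excerpt in the mapping-file format (KOI8-R values).
SAMPLE = """\
# excerpt in the unicode.org mapping format
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xC1\t0x0430\t# CYRILLIC SMALL LETTER A
0xE1\t0x0410\t# CYRILLIC CAPITAL LETTER A
""".splitlines()

table = parse_mapping(SAMPLE)
for ucs, code in table:
    print(f"0x{ucs:04x} -> 0x{code:02x}")
```

Sorting by code point is what lets the table be searched quickly when
converting from Unicode.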

The generated maps are included in the patch, as they are included in the
source tarball, and maps are omitted, because they are removed by
make distclean.

The file src/backend/utils/mb/conv.c is modified
to include the new maps and provide the appropriate conversion functions.
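
A conversion function driven by such a map can be sketched as a binary
search over a table sorted by code point. This is only a Python
illustration, not the C code in conv.c: build_table and unicode_to_local are
hypothetical names, the table is derived from Python's own codec so the
example is self-contained, and replacing unmapped characters with "?" is an
assumption of this sketch.

```python
from bisect import bisect_left


def build_table(codec):
    """Sorted (code point, byte) pairs for the high half of an 8-bit codec."""
    pairs = []
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue                      # byte is undefined in this charset
        pairs.append((ord(ch), byte))
    return sorted(pairs)


def unicode_to_local(text, table, replacement=ord("?")):
    """Convert text to the 8-bit charset described by the table."""
    keys = [ucs for ucs, _ in table]
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                     # ASCII is passed through unchanged
            out.append(cp)
            continue
        i = bisect_left(keys, cp)         # binary search by code point
        if i < len(keys) and keys[i] == cp:
            out.append(table[i][1])
        else:
            out.append(replacement)       # assumption: substitute unmapped chars
    return bytes(out)


koi8 = build_table("koi8_r")
print(unicode_to_local("Привет, world", koi8))
```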

--
Victor Wagner vitus@ice.ru
Chief Technical Officer Office:7-(095)-748-53-88
Communiware.Net Home: 7-(095)-135-46-61
http://www.communiware.net http://www.ice.ru/~vitus

Attachments:

cyr-unicode.patch.gz (application/octet-stream)
#2 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Victor Wagner (#1)
Re: Cyrillic to UNICODE conversion

Thanks for the fixes. I have committed your patches and they should
appear in 7.1.1.

BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
come from Unicode.org and we are not permitted to redistribute them.
--
Tatsuo Ishii

From: Victor Wagner <vitus@ice.ru>
Subject: [PATCHES] Cyrillic to UNICODE conversion
Date: Thu, 26 Apr 2001 20:51:25 +0400 (MSD)
Message-ID: <Pine.LNX.4.30.0104262041500.9539-101000@party.ice.ru>

#3 Victor Wagner
vitus@ice.ru
In reply to: Tatsuo Ishii (#2)
Re: Cyrillic to UNICODE conversion

On Sun, 29 Apr 2001, Tatsuo Ishii wrote:

From: Tatsuo Ishii <t-ishii@sra.co.jp>
Subject: Re: [PATCHES] Cyrillic to UNICODE conversion

> Thanks for the fixes. I have committed your patches and they should
> appear in 7.1.1.
>
> BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
> come from Unicode.org and we are not permitted to redistribute them.

That is not true for koi8-r.txt. At least the one included in the catdoc
distribution I made myself from RFC 1489, and only afterwards did it
appear on unicode.org and on Chernov's KOI8 pages.

But anyway, if anybody can get them from unicode.org, why bother?
--
Victor Wagner vitus@ice.ru
Chief Technical Officer Office:7-(095)-748-53-88
Communiware.Net Home: 7-(095)-135-46-61
http://www.communiware.net http://www.ice.ru/~vitus

#4 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Victor Wagner (#3)
Re: Cyrillic to UNICODE conversion

>> BTW, I have not added cp1251.txt, cp866.txt and koi8-r.txt, since they
>> come from Unicode.org and we are not permitted to redistribute them.
>
> That is not true for koi8-r.txt. At least the one included in the catdoc
> distribution I made myself from RFC 1489, and only afterwards did it
> appear on unicode.org and on Chernov's KOI8 pages.

Oh, I didn't know that.

> But anyway, if anybody can get them from unicode.org, why bother?

Agreed.
--
Tatsuo Ishii