Unicode mapping scripts cleanup

Started by Peter Eisentraut over 10 years ago, 10 messages
#1 Peter Eisentraut
peter_e@gmx.net
10 attachment(s)

Here is a series of patches to clean up the Unicode mapping script
business in src/backend/utils/mb/Unicode/. It overlaps with the
perlcritic work that I recently wrote about, except that these pieces
are not strictly related to Perl; they concern wrong comments, missing
makefile pieces, and such.

I discovered that some of the source files that one is supposed to
download don't exist anymore or are labeled obsolete. Also, running the
scripts produces slight differences in the output. So apparently, the
CJK to Unicode mappings are still evolving and should be updated
occasionally. Next steps would be to commit some or all of these
differences after additional verification, and then update the scripts
to use whatever the non-obsolete mapping sources are supposed to be.

Attachments:

0001-UCS_to_most.pl-Make-executable-for-consistency-with-.patch (text/x-patch)
From 2c96b1154c300325654735984f4268df7cc6efcd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:12 -0400
Subject: [PATCH 01/10] UCS_to_most.pl: Make executable, for consistency with
 other scripts

---
 src/backend/utils/mb/Unicode/UCS_to_most.pl | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 src/backend/utils/mb/Unicode/UCS_to_most.pl

diff --git a/src/backend/utils/mb/Unicode/UCS_to_most.pl b/src/backend/utils/mb/Unicode/UCS_to_most.pl
old mode 100644
new mode 100755
-- 
2.5.1

0002-Fix-comments.patch (text/x-patch)
From 374d2954b31867e7ffe4ed183f9a9cd7a098cb9b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:12 -0400
Subject: [PATCH 02/10] Fix comments

Some of these comments were copied and pasted without updating them,
some of them were duplicates.
---
 src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl | 6 +-----
 src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl | 4 ----
 src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl | 6 +-----
 src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl | 6 +-----
 src/backend/utils/mb/Unicode/UCS_to_SJIS.pl   | 4 ----
 5 files changed, 3 insertions(+), 23 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
index bfc9912..643f02b 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
@@ -49,10 +49,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_CN table
-#
-
 $file = "utf8_to_euc_cn.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_CN[ $count ] = {\n";
@@ -75,7 +71,7 @@
 close(FILE);
 
 #
-# then generate EUC_JP --> UTF8 table
+# then generate EUC_CN --> UTF8 table
 #
 reset 'array';
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
index 79bc05b..687e668 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
@@ -130,10 +130,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_JP table
-#
-
 $file = "utf8_to_euc_jp.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_JP[ $count ] = {\n";
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
index fa553fd..82490a0 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
@@ -49,10 +49,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_KR table
-#
-
 $file = "utf8_to_euc_kr.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_KR[ $count ] = {\n";
@@ -75,7 +71,7 @@
 close(FILE);
 
 #
-# then generate EUC_JP --> UTF8 table
+# then generate EUC_KR --> UTF8 table
 #
 reset 'array';
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
index 02414ba..697b6e6 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
@@ -65,10 +65,6 @@
 }
 close(FILE);
 
-#
-# first, generate UTF8 --> EUC_TW table
-#
-
 $file = "utf8_to_euc_tw.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapEUC_TW[ $count ] = {\n";
@@ -91,7 +87,7 @@
 close(FILE);
 
 #
-# then generate EUC_JP --> UTF8 table
+# then generate EUC_TW --> UTF8 table
 #
 reset 'array';
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
index 74cd7ac..e607e91 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
@@ -66,10 +66,6 @@
 
 close(FILE);
 
-#
-# first, generate UTF8 --> SJIS table
-#
-
 $file = "utf8_to_sjis.map";
 open(FILE, "> $file") || die("cannot open $file");
 print FILE "static const pg_utf_to_local ULmapSJIS[ $count ] = {\n";
-- 
2.5.1

0003-Remove-manually-added-header-comments-from-generated.patch (text/x-patch)
From 8f19caa272880afc1a0857e6478ac3fac8203cd6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:12 -0400
Subject: [PATCH 03/10] Remove manually added header comments from generated
 files

These header comments were added as part of commit f3d99d16, but only a
few of the map files were changed.  Since these files are in theory
generated from scripts, we shouldn't manually change them in a way
that's inconsistent with the script output.
---
 src/backend/utils/mb/Unicode/euc_cn_to_utf8.map     | 2 --
 src/backend/utils/mb/Unicode/euc_jp_to_utf8.map     | 2 --
 src/backend/utils/mb/Unicode/euc_tw_to_utf8.map     | 2 --
 src/backend/utils/mb/Unicode/gbk_to_utf8.map        | 2 --
 src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map | 2 --
 src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map | 2 --
 src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map | 2 --
 src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map | 2 --
 src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map | 2 --
 src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map  | 2 --
 src/backend/utils/mb/Unicode/koi8r_to_utf8.map      | 2 --
 18 files changed, 36 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/euc_cn_to_utf8.map b/src/backend/utils/mb/Unicode/euc_cn_to_utf8.map
index 4052379..17cf7c8 100644
--- a/src/backend/utils/mb/Unicode/euc_cn_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/euc_cn_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/euc_cn_to_utf8.map */
-
 static const pg_local_to_utf LUmapEUC_CN[ 7445 ] = {
   {0xa1a1, 0xe38080},
   {0xa1a2, 0xe38081},
diff --git a/src/backend/utils/mb/Unicode/euc_jp_to_utf8.map b/src/backend/utils/mb/Unicode/euc_jp_to_utf8.map
index db427cb..937b53c 100644
--- a/src/backend/utils/mb/Unicode/euc_jp_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/euc_jp_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/euc_jp_to_utf8.map */
-
 static const pg_local_to_utf LUmapEUC_JP[] = {
   {0x8ea1, 0xefbda1},
   {0x8ea2, 0xefbda2},
diff --git a/src/backend/utils/mb/Unicode/euc_tw_to_utf8.map b/src/backend/utils/mb/Unicode/euc_tw_to_utf8.map
index b430b44..8535b99 100644
--- a/src/backend/utils/mb/Unicode/euc_tw_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/euc_tw_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/euc_tw_to_utf8.map */
-
 static const pg_local_to_utf LUmapEUC_TW[ 23575 ] = {
   {0xa1a1, 0xe38080},
   {0xa1a2, 0xefbc8c},
diff --git a/src/backend/utils/mb/Unicode/gbk_to_utf8.map b/src/backend/utils/mb/Unicode/gbk_to_utf8.map
index fced1f4..964aa52 100644
--- a/src/backend/utils/mb/Unicode/gbk_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/gbk_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/gbk_to_utf8.map */
-
 static const pg_local_to_utf LUmapGBK[ 21792 ] = {
   {0x0080, 0xe282ac},
   {0x8140, 0xe4b882},
diff --git a/src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map
index 8a650ee..91d3e68 100644
--- a/src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_10_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_10[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map
index 2075706..b641673 100644
--- a/src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_13_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_13[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map
index 49d63d4..e2ca373 100644
--- a/src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_14_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_14[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map
index 349b64c..f9803e8 100644
--- a/src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_15_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_15[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map
index d8e2801..87e9246 100644
--- a/src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_16_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_16[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map
index 30d487a..5fedd36 100644
--- a/src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_2[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map
index 94b5bc4..c6f824c 100644
--- a/src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_3[ 121 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map
index f339c19..73228fa 100644
--- a/src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_4[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map
index 601be30..cd832b6 100644
--- a/src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_5[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map
index 289f97e..5e7b676 100644
--- a/src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_6[ 83 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map
index fbbecaa..987c0ad 100644
--- a/src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_7[ 125 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map
index 4ed316c..9f0a597 100644
--- a/src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_8[ 92 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map b/src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map
index f86cc65..93c9cf5 100644
--- a/src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map */
-
 static const pg_local_to_utf LUmapISO8859_9[ 128 ] = {
   {0x0080, 0xc280},
   {0x0081, 0xc281},
diff --git a/src/backend/utils/mb/Unicode/koi8r_to_utf8.map b/src/backend/utils/mb/Unicode/koi8r_to_utf8.map
index 738f160..d8a544c 100644
--- a/src/backend/utils/mb/Unicode/koi8r_to_utf8.map
+++ b/src/backend/utils/mb/Unicode/koi8r_to_utf8.map
@@ -1,5 +1,3 @@
-/* src/backend/utils/mb/Unicode/koi8r_to_utf8.map */
-
 static const pg_local_to_utf LUmapKOI8R[ 128 ] = {
   {0x0080, 0xe29480},
   {0x0081, 0xe29482},
-- 
2.5.1

0004-Add-Unicode-map-generation-scripts-as-rule-prerequis.patch (text/x-patch)
From f970a7b93b815a27d1724a448a36b12e1c87b328 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:13 -0400
Subject: [PATCH 04/10] Add Unicode map generation scripts as rule
 prerequisites

That way, the rules will trigger when the scripts change.
---
 src/backend/utils/mb/Unicode/Makefile | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 353fc75..b15efce 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -68,29 +68,29 @@ GENERICTEXTS = $(ISO8859TEXTS) $(WINTEXTS) \
 
 all: $(MAPS)
 
-$(GENERICMAPS) : $(GENERICTEXTS)
-	$(PERL) $(srcdir)/UCS_to_most.pl
+$(GENERICMAPS): UCS_to_most.pl $(GENERICTEXTS)
+	$(PERL) $<
 
-euc_jp_to_utf8.map utf8_to_euc_jp.map : JIS0201.TXT JIS0208.TXT JIS0212.TXT
-	$(PERL) $(srcdir)/UCS_to_EUC_JP.pl
+euc_jp_to_utf8.map utf8_to_euc_jp.map: UCS_to_EUC_JP.pl JIS0201.TXT JIS0208.TXT JIS0212.TXT
+	$(PERL) $<
 
-euc_cn_to_utf8.map utf8_to_euc_cn.map : GB2312.TXT
-	$(PERL) $(srcdir)/UCS_to_EUC_CN.pl
+euc_cn_to_utf8.map utf8_to_euc_cn.map: UCS_to_EUC_CN.pl GB2312.TXT
+	$(PERL) $<
 
-euc_kr_to_utf8.map utf8_to_euc_kr.map : KSX1001.TXT
-	$(PERL) $(srcdir)/UCS_to_EUC_KR.pl
+euc_kr_to_utf8.map utf8_to_euc_kr.map: UCS_to_EUC_KR.pl KSX1001.TXT
+	$(PERL) $<
 
-euc_tw_to_utf8.map utf8_to_euc_tw.map : CNS11643.TXT
-	$(PERL) $(srcdir)/UCS_to_EUC_TW.pl
+euc_tw_to_utf8.map utf8_to_euc_tw.map: UCS_to_EUC_TW.pl CNS11643.TXT
+	$(PERL) $<
 
-sjis_to_utf8.map utf8_to_sjis.map : CP932.TXT
-	$(PERL) $(srcdir)/UCS_to_SJIS.pl
+sjis_to_utf8.map utf8_to_sjis.map: UCS_to_SJIS.pl CP932.TXT
+	$(PERL) $<
 
-gb18030_to_utf8.map  utf8_to_gb18030.map : gb-18030-2000.xml
-	$(PERL) $(srcdir)/UCS_to_GB18030.pl
+gb18030_to_utf8.map utf8_to_gb18030.map: UCS_to_GB18030.pl gb-18030-2000.xml
+	$(PERL) $<
 
-big5_to_utf8.map  utf8_to_big5.map : BIG5.TXT CP950.TXT
-	$(PERL) $(srcdir)/UCS_to_BIG5.pl
+big5_to_utf8.map utf8_to_big5.map: UCS_to_BIG5.pl BIG5.TXT CP950.TXT
+	$(PERL) $<
 
 distclean: clean
 	rm -f $(TEXTS)
-- 
2.5.1

0005-Add-missing-rules-related-to-EUC_JIS_2004-and-SHIFT_.patch (text/x-patch)
From 0931da19b1960525f696a92997e294e9bea30e5c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:13 -0400
Subject: [PATCH 05/10] Add missing rules related to EUC_JIS_2004 and
 SHIFT_JIS_2004 encodings

This was apparently forgotten in commit
75c6519ff68dbb97f73b13e9976fb8075bbde7b8.
---
 src/backend/utils/mb/Unicode/Makefile | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index b15efce..1c14b13 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -50,7 +50,11 @@ SPECIALMAPS = euc_cn_to_utf8.map utf8_to_euc_cn.map \
 	euc_tw_to_utf8.map utf8_to_euc_tw.map \
 	sjis_to_utf8.map utf8_to_sjis.map \
 	gb18030_to_utf8.map utf8_to_gb18030.map \
-	big5_to_utf8.map utf8_to_big5.map
+	big5_to_utf8.map utf8_to_big5.map \
+	euc_jis_2004_to_utf8.map euc_jis_2004_to_utf8_combined.map \
+	utf8_to_euc_jis_2004.map utf8_to_euc_jis_2004_combined.map \
+	shift_jis_2004_to_utf8.map shift_jis_2004_to_utf8_combined.map \
+	utf8_to_shift_jis_2004.map utf8_to_shift_jis_2004_combined.map
 
 MAPS = $(GENERICMAPS) $(SPECIALMAPS)
 
@@ -92,6 +96,12 @@ gb18030_to_utf8.map utf8_to_gb18030.map: UCS_to_GB18030.pl gb-18030-2000.xml
 big5_to_utf8.map utf8_to_big5.map: UCS_to_BIG5.pl BIG5.TXT CP950.TXT
 	$(PERL) $<
 
+euc_jis_2004_to_utf8.map euc_jis_2004_to_utf8_combined.map utf8_to_euc_jis_2004.map utf8_to_euc_jis_2004_combined.map: UCS_to_EUC_JIS_2004.pl euc-jis-2004-std.txt
+	$(PERL) $<
+
+shift_jis_2004_to_utf8.map shift_jis_2004_to_utf8_combined.map utf8_to_shift_jis_2004.map utf8_to_shift_jis_2004_combined.map: UCS_to_SHIFT_JIS_2004.pl sjis-0213-2004-std.txt
+	$(PERL) $<
+
 distclean: clean
 	rm -f $(TEXTS)
 
-- 
2.5.1

0006-Make-some-adjustments-in-variable-assignments.patch (text/x-patch)
From c542057ba817c959fbb09f0aa44075d247db60d4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:13 -0400
Subject: [PATCH 06/10] Make some adjustments in variable assignments

These variables aren't really used for anything interesting, but it
seems the existing grouping was somewhat nonsensical.
---
 src/backend/utils/mb/Unicode/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 1c14b13..baf15d2 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -63,12 +63,13 @@ ISO8859TEXTS = 8859-2.TXT 8859-3.TXT 8859-4.TXT 8859-5.TXT \
 	8859-10.TXT 8859-13.TXT 8859-14.TXT 8859-15.TXT \
 	8859-16.TXT
 
-WINTEXTS = CP866.TXT CP874.TXT CP1250.TXT CP1251.TXT \
+WINTEXTS = CP866.TXT CP874.TXT CP936.TXT CP949.TXT \
+	CP1250.TXT CP1251.TXT \
 	CP1252.TXT CP1253.TXT CP1254.TXT CP1255.TXT \
 	CP1256.TXT CP1257.TXT CP1258.TXT
 
 GENERICTEXTS = $(ISO8859TEXTS) $(WINTEXTS) \
-	KOI8-R.TXT CP936.TXT CP949.TXT JOHAB.TXT
+	KOI8-R.TXT JOHAB.TXT
 
 all: $(MAPS)
 
-- 
2.5.1

0007-Add-prerequisite-for-KOI8-U.TXT.patch (text/x-patch)
From 62826ff0d14dc8a4a72ed7ef16059060b82d05d1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:13 -0400
Subject: [PATCH 07/10] Add prerequisite for KOI8-U.TXT

This was missed when the encoding was added.
---
 src/backend/utils/mb/Unicode/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index baf15d2..433fb29 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -69,7 +69,7 @@ WINTEXTS = CP866.TXT CP874.TXT CP936.TXT CP949.TXT \
 	CP1256.TXT CP1257.TXT CP1258.TXT
 
 GENERICTEXTS = $(ISO8859TEXTS) $(WINTEXTS) \
-	KOI8-R.TXT JOHAB.TXT
+	KOI8-R.TXT KOI8-U.TXT JOHAB.TXT
 
 all: $(MAPS)
 
-- 
2.5.1

0008-Add-rules-to-download-raw-mapping-files.patch (text/x-patch)
From 25d14ecee10d7014ed82527ba13abd93133260d9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:13 -0400
Subject: [PATCH 08/10] Add rules to download raw mapping files

These are not part of a normal build, like this entire directory.
---
 src/backend/utils/mb/Unicode/Makefile | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 433fb29..342d990 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -108,3 +108,37 @@ distclean: clean
 
 maintainer-clean: distclean
 	rm -f $(MAPS)
+
+
+WGET = wget -O $@ --no-use-server-timestamps
+#WGET = curl -o $@
+
+BIG5.TXT:
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/$(@F)
+
+CNS11643.TXT:
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/$(@F)
+
+GB2312.TXT:
+	$(WGET) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842&format=txt'
+
+JIS0201.TXT JIS0208.TXT JIS0212.TXT:
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/$(@F)
+
+JOHAB.TXT:
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/$(@F)
+
+KOI8-R.TXT KOI8-U.TXT:
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/$(@F)
+
+KSX1001.TXT:
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/$(@F)
+
+$(ISO8859TEXTS):
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/ISO8859/$(@F)
+
+$(filter-out CP8%,$(WINTEXTS)):
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/$(@F)
+
+$(filter CP8%,$(WINTEXTS)):
+	$(WGET) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/$(@F)
-- 
2.5.1

0009-Make-spacing-and-punctuation-consistent.patch (text/x-patch)
From fd4fecaa40bc3e62e102aec577e71dda20f34be4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:14 -0400
Subject: [PATCH 09/10] Make spacing and punctuation consistent

---
 src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl | 4 ++--
 src/backend/utils/mb/Unicode/UCS_to_SJIS.pl           | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
index 33d108e..e8f2467 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
@@ -168,9 +168,9 @@
 	if ($array{$code} ne "")
 	{
 		printf STDERR
-		  "Warning: duplicate UTF-8: %08x UCS: %04x Shift JIS: %04x\n", $utf,
+		  "Warning: duplicate UTF8: %08x UCS: %04x Shift JIS: %04x\n", $utf,
 		  $ucs, $code;
-		printf STDERR "Previous value: UTF-8: %08x\n", $array{$utf};
+		printf STDERR "Previous value: UTF8: %08x\n", $array{$utf};
 		next;
 	}
 	$count++;
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
index e607e91..5d2a1ca 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
@@ -55,7 +55,7 @@
 				&& ($code <= 0x879c)))
 		{
 			printf STDERR
-			  "Warning: duplicate UTF8 : UCS=0x%04x  SJIS=0x%04x\n", $ucs,
+			  "Warning: duplicate UTF8: UCS=0x%04x SJIS=0x%04x\n", $ucs,
 			  $code;
 			next;
 		}
-- 
2.5.1

0010-Turn-off-test-mode-by-default.patch (text/x-patch)
From 80d0ecb8c6bafab496b4acd4abc3af6495b05c9e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Mon, 31 Aug 2015 23:59:14 -0400
Subject: [PATCH 10/10] Turn off "test" mode by default

It produces debugging output files that are of no further use, so we
don't need that by default.
---
 src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
index 7860736..92252a2 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
@@ -9,7 +9,7 @@
 
 require "ucs2utf.pl";
 
-$TEST = 1;
+$TEST = 0;
 
 # first generate UTF-8 --> EUC_JIS_2004 table
 
-- 
2.5.1

#2 Greg Stark
stark@mit.edu
In reply to: Peter Eisentraut (#1)
Re: Unicode mapping scripts cleanup

On Tue, Sep 1, 2015 at 5:13 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

> So apparently, the
> CJK to Unicode mappings are still evolving and should be updated
> occasionally. Next steps would be to commit some or all of these
> differences after additional verification, and then update the scripts
> to use whatever the non-obsolete mapping sources are supposed to be.

Would that pose a problem for databases which have data in them
already using the old mappings?

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Tatsuo Ishii
ishii@postgresql.org
In reply to: Greg Stark (#2)
Re: Unicode mapping scripts cleanup

> On Tue, Sep 1, 2015 at 5:13 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>
>> So apparently, the
>> CJK to Unicode mappings are still evolving and should be updated
>> occasionally. Next steps would be to commit some or all of these
>> differences after additional verification, and then update the scripts
>> to use whatever the non-obsolete mapping sources are supposed to be.
>
> Would that pose a problem for databases which have data in them
> already using the old mappings?

I think so. We must be very careful when updating the maps. Adding new
mapping data would cause fewer problems, but replacing existing mappings
would definitely be a big problem for users.
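
To make the distinction between adding and replacing mappings concrete, here is a minimal Python sketch (an editorial illustration, not part of the patch series). It assumes map files consist of entries in the form {0xa1a1, 0xe38080}, as in the excerpts shown in patch 03. The helper names parse_map and classify_changes, and the "regenerated" entries, are hypothetical.

```python
import re

# One map entry looks like "{0xa1a1, 0xe38080}," (local code, UTF-8 bytes as an integer).
ENTRY = re.compile(r"\{\s*0x([0-9a-fA-F]+)\s*,\s*0x([0-9a-fA-F]+)\s*\}")


def parse_map(text):
    """Collect {source, target} pairs from the text of a .map file."""
    return {int(src, 16): int(dst, 16) for src, dst in ENTRY.findall(text)}


def classify_changes(old, new):
    """Split the differences between two maps into added, removed, and replaced."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    replaced = {k: (old[k], new[k])
                for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, replaced


# Committed file, trimmed to two entries for illustration.
committed = parse_map("""
static const pg_local_to_utf LUmapEUC_CN[ 2 ] = {
  {0xa1a1, 0xe38080},
  {0xa1a2, 0xe38081},
};
""")
# Hypothetical regenerated output: one entry replaced, one entry added.
regenerated = parse_map("""
static const pg_local_to_utf LUmapEUC_CN[ 3 ] = {
  {0xa1a1, 0xe38080},
  {0xa1a2, 0xe38082},
  {0xa1a3, 0xe38083},
};
""")

added, removed, replaced = classify_changes(committed, regenerated)
print("added:", {hex(k): hex(v) for k, v in added.items()})
print("removed:", {hex(k): hex(v) for k, v in removed.items()})
print("replaced:", {hex(k): tuple(map(hex, v)) for k, v in replaced.items()})
```

Entries that land in "added" only extend the set of convertible characters, while entries in "replaced" change what existing data converts to, which is the case that needs the most care.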

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#4 Tatsuo Ishii
ishii@postgresql.org
In reply to: Peter Eisentraut (#1)
Re: Unicode mapping scripts cleanup

> I discovered that some of the source files that one is supposed to
> download don't exist anymore or are labeled obsolete. Also, running the
> scripts produces slight differences in the output. So apparently, the
> CJK to Unicode mappings are still evolving and should be updated
> occasionally. Next steps would be to commit some or all of these
> differences after additional verification, and then update the scripts
> to use whatever the non-obsolete mapping sources are supposed to be.

Some of the maps were "hand tweaked" from the output of the script, for
example utf8_to_sjis.map. See git log for more details. This is because
part of the source file was incomplete or inappropriate. We also
needed to compromise when creating a mapping between some local
encodings (for example SJIS) and Unicode, because the source
mapping file does not guarantee round-trip conversion.
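
The round-trip concern can be illustrated with a short Python sketch (an editorial addition, not code from the PostgreSQL tree). It flags local codes whose Unicode code point maps back to a different local code, or to nothing. The function name round_trip_failures is hypothetical, and the table values are illustrative only; they are not taken from the PostgreSQL map files.

```python
def round_trip_failures(local_to_ucs, ucs_to_local):
    """Return (local, ucs, back) for codes that fail local -> Unicode -> local."""
    failures = []
    for local, ucs in sorted(local_to_ucs.items()):
        back = ucs_to_local.get(ucs)  # None when the code point has no reverse entry
        if back != local:
            failures.append((local, ucs, back))
    return failures


# Illustrative tables: two local codes share one Unicode code point, so the
# reverse table can only send that code point back to one of them.
local_to_ucs = {0x8140: 0x3000, 0x81CA: 0xFFE2, 0xEEF9: 0xFFE2}
ucs_to_local = {0x3000: 0x8140, 0xFFE2: 0x81CA}

for local, ucs, back in round_trip_failures(local_to_ucs, ucs_to_local):
    back_text = "unmapped" if back is None else f"0x{back:04x}"
    print(f"0x{local:04x} -> U+{ucs:04X} -> {back_text}")
```

Whenever the source mapping is many-to-one like this, someone has to choose which local code the reverse direction produces, which is the kind of compromise described above.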

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#5 Peter Eisentraut
peter_e@gmx.net
In reply to: Tatsuo Ishii (#3)
Re: Unicode mapping scripts cleanup

On 9/1/15 7:27 PM, Tatsuo Ishii wrote:

>> On Tue, Sep 1, 2015 at 5:13 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>
>>> So apparently, the
>>> CJK to Unicode mappings are still evolving and should be updated
>>> occasionally. Next steps would be to commit some or all of these
>>> differences after additional verification, and then update the scripts
>>> to use whatever the non-obsolete mapping sources are supposed to be.
>>
>> Would that pose a problem for databases which have data in them
>> already using the old mappings?
>
> I think so. We must be very careful when updating the maps. Adding new
> mapping data would cause fewer problems, but replacing existing mappings
> would definitely be a big problem for users.

Note that I'm not actually proposing to change the mappings, I just want
to get the scripts into working order, to put us into a position to
consider changes if necessary.

That said, I'm not sure what the problem with changes would be. The
data in the databases doesn't change. You just see different data
coming out. It is in the nature of encoding conversion that you don't
get the original data, but an approximation. Then again, I don't have
any knowledge about how to handle such changes. But the fact that the
standards bodies are still making changes indicates that such changes
are to be expected and should be handled. I think this is similar to
time zone changes, and also similar in different ways to collation changes.
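
The point that the stored data stays the same while the converted output changes can be sketched in Python as follows (an editorial illustration under stated assumptions, not PostgreSQL's conversion code). It assumes two-byte local codes and table entries in the integer form used by the .map files. The helper names are hypothetical, and table_v2 is a made-up revision.

```python
def utf8_int_to_bytes(value):
    """Unpack an integer such as 0xe38080 into its UTF-8 byte sequence."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def convert_two_byte(stored, table):
    """Convert a sequence of two-byte local codes using a local -> UTF-8 table."""
    out = bytearray()
    for i in range(0, len(stored), 2):
        code = int.from_bytes(stored[i:i + 2], "big")
        out += utf8_int_to_bytes(table[code])
    return bytes(out)


stored = bytes([0xA1, 0xA1, 0xA1, 0xA2])         # bytes on disk; never rewritten
table_v1 = {0xA1A1: 0xE38080, 0xA1A2: 0xE38081}  # entries as in euc_cn_to_utf8.map
table_v2 = {0xA1A1: 0xE38080, 0xA1A2: 0xE38082}  # hypothetical revised entry

print(convert_two_byte(stored, table_v1).hex())
print(convert_two_byte(stored, table_v2).hex())
```

Both calls read the same stored bytes; only the client-visible result differs, which is the behavior change an application would observe after a map update.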


#6 Tatsuo Ishii
ishii@postgresql.org
In reply to: Peter Eisentraut (#5)
Re: Unicode mapping scripts cleanup

>> I think so. We must be very careful when updating the maps. Adding new
>> mapping data would cause fewer problems, but replacing existing mappings
>> would definitely be a big problem for users.
>
> Note that I'm not actually proposing to change the mappings, I just want
> to get the scripts into working order, to put us into a position to
> consider changes if necessary.
>
> That said, I'm not sure what the problem with changes would be. The
> data in the databases doesn't change. You just see different data
> coming out. It is in the nature of encoding conversion that you don't
> get the original data, but an approximation.

I don't buy the argument that "users should accept the behavior change
because data inside PostgreSQL does not change". I think we should
care about the user's application as a whole.

> Then again, I don't have
> any knowledge about how to handle such changes. But the fact that the
> standards bodies are still making changes indicates that such changes
> are to be expected and should be handled. I think this is similar to
> time zone changes, and also similar in different ways to collation changes.

The question here is: as far as I know, the encoding mappings are
*not* part of the Unicode standard, nor of any other standard,
so why do we need to strictly follow the mapping data at the cost of
application compatibility?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#7Robert Haas
robertmhaas@gmail.com
In reply to: Tatsuo Ishii (#6)
Re: Unicode mapping scripts cleanup

On Tue, Sep 15, 2015 at 9:00 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:

Then again, I don't have
any knowledge about how to handle such changes. But the fact that the
standards bodies are still making changes indicates that such changes
are to be expected and should be handled. I think this is similar to
time zone changes, and also similar in different ways to collation changes.

The question here is: as far as I know, the encoding mappings are
*not* part of the Unicode standard, nor of any other standard, so
why do we need to strictly follow the mapping data at the expense of
application compatibility?

What if we discovered that one of our mappings was wrong? Suppose
that there is some encoding where the Unicode mapping for "a" is
inadvertently mapped to the letter "b" in some other character set,
and "b" is mapped to "a". I imagine that anyone using that encoding
would want this fixed; it's a bug.

Other cases might be less clear. The cost of changing the mappings
should always be compared against the benefit. But it might still be
the right thing to do in some cases.
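
One way to put numbers on that cost, as a minimal sketch: compare the current table with a regenerated one and separate pure additions from replaced entries. The dict representation (source code point to target code point) and the name `classify_map_changes` are hypothetical and only for illustration; the real maps are generated C tables.

```python
def classify_map_changes(old_map, new_map):
    """Split the differences between two code point maps into three groups.

    Both arguments are dicts of source code point -> target code point.
    Returns (added, removed, changed); `changed` holds (old, new) pairs.
    """
    old_keys = old_map.keys()
    new_keys = new_map.keys()
    added = {k: new_map[k] for k in new_keys - old_keys}
    removed = {k: old_map[k] for k in old_keys - new_keys}
    changed = {
        k: (old_map[k], new_map[k])
        for k in old_keys & new_keys
        if old_map[k] != new_map[k]
    }
    return added, removed, changed


# Hypothetical data: "a" and "b" are swapped in the old table, and the
# regenerated table fixes the swap and adds one new entry for "d".
old = {0x61: 0x62, 0x62: 0x61, 0x63: 0x63}
new = {0x61: 0x61, 0x62: 0x62, 0x63: 0x63, 0x64: 0x64}

added, removed, changed = classify_map_changes(old, new)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Entries in `added` only affect characters that could not be converted before, while entries in `changed` alter the output for data that already converts today, so the second group is the one that needs case-by-case review.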

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#8Tatsuo Ishii
ishii@postgresql.org
In reply to: Robert Haas (#7)
Re: Unicode mapping scripts cleanup

What if we discovered that one of our mappings was wrong? Suppose
that there is some encoding where the Unicode mapping for "a" is
inadvertently mapped to the letter "b" in some other character set,
and "b" is mapped to "a". I imagine that anyone using that encoding
would want this fixed; it's a bug.

I am not against fixing the mapping if it *clearly* includes a
bug. However, we must be very careful before deciding whether it's
really a bug or not.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#9Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#1)
Re: Unicode mapping scripts cleanup

Hi,

On 2015-09-01 00:13:07 -0400, Peter Eisentraut wrote:

Here is a series of patches to clean up the Unicode mapping script
business in src/backend/utils/mb/Unicode/. It overlaps with the
perlcritic work that I recently wrote about, except that these pieces
are not strictly related to Perl, but wrong comments, missing makefile
pieces, and such.

I looked through the patches, and afaics they're generally a good
idea. And they're all, IIUC, independent of us applying or not applying
updates. So why don't we go ahead with these changes?

I've marked this as returned-with-feedback for now, since there hasn't
been much progress lately.

Greetings,

Andres Freund


#10Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Peter Eisentraut (#1)
1 attachment(s)
Re: Unicode mapping scripts cleanup

On 9/1/15 12:13 AM, Peter Eisentraut wrote:

Here is a series of patches to clean up the Unicode mapping script
business in src/backend/utils/mb/Unicode/.

I never committed the last of these patches, which has the download
locations of the files. I have updated it a bit now and propose it
here again.

I have also added download locations for the source files we do have in
git. I wonder why we ship these and none of the other ones:

845974 gb-18030-2000.xml
324237 euc-jis-2004-std.txt
319198 sjis-0213-2004-std.txt

I recall it might have been license issues with the other files.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-Add-rules-to-download-raw-mapping-files.patch (text/x-patch)
From e8bc9fa4202fc560b666d061d43f120cf24b556a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Fri, 28 Oct 2016 12:00:00 -0400
Subject: [PATCH] Add rules to download raw mapping files

These are not part of a normal build, like this entire directory.
---
 src/backend/utils/mb/Unicode/Makefile | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 40065c3..9d2ef5e 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -108,3 +108,37 @@ distclean: clean
 
 maintainer-clean: distclean
 	rm -f $(MAPS)
+
+
+DOWNLOAD = wget -O $@ --no-use-server-timestamps
+#DOWNLOAD = curl -o $@
+
+BIG5.TXT CNS11643.TXT:
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/$(@F)
+
+euc-jis-2004-std.txt sjis-0213-2004-std.txt:
+	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
+
+gb-18030-2000.xml:
+	$(DOWNLOAD) https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
+
+GB2312.TXT:
+	$(DOWNLOAD) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842&format=txt'
+
+JIS0201.TXT JIS0208.TXT JIS0212.TXT:
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/$(@F)
+
+JOHAB.TXT KSX1001.TXT:
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/$(@F)
+
+KOI8-R.TXT KOI8-U.TXT:
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/$(@F)
+
+$(ISO8859TEXTS):
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/ISO8859/$(@F)
+
+$(filter-out CP8%,$(WINTEXTS)):
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/$(@F)
+
+$(filter CP8%,$(WINTEXTS)):
+	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/$(@F)
-- 
2.10.1
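
To make the rule selection in the patch easier to follow, here is a minimal Python sketch (not part of the patch) of which URL each target resolves to. The URLs are copied from the rules above, where `$(@F)` expands to the file-name part of the target. `download_url` is a hypothetical helper, and the prefix tests for `8859-` and `CP` are an assumption standing in for the `$(ISO8859TEXTS)` and `$(WINTEXTS)` lists, which are defined elsewhere in the Makefile and not shown here.

```python
UNICODE_ORG = "http://ftp.unicode.org/Public/MAPPINGS"

# Explicitly named targets, grouped as in the Makefile rules, with the
# directory each rule downloads from.
EXPLICIT_RULES = {
    ("BIG5.TXT", "CNS11643.TXT"): UNICODE_ORG + "/OBSOLETE/EASTASIA/OTHER/",
    ("euc-jis-2004-std.txt", "sjis-0213-2004-std.txt"):
        "http://x0213.org/codetable/",
    ("gb-18030-2000.xml",):
        "https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/",
    ("JIS0201.TXT", "JIS0208.TXT", "JIS0212.TXT"):
        UNICODE_ORG + "/OBSOLETE/EASTASIA/JIS/",
    ("JOHAB.TXT", "KSX1001.TXT"): UNICODE_ORG + "/OBSOLETE/EASTASIA/KSC/",
    ("KOI8-R.TXT", "KOI8-U.TXT"): UNICODE_ORG + "/VENDORS/MISC/",
}

# This rule uses a fixed URL instead of appending the target name.
GB2312_URL = (
    "http://trac.greenstone.org/browser/trunk/gsdl/unicode/"
    "MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842&format=txt"
)


def download_url(target):
    """Return the URL the matching Makefile rule would fetch for `target`."""
    if target == "GB2312.TXT":
        return GB2312_URL
    for names, base in EXPLICIT_RULES.items():
        if target in names:
            return base + target
    # Assumption: prefix checks approximate the variable-based rules.
    if target.startswith("8859-"):
        return UNICODE_ORG + "/ISO8859/" + target
    if target.startswith("CP8"):
        # Mirrors $(filter CP8%,$(WINTEXTS)): fetched from the PC directory.
        return UNICODE_ORG + "/VENDORS/MICSFT/PC/" + target
    if target.startswith("CP"):
        # Mirrors $(filter-out CP8%,$(WINTEXTS)): the WINDOWS directory.
        return UNICODE_ORG + "/VENDORS/MICSFT/WINDOWS/" + target
    raise ValueError("no download rule for " + target)


print(download_url("CP866.TXT"))
print(download_url("JIS0208.TXT"))
```

The order of the checks matters only for the `CP` names: the `CP8` test has to come first, in the same way that the two `filter` expressions in the Makefile split `$(WINTEXTS)` into disjoint sets.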