Perform COPY FROM encoding conversions in larger chunks
I've been looking at the COPY FROM parsing code, trying to refactor it
so that parallel COPY would be easier to implement. I haven't touched
parallelism itself; I'm just looking for ways to smooth the way, and
for ways to speed up COPY in general.

Currently, COPY FROM parses the input one line at a time. Each line is
converted to the database encoding separately, or if the file encoding
matches the database encoding, we just check that the input is valid for
the encoding. It would be more efficient to do the encoding
conversion/verification in larger chunks. At least potentially: the
current conversion/verification implementations work one byte at a time,
so it doesn't matter too much, but there are faster algorithms out there
that use SIMD instructions or lookup tables and benefit from larger inputs.

So I'd like to change it so that the encoding conversion/verification is
done before splitting the input into lines. The problem is that the
conversion and verification functions throw an error on incomplete
input, so we can't pass them a chunk of N raw bytes if we don't know
where the character boundaries are. The first step in this effort is to
change the encoding conversion and verification routines to allow that.
Attached patches 0001-0004 do that:

For encoding conversions, change the signature of the conversion
functions by adding a "bool noError" argument and making them return the
number of input bytes successfully converted. That way, a conversion
function can be called in a streaming fashion: load a buffer with raw
input without caring about the character boundaries, call the conversion
function to convert it except for the few bytes at the end that might be
an incomplete character, load the buffer with more data, and repeat.

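The streaming pattern described above can be sketched roughly as follows. This is only an illustration, not code from the patches: toy_convert() and stream_convert() are hypothetical names, the "conversion" is just an identity copy of UTF-8 that checks sequence lengths, and the noError flag is accepted but not acted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CHAR_LEN 4          /* longest character in this toy encoding */
#define BUF_SIZE     64

/*
 * Hypothetical stand-in for a conversion function with the proposed
 * contract: convert as many complete characters as possible and return
 * the number of input bytes consumed. The "conversion" is an identity
 * copy of UTF-8; only sequence lengths are checked, not full validity.
 */
int
toy_convert(const unsigned char *src, int len, unsigned char *dst, bool noError)
{
    int consumed = 0;

    (void) noError;             /* a real function would raise an error if false */

    while (consumed < len)
    {
        unsigned char c = src[consumed];
        int l;

        if ((c & 0x80) == 0)
            l = 1;
        else if ((c & 0xe0) == 0xc0)
            l = 2;
        else if ((c & 0xf0) == 0xe0)
            l = 3;
        else if ((c & 0xf8) == 0xf0)
            l = 4;
        else
            break;              /* byte cannot start a character */

        if (consumed + l > len)
            break;              /* incomplete character at end of chunk */

        memcpy(dst + consumed, src + consumed, l);
        consumed += l;
    }
    return consumed;
}

/*
 * Feed "input" through toy_convert() in raw chunks of "chunksize" bytes,
 * carrying the unconverted tail over to the next round. Returns the
 * number of bytes written to "out", or -1 on invalid or truncated input.
 */
int
stream_convert(const unsigned char *input, int inputlen, int chunksize,
               unsigned char *out)
{
    unsigned char buf[BUF_SIZE];
    int buflen = 0;
    int inpos = 0;
    int outlen = 0;

    assert(chunksize >= 1 && chunksize <= BUF_SIZE - (MAX_CHAR_LEN - 1));

    for (;;)
    {
        int nread = inputlen - inpos;
        int converted;
        int leftover;

        /* load more raw bytes, ignoring character boundaries */
        if (nread > chunksize)
            nread = chunksize;
        memcpy(buf + buflen, input + inpos, nread);
        buflen += nread;
        inpos += nread;

        if (buflen == 0)
            break;              /* everything has been converted */

        converted = toy_convert(buf, buflen, out + outlen, true);
        outlen += converted;
        leftover = buflen - converted;

        /* a tail this long, or a tail at end of input, is a real error */
        if (leftover >= MAX_CHAR_LEN || (leftover > 0 && inpos == inputlen))
            return -1;

        /* keep the possibly-incomplete character for the next round */
        memmove(buf, buf + converted, leftover);
        buflen = leftover;
    }
    return outlen;
}
```

With a chunk size of 2, for example, a two-byte character that straddles a chunk boundary is simply left in the buffer and converted in the next round, once its second byte has been loaded.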
For encoding verification, add a new function that works similarly. It
takes N bytes of raw input, verifies as much of it as possible, and
returns the number of input bytes that were valid. In principle, this
could've been implemented by calling the existing pg_encoding_mblen()
and pg_encoding_verifymb() functions in a loop, but that would be too
slow, so the patch adds encoding-specific functions for it. The UTF-8
implementation is slightly optimized by basically inlining the
pg_utf_mblen() call; the other implementations are pretty naive.

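To make the "returns the number of valid bytes" contract concrete, here is a minimal standalone sketch of such a verifier for UTF-8. The names toy_utf8_len() and toy_utf8_verifystr() are made up for this illustration, and the checks are deliberately simplified: it looks at lead bytes, continuation bytes and embedded zero bytes, but does not reject overlong forms or surrogates the way the real code does via pg_utf8_islegal().

```c
/*
 * Length implied by a UTF-8 lead byte with the high bit set, or -1 if the
 * byte cannot start a character (continuation byte or 0xf8 and above).
 */
static int
toy_utf8_len(unsigned char c)
{
    if ((c & 0xe0) == 0xc0)
        return 2;
    if ((c & 0xf0) == 0xe0)
        return 3;
    if ((c & 0xf8) == 0xf0)
        return 4;
    return -1;
}

/*
 * Verify as much of s[0..len) as possible and return the number of
 * leading bytes that are valid. If the result is less than len, s plus
 * the result points at the first byte that is invalid or that begins an
 * incomplete character.
 */
int
toy_utf8_verifystr(const unsigned char *s, int len)
{
    const unsigned char *start = s;

    while (len > 0)
    {
        int l;
        int i;

        if (*s == '\0')
            break;              /* embedded zero bytes are rejected */

        /* fast path for ASCII */
        if ((*s & 0x80) == 0)
        {
            s++;
            len--;
            continue;
        }

        l = toy_utf8_len(*s);
        if (l < 0 || l > len)
            break;              /* bad lead byte or incomplete character */

        for (i = 1; i < l; i++)
        {
            if ((s[i] & 0xc0) != 0x80)
                break;          /* not a continuation byte */
        }
        if (i < l)
            break;

        s += l;
        len -= l;
    }
    return (int) (s - start);
}
```

A caller processing raw chunks can treat a short return value near the end of the buffer as "need more data" and a short return value elsewhere as an encoding error; how the actual COPY code makes that distinction is not shown here.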
- Heikki
Attachments:
0001-Add-new-mbverifystr-function-for-each-encoding.patch (text/x-patch; charset=UTF-8)
From 9c61aa3604af862a8c8217eee8d268b80ae06a2d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 14 Dec 2020 18:28:45 +0200
Subject: [PATCH 1/5] Add new mbverifystr() function for each encoding.
This potentially makes pg_verify_mbstr() function faster, by allowing
more efficient encoding-specific implementations. All of the
implementations in this patch are pretty naive, though.
---
src/backend/commands/extension.c | 2 +-
src/backend/utils/mb/conv.c | 2 +-
.../euc2004_sjis2004/euc2004_sjis2004.c | 4 +-
.../euc_jp_and_sjis/euc_jp_and_sjis.c | 10 +-
.../euc_kr_and_mic/euc_kr_and_mic.c | 4 +-
.../euc_tw_and_big5/euc_tw_and_big5.c | 8 +-
src/backend/utils/mb/mbutils.c | 31 +-
src/common/wchar.c | 514 +++++++++++++++---
src/include/mb/pg_wchar.h | 10 +-
9 files changed, 491 insertions(+), 94 deletions(-)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b5630b4c8d9..82f1248dbf1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -682,7 +682,7 @@ read_extension_script_file(const ExtensionControlFile *control,
src_encoding = control->encoding;
/* make sure that source string is valid in the expected encoding */
- pg_verify_mbstr_len(src_encoding, src_str, len, false);
+ (void) pg_verify_mbstr(src_encoding, src_str, len, false);
/*
* Convert the encoding to the database encoding. read_whole_file
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 54dcf71fb75..192948caad2 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -653,7 +653,7 @@ LocalToUtf(const unsigned char *iso, int len,
continue;
}
- l = pg_encoding_verifymb(encoding, (const char *) iso, len);
+ l = pg_encoding_verifymbchar(encoding, (const char *) iso, len);
if (l < 0)
break;
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 9ba6bd30405..3628e690aa1 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -87,7 +87,7 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
continue;
}
- l = pg_encoding_verifymb(PG_EUC_JIS_2004, (const char *) euc, len);
+ l = pg_encoding_verifymbchar(PG_EUC_JIS_2004, (const char *) euc, len);
if (l < 0)
report_invalid_encoding(PG_EUC_JIS_2004,
@@ -238,7 +238,7 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
continue;
}
- l = pg_encoding_verifymb(PG_SHIFT_JIS_2004, (const char *) sjis, len);
+ l = pg_encoding_verifymbchar(PG_SHIFT_JIS_2004, (const char *) sjis, len);
if (l < 0 || l > len)
report_invalid_encoding(PG_SHIFT_JIS_2004,
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index 4ca8e2126e4..ea05436596d 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -291,7 +291,7 @@ mic2sjis(const unsigned char *mic, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_MULE_INTERNAL, (const char *) mic, len);
+ l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
@@ -381,7 +381,7 @@ euc_jp2mic(const unsigned char *euc, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_EUC_JP, (const char *) euc, len);
+ l = pg_encoding_verifymbchar(PG_EUC_JP, (const char *) euc, len);
if (l < 0)
report_invalid_encoding(PG_EUC_JP,
(const char *) euc, len);
@@ -431,7 +431,7 @@ mic2euc_jp(const unsigned char *mic, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_MULE_INTERNAL, (const char *) mic, len);
+ l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
@@ -485,7 +485,7 @@ euc_jp2sjis(const unsigned char *euc, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_EUC_JP, (const char *) euc, len);
+ l = pg_encoding_verifymbchar(PG_EUC_JP, (const char *) euc, len);
if (l < 0)
report_invalid_encoding(PG_EUC_JP,
(const char *) euc, len);
@@ -580,7 +580,7 @@ sjis2euc_jp(const unsigned char *sjis, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_SJIS, (const char *) sjis, len);
+ l = pg_encoding_verifymbchar(PG_SJIS, (const char *) sjis, len);
if (l < 0)
report_invalid_encoding(PG_SJIS,
(const char *) sjis, len);
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 4d7876a666e..600c5cbc5cd 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -76,7 +76,7 @@ euc_kr2mic(const unsigned char *euc, unsigned char *p, int len)
c1 = *euc;
if (IS_HIGHBIT_SET(c1))
{
- l = pg_encoding_verifymb(PG_EUC_KR, (const char *) euc, len);
+ l = pg_encoding_verifymbchar(PG_EUC_KR, (const char *) euc, len);
if (l != 2)
report_invalid_encoding(PG_EUC_KR,
(const char *) euc, len);
@@ -122,7 +122,7 @@ mic2euc_kr(const unsigned char *mic, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_MULE_INTERNAL, (const char *) mic, len);
+ l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index 82a22b9bebf..7e4c2697b07 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -148,7 +148,7 @@ euc_tw2mic(const unsigned char *euc, unsigned char *p, int len)
c1 = *euc;
if (IS_HIGHBIT_SET(c1))
{
- l = pg_encoding_verifymb(PG_EUC_TW, (const char *) euc, len);
+ l = pg_encoding_verifymbchar(PG_EUC_TW, (const char *) euc, len);
if (l < 0)
report_invalid_encoding(PG_EUC_TW,
(const char *) euc, len);
@@ -213,7 +213,7 @@ mic2euc_tw(const unsigned char *mic, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_MULE_INTERNAL, (const char *) mic, len);
+ l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
@@ -272,7 +272,7 @@ big52mic(const unsigned char *big5, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_BIG5, (const char *) big5, len);
+ l = pg_encoding_verifymbchar(PG_BIG5, (const char *) big5, len);
if (l < 0)
report_invalid_encoding(PG_BIG5,
(const char *) big5, len);
@@ -321,7 +321,7 @@ mic2big5(const unsigned char *mic, unsigned char *p, int len)
len--;
continue;
}
- l = pg_encoding_verifymb(PG_MULE_INTERNAL, (const char *) mic, len);
+ l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index a8e13cacfde..67d1c4fc19f 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -519,7 +519,7 @@ pg_convert(PG_FUNCTION_ARGS)
/* make sure that source string is valid */
len = VARSIZE_ANY_EXHDR(string);
src_str = VARDATA_ANY(string);
- pg_verify_mbstr_len(src_encoding, src_str, len, false);
+ (void) pg_verify_mbstr(src_encoding, src_str, len, false);
/* perform conversion */
dest_str = (char *) pg_do_encoding_conversion((unsigned char *) unconstify(char *, src_str),
@@ -1215,10 +1215,10 @@ static bool
pg_generic_charinc(unsigned char *charptr, int len)
{
unsigned char *lastbyte = charptr + len - 1;
- mbverifier mbverify;
+ mbchar_verifier mbverify;
/* We can just invoke the character verifier directly. */
- mbverify = pg_wchar_table[GetDatabaseEncoding()].mbverify;
+ mbverify = pg_wchar_table[GetDatabaseEncoding()].mbverifychar;
while (*lastbyte < (unsigned char) 255)
{
@@ -1445,8 +1445,7 @@ pg_database_encoding_max_length(void)
bool
pg_verifymbstr(const char *mbstr, int len, bool noError)
{
- return
- pg_verify_mbstr_len(GetDatabaseEncoding(), mbstr, len, noError) >= 0;
+ return pg_verify_mbstr(GetDatabaseEncoding(), mbstr, len, noError);
}
/*
@@ -1456,7 +1455,18 @@ pg_verifymbstr(const char *mbstr, int len, bool noError)
bool
pg_verify_mbstr(int encoding, const char *mbstr, int len, bool noError)
{
- return pg_verify_mbstr_len(encoding, mbstr, len, noError) >= 0;
+ int oklen;
+
+ Assert(PG_VALID_ENCODING(encoding));
+
+ oklen = pg_wchar_table[encoding].mbverifystr((const unsigned char *) mbstr, len);
+ if (oklen != len)
+ {
+ if (noError)
+ return false;
+ report_invalid_encoding(encoding, mbstr + oklen, len - oklen);
+ }
+ return true;
}
/*
@@ -1469,11 +1479,14 @@ pg_verify_mbstr(int encoding, const char *mbstr, int len, bool noError)
* If OK, return length of string in the encoding.
* If a problem is found, return -1 when noError is
* true; when noError is false, ereport() a descriptive message.
+ *
+ * Note: We cannot use the faster encoding-specific mbverifystr() function
+ * here, because we need to count the number of characters in the string.
*/
int
pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
{
- mbverifier mbverify;
+ mbchar_verifier mbverifychar;
int mb_len;
Assert(PG_VALID_ENCODING(encoding));
@@ -1493,7 +1506,7 @@ pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
}
/* fetch function pointer just once */
- mbverify = pg_wchar_table[encoding].mbverify;
+ mbverifychar = pg_wchar_table[encoding].mbverifychar;
mb_len = 0;
@@ -1516,7 +1529,7 @@ pg_verify_mbstr_len(int encoding, const char *mbstr, int len, bool noError)
report_invalid_encoding(encoding, mbstr, len);
}
- l = (*mbverify) ((const unsigned char *) mbstr, len);
+ l = (*mbverifychar) ((const unsigned char *) mbstr, len);
if (l < 0)
{
diff --git a/src/common/wchar.c b/src/common/wchar.c
index efaf1c155bb..5ab29bcbc39 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -19,7 +19,7 @@
* Operations on multi-byte encodings are driven by a table of helper
* functions.
*
- * To add an encoding support, define mblen(), dsplen() and verifier() for
+ * To add an encoding support, define mblen(), dsplen(), verifychar() and verifystr() for
* the encoding. For server-encodings, also define mb2wchar() and wchar2mb()
* conversion functions.
*
@@ -1087,29 +1087,47 @@ pg_gb18030_dsplen(const unsigned char *s)
*-------------------------------------------------------------------
* multibyte sequence validators
*
- * These functions accept "s", a pointer to the first byte of a string,
+ * The verifychar functions accept "s", a pointer to the first byte of a string,
* and "len", the remaining length of the string. If there is a validly
* encoded character beginning at *s, return its length in bytes; else
* return -1.
*
- * The functions can assume that len > 0 and that *s != '\0', but they must
+ * The verifychar functions can assume that len > 0 and that *s != '\0', but they must
* test for and reject zeroes in any additional bytes of a multibyte character.
- *
* Note that this definition allows the function for a single-byte
* encoding to be just "return 1".
+ *
+ * The verifystr functions also accept "s", a pointer to a string, and "len",
+ * the remaining length of the string. They try to verify the whole string, and
+ * return the number of input bytes (<= len) that are valid. If there is an
+ * encoding error, the return value is < len, and s plus the return value
+ * points to the first invalid byte.
+ *
+ * The verifystr functions must test for and reject zeroes in the input.
*-------------------------------------------------------------------
*/
-
static int
-pg_ascii_verifier(const unsigned char *s, int len)
+pg_ascii_verifychar(const unsigned char *s, int len)
{
return 1;
}
+static int
+pg_ascii_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *nullpos = memchr(s, 0, len);
+
+ if (nullpos == NULL)
+ return len;
+ {
+ return nullpos - s;
+ }
+}
+
#define IS_EUC_RANGE_VALID(c) ((c) >= 0xa1 && (c) <= 0xfe)
static int
-pg_eucjp_verifier(const unsigned char *s, int len)
+pg_eucjp_verifychar(const unsigned char *s, int len)
{
int l;
unsigned char c1,
@@ -1164,7 +1182,36 @@ pg_eucjp_verifier(const unsigned char *s, int len)
}
static int
-pg_euckr_verifier(const unsigned char *s, int len)
+pg_eucjp_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_eucjp_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_euckr_verifychar(const unsigned char *s, int len)
{
int l;
unsigned char c1,
@@ -1192,11 +1239,41 @@ pg_euckr_verifier(const unsigned char *s, int len)
return l;
}
+static int
+pg_euckr_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_euckr_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
/* EUC-CN byte sequences are exactly same as EUC-KR */
-#define pg_euccn_verifier pg_euckr_verifier
+#define pg_euccn_verifychar pg_euckr_verifychar
+#define pg_euccn_verifystr pg_euckr_verifystr
static int
-pg_euctw_verifier(const unsigned char *s, int len)
+pg_euctw_verifychar(const unsigned char *s, int len)
{
int l;
unsigned char c1,
@@ -1246,7 +1323,36 @@ pg_euctw_verifier(const unsigned char *s, int len)
}
static int
-pg_johab_verifier(const unsigned char *s, int len)
+pg_euctw_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_euctw_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_johab_verifychar(const unsigned char *s, int len)
{
int l,
mbl;
@@ -1270,7 +1376,36 @@ pg_johab_verifier(const unsigned char *s, int len)
}
static int
-pg_mule_verifier(const unsigned char *s, int len)
+pg_johab_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_johab_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_mule_verifychar(const unsigned char *s, int len)
{
int l,
mbl;
@@ -1291,13 +1426,54 @@ pg_mule_verifier(const unsigned char *s, int len)
}
static int
-pg_latin1_verifier(const unsigned char *s, int len)
+pg_mule_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_mule_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_latin1_verifychar(const unsigned char *s, int len)
{
return 1;
}
static int
-pg_sjis_verifier(const unsigned char *s, int len)
+pg_latin1_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *nullpos = memchr(s, 0, len);
+
+ if (nullpos == NULL)
+ return len;
+ {
+ return nullpos - s;
+ }
+}
+
+static int
+pg_sjis_verifychar(const unsigned char *s, int len)
{
int l,
mbl;
@@ -1320,7 +1496,36 @@ pg_sjis_verifier(const unsigned char *s, int len)
}
static int
-pg_big5_verifier(const unsigned char *s, int len)
+pg_sjis_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_sjis_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_big5_verifychar(const unsigned char *s, int len)
{
int l,
mbl;
@@ -1340,7 +1545,36 @@ pg_big5_verifier(const unsigned char *s, int len)
}
static int
-pg_gbk_verifier(const unsigned char *s, int len)
+pg_big5_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_big5_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_gbk_verifychar(const unsigned char *s, int len)
{
int l,
mbl;
@@ -1360,7 +1594,36 @@ pg_gbk_verifier(const unsigned char *s, int len)
}
static int
-pg_uhc_verifier(const unsigned char *s, int len)
+pg_gbk_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_gbk_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_uhc_verifychar(const unsigned char *s, int len)
{
int l,
mbl;
@@ -1380,7 +1643,36 @@ pg_uhc_verifier(const unsigned char *s, int len)
}
static int
-pg_gb18030_verifier(const unsigned char *s, int len)
+pg_uhc_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_uhc_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_gb18030_verifychar(const unsigned char *s, int len)
{
int l;
@@ -1411,11 +1703,55 @@ pg_gb18030_verifier(const unsigned char *s, int len)
}
static int
-pg_utf8_verifier(const unsigned char *s, int len)
+pg_gb18030_verifystr(const unsigned char *s, int len)
{
- int l = pg_utf_mblen(s);
+ const unsigned char *start = s;
- if (len < l)
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_gb18030_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
+static int
+pg_utf8_verifychar(const unsigned char *s, int len)
+{
+ int l;
+
+ if ((*s & 0x80) == 0)
+ {
+ if (*s == '\0')
+ return -1;
+ return 1;
+ }
+ else if ((*s & 0xe0) == 0xc0)
+ l = 2;
+ else if ((*s & 0xf0) == 0xe0)
+ l = 3;
+ else if ((*s & 0xf8) == 0xf0)
+ l = 4;
+ else
+ l = 1;
+
+ if (l > len)
return -1;
if (!pg_utf8_islegal(s, l))
@@ -1424,6 +1760,35 @@ pg_utf8_verifier(const unsigned char *s, int len)
return l;
}
+static int
+pg_utf8_verifystr(const unsigned char *s, int len)
+{
+ const unsigned char *start = s;
+
+ while (len > 0)
+ {
+ int l;
+
+ /* fast path for ASCII-subset characters */
+ if (!IS_HIGHBIT_SET(*s))
+ {
+ if (*s == '\0')
+ break;
+ l = 1;
+ }
+ else
+ {
+ l = pg_utf8_verifychar(s, len);
+ if (l == -1)
+ break;
+ }
+ s += l;
+ len -= l;
+ }
+
+ return s - start;
+}
+
/*
* Check for validity of a single UTF-8 encoded character
*
@@ -1503,48 +1868,48 @@ pg_utf8_islegal(const unsigned char *source, int length)
*-------------------------------------------------------------------
*/
const pg_wchar_tbl pg_wchar_table[] = {
- {pg_ascii2wchar_with_len, pg_wchar2single_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1}, /* PG_SQL_ASCII */
- {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3}, /* PG_EUC_JP */
- {pg_euccn2wchar_with_len, pg_wchar2euc_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2}, /* PG_EUC_CN */
- {pg_euckr2wchar_with_len, pg_wchar2euc_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3}, /* PG_EUC_KR */
- {pg_euctw2wchar_with_len, pg_wchar2euc_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4}, /* PG_EUC_TW */
- {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3}, /* PG_EUC_JIS_2004 */
- {pg_utf2wchar_with_len, pg_wchar2utf_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4}, /* PG_UTF8 */
- {pg_mule2wchar_with_len, pg_wchar2mule_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4}, /* PG_MULE_INTERNAL */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN1 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN2 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN3 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN4 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN5 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN6 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN7 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN8 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN9 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN10 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1256 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1258 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN866 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN874 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8R */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1251 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1252 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-5 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-6 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-7 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-8 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1250 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1253 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1254 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1255 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1257 */
- {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8U */
- {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */
- {0, 0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */
- {0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2}, /* PG_GBK */
- {0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2}, /* PG_UHC */
- {0, 0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4}, /* PG_GB18030 */
- {0, 0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3}, /* PG_JOHAB */
- {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2} /* PG_SHIFT_JIS_2004 */
+ {pg_ascii2wchar_with_len, pg_wchar2single_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifychar, pg_ascii_verifystr, 1}, /* PG_SQL_ASCII */
+ {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifychar, pg_eucjp_verifystr, 3}, /* PG_EUC_JP */
+ {pg_euccn2wchar_with_len, pg_wchar2euc_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifychar, pg_euccn_verifystr, 2}, /* PG_EUC_CN */
+ {pg_euckr2wchar_with_len, pg_wchar2euc_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifychar, pg_euckr_verifystr, 3}, /* PG_EUC_KR */
+ {pg_euctw2wchar_with_len, pg_wchar2euc_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifychar, pg_euctw_verifystr, 4}, /* PG_EUC_TW */
+ {pg_eucjp2wchar_with_len, pg_wchar2euc_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifychar, pg_eucjp_verifystr, 3}, /* PG_EUC_JIS_2004 */
+ {pg_utf2wchar_with_len, pg_wchar2utf_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifychar, pg_utf8_verifystr, 4}, /* PG_UTF8 */
+ {pg_mule2wchar_with_len, pg_wchar2mule_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifychar, pg_mule_verifystr, 4}, /* PG_MULE_INTERNAL */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN1 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN2 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN3 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN4 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN5 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN6 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN7 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN8 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN9 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_LATIN10 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1256 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1258 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN866 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN874 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_KOI8R */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1251 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1252 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* ISO-8859-5 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* ISO-8859-6 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* ISO-8859-7 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* ISO-8859-8 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1250 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1253 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1254 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1255 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_WIN1257 */
+ {pg_latin12wchar_with_len, pg_wchar2single_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifychar, pg_latin1_verifystr, 1}, /* PG_KOI8U */
+ {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifychar, pg_sjis_verifystr, 2}, /* PG_SJIS */
+ {0, 0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifychar, pg_big5_verifystr, 2}, /* PG_BIG5 */
+ {0, 0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifychar, pg_gbk_verifystr, 2}, /* PG_GBK */
+ {0, 0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifychar, pg_uhc_verifystr, 2}, /* PG_UHC */
+ {0, 0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifychar, pg_gb18030_verifystr, 4}, /* PG_GB18030 */
+ {0, 0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifychar, pg_johab_verifystr, 3}, /* PG_JOHAB */
+ {0, 0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifychar, pg_sjis_verifystr, 2} /* PG_SHIFT_JIS_2004 */
};
/*
@@ -1572,14 +1937,29 @@ pg_encoding_dsplen(int encoding, const char *mbstr)
/*
* Verify the first multibyte character of the given string.
* Return its byte length if good, -1 if bad. (See comments above for
- * full details of the mbverify API.)
+ * full details of the mbverifychar API.)
+ */
+int
+pg_encoding_verifymbchar(int encoding, const char *mbchar, int len)
+{
+ return (PG_VALID_ENCODING(encoding) ?
+ pg_wchar_table[encoding].mbverifychar((const unsigned char *) mbchar, len) :
+ pg_wchar_table[PG_SQL_ASCII].mbverifychar((const unsigned char *) mbchar, len));
+}
+
+/*
+ * Verify that a string is valid for the given encoding.
+ *
+ * Returns the number of input bytes (<= len) that form a valid string. If
+ * it equals 'len', the whole input is valid. Otherwise it is the index of
+ * the first invalid input byte.
*/
int
-pg_encoding_verifymb(int encoding, const char *mbstr, int len)
+pg_encoding_verifymbstr(int encoding, const char *mbstr, int len)
{
return (PG_VALID_ENCODING(encoding) ?
- pg_wchar_table[encoding].mbverify((const unsigned char *) mbstr, len) :
- pg_wchar_table[PG_SQL_ASCII].mbverify((const unsigned char *) mbstr, len));
+ pg_wchar_table[encoding].mbverifystr((const unsigned char *) mbstr, len) :
+ pg_wchar_table[PG_SQL_ASCII].mbverifystr((const unsigned char *) mbstr, len));
}
/*
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 494aefc7fab..549f2dd045d 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -371,7 +371,9 @@ typedef int (*mbdisplaylen_converter) (const unsigned char *mbstr);
typedef bool (*mbcharacter_incrementer) (unsigned char *mbstr, int len);
-typedef int (*mbverifier) (const unsigned char *mbstr, int len);
+typedef int (*mbchar_verifier) (const unsigned char *mbstr, int len);
+
+typedef int (*mbstr_verifier) (const unsigned char *mbstr, int len);
typedef struct
{
@@ -381,7 +383,8 @@ typedef struct
* to a multibyte */
mblen_converter mblen; /* get byte length of a char */
mbdisplaylen_converter dsplen; /* get display width of a char */
- mbverifier mbverify; /* verify multibyte sequence */
+ mbchar_verifier mbverifychar; /* verify multibyte character */
+ mbstr_verifier mbverifystr; /* verify multibyte string */
int maxmblen; /* max bytes for a char in this encoding */
} pg_wchar_tbl;
@@ -554,7 +557,8 @@ extern int pg_valid_server_encoding_id(int encoding);
*/
extern int pg_encoding_mblen(int encoding, const char *mbstr);
extern int pg_encoding_dsplen(int encoding, const char *mbstr);
-extern int pg_encoding_verifymb(int encoding, const char *mbstr, int len);
+extern int pg_encoding_verifymbchar(int encoding, const char *mbchar, int len);
+extern int pg_encoding_verifymbstr(int encoding, const char *mbstr, int len);
extern int pg_encoding_max_length(int encoding);
extern int pg_valid_client_encoding(const char *name);
extern int pg_valid_server_encoding(const char *name);
--
2.20.1
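
The point of having pg_encoding_verifymbstr() return a byte count is that a caller can feed it raw chunks without knowing where the character boundaries fall. Below is a minimal standalone sketch of such a caller, not code from the patch set. It assumes a toy encoding in which a byte with the high bit set starts a two-byte character whose second byte must also have the high bit set. The names sketch_verifystr(), sketch_verify_in_chunks() and SKETCH_MAXMBLEN are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAXMBLEN 2       /* longest character in the toy encoding */

/*
 * Hypothetical verifier with the same contract as the new mbverifystr
 * callbacks: returns the number of leading bytes that form valid, complete
 * characters.
 */
static int
sketch_verifystr(const unsigned char *s, int len)
{
    int         i = 0;

    while (i < len)
    {
        if (s[i] == 0)
            break;              /* embedded NUL is invalid */
        if (s[i] < 0x80)
        {
            i++;                /* ASCII */
            continue;
        }
        if (i + 1 >= len)
            break;              /* incomplete character at end of input */
        if (s[i + 1] < 0x80)
            break;              /* invalid second byte */
        i += 2;
    }
    return i;
}

/*
 * Verify 'input' by feeding it through a small buffer, 'chunk_size' bytes
 * at a time.  Bytes that could not be verified yet (a possibly incomplete
 * character at the end of the buffer) are carried over to the next round.
 */
static bool
sketch_verify_in_chunks(const unsigned char *input, int input_len,
                        int chunk_size)
{
    unsigned char buf[64];
    int         buffered = 0;   /* bytes currently in buf */
    int         consumed = 0;   /* bytes taken from input so far */

    assert(chunk_size > 0 &&
           chunk_size + SKETCH_MAXMBLEN <= (int) sizeof(buf));

    while (consumed < input_len || buffered > 0)
    {
        int         n = input_len - consumed;
        int         valid;
        int         rest;

        if (n > chunk_size)
            n = chunk_size;
        memcpy(buf + buffered, input + consumed, n);
        buffered += n;
        consumed += n;

        valid = sketch_verifystr(buf, buffered);
        rest = buffered - valid;

        /* A full character's worth of bytes was available, yet invalid. */
        if (rest >= SKETCH_MAXMBLEN)
            return false;
        /* Input ended in the middle of a character. */
        if (rest > 0 && consumed == input_len)
            return false;

        memmove(buf, buf + valid, rest);
        buffered = rest;
    }
    return true;
}
```

The carried-over tail is always shorter than the longest character, so the buffer only needs a few bytes of slack on top of the chunk size.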
0002-Replace-pg_utf8_verifystr-with-a-faster-implementati.patch (text/x-patch; charset=UTF-8)
From ccacdfe30614f10a79038df36fab228428335fe1 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 15 Dec 2020 11:12:45 +0200
Subject: [PATCH 2/5] Replace pg_utf8_verifystr() with a faster implementation.
This inlines the pg_utf8_verifychar() function into the loop. We could do
a lot more - there are much faster SIMD and lookup table based algorithms
out there - but I'll leave that for another patch.
In passing, remove remnants of support for 5- and 6-byte UTF-8
characters. They were considered in very early Unicode versions, but the
current Unicode standard limits the number of code points to 17 planes
which are representable in 4 bytes in UTF-8, and there are no plans to ever
go beyond that.
---
src/common/wchar.c | 42 +++++++++++++++++++++---------------------
1 file changed, 21 insertions(+), 21 deletions(-)
diff --git a/src/common/wchar.c b/src/common/wchar.c
index 5ab29bcbc39..403974629f7 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -558,12 +558,6 @@ pg_utf_mblen(const unsigned char *s)
len = 3;
else if ((*s & 0xf8) == 0xf0)
len = 4;
-#ifdef NOT_USED
- else if ((*s & 0xfc) == 0xf8)
- len = 5;
- else if ((*s & 0xfe) == 0xfc)
- len = 6;
-#endif
else
len = 1;
return len;
@@ -1764,28 +1758,37 @@ static int
pg_utf8_verifystr(const unsigned char *s, int len)
{
const unsigned char *start = s;
+ const unsigned char *end = s + len;
- while (len > 0)
+ while (s < end)
{
- int l;
+ int l;
- /* fast path for ASCII-subset characters */
- if (!IS_HIGHBIT_SET(*s))
+ if ((*s & 0x80) == 0)
{
if (*s == '\0')
break;
- l = 1;
+
+ s++;
+ continue;
}
+ else if ((*s & 0xe0) == 0xc0)
+ l = 2;
+ else if ((*s & 0xf0) == 0xe0)
+ l = 3;
+ else if ((*s & 0xf8) == 0xf0)
+ l = 4;
else
- {
- l = pg_utf8_verifychar(s, len);
- if (l == -1)
- break;
- }
+ l = 1;
+
+ if (s + l > end)
+ break;
+
+ if (!pg_utf8_islegal(s, l))
+ break;
+
s += l;
- len -= l;
}
-
return s - start;
}
@@ -1810,9 +1813,6 @@ pg_utf8_islegal(const unsigned char *source, int length)
switch (length)
{
- default:
- /* reject lengths 5 and 6 for now */
- return false;
case 4:
a = source[3];
if (a < 0x80 || a > 0xBF)
--
2.20.1
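
To see the shape of the inlined loop in isolation, here is a minimal standalone sketch of the same idea. It is not the patch's code: sketch_utf8_islegal() is a hypothetical, simplified stand-in for pg_utf8_islegal(), written from the UTF-8 well-formedness rules (no overlong forms, no surrogates, nothing above U+10FFFF), and sketch_utf8_verifystr() mirrors the structure of the new pg_utf8_verifystr().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for pg_utf8_islegal(): is the 'len'-byte sequence
 * at 's' one well-formed UTF-8 character?  The caller has already derived
 * 'len' from the lead byte.
 */
static bool
sketch_utf8_islegal(const unsigned char *s, int len)
{
    switch (len)
    {
        case 1:
            return s[0] < 0x80;
        case 2:
            /* 0xC0 and 0xC1 would be overlong encodings of ASCII */
            return s[0] >= 0xC2 && (s[1] & 0xC0) == 0x80;
        case 3:
            if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
                return false;
            if (s[0] == 0xE0 && s[1] < 0xA0)
                return false;   /* overlong */
            if (s[0] == 0xED && s[1] > 0x9F)
                return false;   /* UTF-16 surrogate range */
            return true;
        case 4:
            if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
                (s[3] & 0xC0) != 0x80)
                return false;
            if (s[0] == 0xF0 && s[1] < 0x90)
                return false;   /* overlong */
            if (s[0] == 0xF4 && s[1] > 0x8F)
                return false;   /* above U+10FFFF */
            return s[0] <= 0xF4;
    }
    return false;
}

/*
 * Returns the number of leading bytes of 's' that are valid UTF-8.  The
 * length of each character is computed inline from its lead byte, and
 * ASCII bytes skip the legality check entirely.
 */
static int
sketch_utf8_verifystr(const unsigned char *s, int len)
{
    const unsigned char *start = s;
    const unsigned char *end = s + len;

    assert(len >= 0);

    while (s < end)
    {
        int         l;

        if ((*s & 0x80) == 0)
        {
            if (*s == '\0')
                break;
            s++;
            continue;
        }
        else if ((*s & 0xe0) == 0xc0)
            l = 2;
        else if ((*s & 0xf0) == 0xe0)
            l = 3;
        else if ((*s & 0xf8) == 0xf0)
            l = 4;
        else
            l = 1;              /* stray continuation byte; rejected below */

        if (s + l > end)
            break;              /* incomplete character at end of input */
        if (!sketch_utf8_islegal(s, l))
            break;
        s += l;
    }
    return (int) (s - start);
}
```

A return value smaller than 'len' does not distinguish an invalid byte from a character cut off at the end of the buffer. A streaming caller has to look at how many bytes are left over to tell the two apart.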
0003-Add-direct-conversion-routines-between-EUC_TW-and-Bi.patch (text/x-patch; charset=UTF-8)
From 34b6b642a9619579ccd72d074d3a3d1ebbc3365b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 16 Dec 2020 12:01:33 +0200
Subject: [PATCH 3/5] Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the
target encoding. Implement functions to convert directly between the
two.
The reason to do this now is that the next patch will change the
conversion function signature so that if the input is
invalid, we convert as much as we can and return the number of bytes
successfully converted. That's not possible if we use an intermediary
format, because if an error happens in the intermediary -> final
conversion, we lose track of the location of the invalid character in
the original input. Avoiding the intermediate step should be faster
too.
---
.../euc_tw_and_big5/euc_tw_and_big5.c | 146 ++++++++++++++++--
1 file changed, 136 insertions(+), 10 deletions(-)
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index 7e4c2697b07..7794d7ef8bf 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -37,6 +37,8 @@ PG_FUNCTION_INFO_V1(mic_to_big5);
* ----------
*/
+static void euc_tw2big5(const unsigned char *euc, unsigned char *p, int len);
+static void big52euc_tw(const unsigned char *big5, unsigned char *p, int len);
static void big52mic(const unsigned char *big5, unsigned char *p, int len);
static void mic2big5(const unsigned char *mic, unsigned char *p, int len);
static void euc_tw2mic(const unsigned char *euc, unsigned char *p, int len);
@@ -48,14 +50,10 @@ euc_tw_to_big5(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
- unsigned char *buf;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_TW, PG_BIG5);
- buf = palloc(len * ENCODING_GROWTH_RATE + 1);
- euc_tw2mic(src, buf, len);
- mic2big5(buf, dest, strlen((char *) buf));
- pfree(buf);
+ euc_tw2big5(src, dest, len);
PG_RETURN_VOID();
}
@@ -66,14 +64,10 @@ big5_to_euc_tw(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
- unsigned char *buf;
CHECK_ENCODING_CONVERSION_ARGS(PG_BIG5, PG_EUC_TW);
- buf = palloc(len * ENCODING_GROWTH_RATE + 1);
- big52mic(src, buf, len);
- mic2euc_tw(buf, dest, strlen((char *) buf));
- pfree(buf);
+ big52euc_tw(src, dest, len);
PG_RETURN_VOID();
}
@@ -134,6 +128,138 @@ mic_to_big5(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+/*
+ * EUC_TW ---> Big5
+ */
+static void
+euc_tw2big5(const unsigned char *euc, unsigned char *p, int len)
+{
+ unsigned char c1;
+ unsigned short big5buf,
+ cnsBuf;
+ unsigned char lc;
+ int l;
+
+ while (len > 0)
+ {
+ c1 = *euc;
+ if (IS_HIGHBIT_SET(c1))
+ {
+ /* Verify and decode the next EUC_TW input character */
+ l = pg_encoding_verifymbchar(PG_EUC_TW, (const char *) euc, len);
+ if (l < 0)
+ report_invalid_encoding(PG_EUC_TW,
+ (const char *) euc, len);
+ if (c1 == SS2)
+ {
+ c1 = euc[1]; /* plane No. */
+ if (c1 == 0xa1)
+ lc = LC_CNS11643_1;
+ else if (c1 == 0xa2)
+ lc = LC_CNS11643_2;
+ else
+ lc = c1 - 0xa3 + LC_CNS11643_3;
+ cnsBuf = (euc[2] << 8) | euc[3];
+ }
+ else
+ { /* CNS11643-1 */
+ lc = LC_CNS11643_1;
+ cnsBuf = (c1 << 8) | euc[1];
+ }
+
+ /* Write it out in Big5 */
+ big5buf = CNStoBIG5(cnsBuf, lc);
+ if (big5buf == 0)
+ report_untranslatable_char(PG_EUC_TW, PG_BIG5,
+ (const char *) euc, len);
+ *p++ = (big5buf >> 8) & 0x00ff;
+ *p++ = big5buf & 0x00ff;
+
+ euc += l;
+ len -= l;
+ }
+ else
+ { /* should be ASCII */
+ if (c1 == 0)
+ report_invalid_encoding(PG_EUC_TW,
+ (const char *) euc, len);
+ *p++ = c1;
+ euc++;
+ len--;
+ }
+ }
+ *p = '\0';
+}
+
+
+/*
+ * Big5 ---> EUC_TW
+ */
+static void
+big52euc_tw(const unsigned char *big5, unsigned char *p, int len)
+{
+ unsigned short c1;
+ unsigned short big5buf,
+ cnsBuf;
+ unsigned char lc;
+ int l;
+
+ while (len > 0)
+ {
+ /* Verify and decode the next Big5 input character */
+ c1 = *big5;
+ if (IS_HIGHBIT_SET(c1))
+ {
+ l = pg_encoding_verifymbchar(PG_BIG5, (const char *) big5, len);
+ if (l < 0)
+ report_invalid_encoding(PG_BIG5,
+ (const char *) big5, len);
+ big5buf = (c1 << 8) | big5[1];
+ cnsBuf = BIG5toCNS(big5buf, &lc);
+
+ if (lc == LC_CNS11643_1)
+ {
+ *p++ = (cnsBuf >> 8) & 0x00ff;
+ *p++ = cnsBuf & 0x00ff;
+ }
+ else if (lc == LC_CNS11643_2)
+ {
+ *p++ = SS2;
+ *p++ = 0xa2;
+ *p++ = (cnsBuf >> 8) & 0x00ff;
+ *p++ = cnsBuf & 0x00ff;
+ }
+ else if (lc >= LC_CNS11643_3 && lc <= LC_CNS11643_7)
+ {
+ *p++ = SS2;
+ *p++ = lc - LC_CNS11643_3 + 0xa3;
+ *p++ = (cnsBuf >> 8) & 0x00ff;
+ *p++ = cnsBuf & 0x00ff;
+ }
+ else
+ report_untranslatable_char(PG_BIG5, PG_EUC_TW,
+ (const char *) big5, len);
+
+ big5 += l;
+ len -= l;
+ }
+ else
+ {
+ /* ASCII */
+ if (c1 == 0)
+ report_invalid_encoding(PG_BIG5,
+ (const char *) big5, len);
+ *p++ = c1;
+ big5++;
+ len--;
+ continue;
+ }
+ }
+ *p = '\0';
+}
+
+
/*
* EUC_TW ---> MIC
*/
--
2.20.1
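
The contract that the conversion functions move to, "convert as much as we can and return the number of bytes successfully converted", can be illustrated with a minimal standalone sketch of a single-byte, table-driven converter. It is loosely modeled on local2local() but is not PostgreSQL code: sketch_local2local() is a hypothetical name, and the error paths simply print a message and exit where the backend would call report_invalid_encoding() or report_untranslatable_char().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Convert 'len' bytes of a single-byte encoding from 'src' to 'dest'.
 * 'tab' has 128 entries, one per source byte 0x80..0xFF, holding the
 * target byte or 0 if there is no equivalent.
 *
 * Returns the number of input bytes consumed.  With noError = true, the
 * function stops at the first byte it cannot convert instead of failing,
 * so the result can be less than 'len'.
 */
static int
sketch_local2local(const unsigned char *src, unsigned char *dest, int len,
                   const unsigned char *tab, bool noError)
{
    const unsigned char *start = src;

    assert(src != NULL && dest != NULL && tab != NULL);

    while (len > 0)
    {
        unsigned char c1 = *src;

        if (c1 == 0)
        {
            if (noError)
                break;
            fprintf(stderr, "invalid byte sequence\n");
            exit(1);
        }
        if (c1 < 0x80)
            *dest++ = c1;       /* ASCII passes through unchanged */
        else
        {
            unsigned char c2 = tab[c1 - 0x80];

            if (c2 == 0)
            {
                if (noError)
                    break;
                fprintf(stderr, "untranslatable character\n");
                exit(1);
            }
            *dest++ = c2;
        }
        src++;
        len--;
    }
    *dest = '\0';

    return (int) (src - start);
}
```

Because the return value is an offset into the original input, the caller knows exactly where the conversion stopped. That is the information that would be lost if the input were first converted to an intermediary encoding and the failure happened in the second step.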
0004-Change-conversion-function-signature.patch (text/x-patch; charset=UTF-8)
From 5419d17f626da2d8b700e977cb5f3b856a069daf Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 16 Dec 2020 12:13:36 +0200
Subject: [PATCH 4/5] Change conversion function signature.
Add a 'noError' argument, so that we can try to convert a buffer without
knowing the character boundaries beforehand. The functions now need to
return the number of input bytes successfully converted.
TODO: Upgrade?
---
doc/src/sgml/ref/create_conversion.sgml | 5 +-
src/backend/commands/conversioncmds.c | 27 +-
src/backend/utils/error/elog.c | 2 +
src/backend/utils/mb/conv.c | 112 +++++-
.../cyrillic_and_mic/cyrillic_and_mic.c | 127 ++++---
.../euc2004_sjis2004/euc2004_sjis2004.c | 96 ++++-
.../euc_cn_and_mic/euc_cn_and_mic.c | 57 ++-
.../euc_jp_and_sjis/euc_jp_and_sjis.c | 153 ++++++--
.../euc_kr_and_mic/euc_kr_and_mic.c | 57 ++-
.../euc_tw_and_big5/euc_tw_and_big5.c | 165 +++++++--
.../latin2_and_win1250/latin2_and_win1250.c | 49 ++-
.../latin_and_mic/latin_and_mic.c | 43 ++-
.../utf8_and_big5/utf8_and_big5.c | 37 +-
.../utf8_and_cyrillic/utf8_and_cyrillic.c | 67 ++--
.../utf8_and_euc2004/utf8_and_euc2004.c | 37 +-
.../utf8_and_euc_cn/utf8_and_euc_cn.c | 37 +-
.../utf8_and_euc_jp/utf8_and_euc_jp.c | 37 +-
.../utf8_and_euc_kr/utf8_and_euc_kr.c | 37 +-
.../utf8_and_euc_tw/utf8_and_euc_tw.c | 37 +-
.../utf8_and_gb18030/utf8_and_gb18030.c | 37 +-
.../utf8_and_gbk/utf8_and_gbk.c | 37 +-
.../utf8_and_iso8859/utf8_and_iso8859.c | 43 ++-
.../utf8_and_iso8859_1/utf8_and_iso8859_1.c | 27 +-
.../utf8_and_johab/utf8_and_johab.c | 37 +-
.../utf8_and_sjis/utf8_and_sjis.c | 37 +-
.../utf8_and_sjis2004/utf8_and_sjis2004.c | 37 +-
.../utf8_and_uhc/utf8_and_uhc.c | 37 +-
.../utf8_and_win/utf8_and_win.c | 43 ++-
src/backend/utils/mb/mbutils.c | 13 +-
src/include/catalog/pg_proc.dat | 332 +++++++++---------
src/include/mb/pg_wchar.h | 49 +--
src/test/regress/expected/opr_sanity.out | 7 +-
src/test/regress/sql/opr_sanity.sql | 7 +-
33 files changed, 1292 insertions(+), 633 deletions(-)
diff --git a/doc/src/sgml/ref/create_conversion.sgml b/doc/src/sgml/ref/create_conversion.sgml
index e7700fecfc5..f014a676c88 100644
--- a/doc/src/sgml/ref/create_conversion.sgml
+++ b/doc/src/sgml/ref/create_conversion.sgml
@@ -117,8 +117,9 @@ conv_proc(
integer, -- destination encoding ID
cstring, -- source string (null terminated C string)
internal, -- destination (fill with a null terminated C string)
- integer -- source string length
-) RETURNS void;
+ integer, -- source string length
+ boolean -- if true, don't throw an error if conversion fails
+) RETURNS integer;
</programlisting></para>
</listitem>
</varlistentry>
diff --git a/src/backend/commands/conversioncmds.c b/src/backend/commands/conversioncmds.c
index 0ee3b6d19a3..2d032a802f5 100644
--- a/src/backend/commands/conversioncmds.c
+++ b/src/backend/commands/conversioncmds.c
@@ -45,8 +45,9 @@ CreateConversionCommand(CreateConversionStmt *stmt)
const char *from_encoding_name = stmt->for_encoding_name;
const char *to_encoding_name = stmt->to_encoding_name;
List *func_name = stmt->func_name;
- static const Oid funcargs[] = {INT4OID, INT4OID, CSTRINGOID, INTERNALOID, INT4OID};
+ static const Oid funcargs[] = {INT4OID, INT4OID, CSTRINGOID, INTERNALOID, INT4OID, BOOLOID};
char result[1];
+ Datum funcresult;
/* Convert list of names to a name and namespace */
namespaceId = QualifiedNameGetCreationNamespace(stmt->conversion_name,
@@ -92,8 +93,8 @@ CreateConversionCommand(CreateConversionStmt *stmt)
funcoid = LookupFuncName(func_name, sizeof(funcargs) / sizeof(Oid),
funcargs, false);
- /* Check it returns VOID, else it's probably the wrong function */
- if (get_func_rettype(funcoid) != VOIDOID)
+ /* Check it returns int4, else it's probably the wrong function */
+ if (get_func_rettype(funcoid) != INT4OID)
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("encoding conversion function %s must return type %s",
@@ -111,12 +112,20 @@ CreateConversionCommand(CreateConversionStmt *stmt)
* string; the conversion function should throw an error if it can't
* perform the requested conversion.
*/
- OidFunctionCall5(funcoid,
- Int32GetDatum(from_encoding),
- Int32GetDatum(to_encoding),
- CStringGetDatum(""),
- CStringGetDatum(result),
- Int32GetDatum(0));
+ funcresult = OidFunctionCall6(funcoid,
+ Int32GetDatum(from_encoding),
+ Int32GetDatum(to_encoding),
+ CStringGetDatum(""),
+ CStringGetDatum(result),
+ Int32GetDatum(0),
+ BoolGetDatum(false));
+
+ /* The function should return 0 for empty input. Might as well check that, too. */
+ if (DatumGetInt32(funcresult) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("encoding conversion function %s returned incorrect result for empty input",
+ NameListToString(func_name))));
/*
* All seem ok, go ahead (possible failure would be a duplicate conversion
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 3558e660c73..8b9d794dd33 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2266,6 +2266,8 @@ write_console(const char *line, int len)
* Conversion on non-win32 platforms is not implemented yet. It requires
* non-throw version of pg_do_encoding_conversion(), that converts
* unconvertable characters to '?' without errors.
+ *
+ * XXX: We have a no-throw version now. It doesn't convert to '?' though.
*/
#endif
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index 192948caad2..91251cf70e7 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -26,14 +26,16 @@
* starting from 128 (0x80). each entry in the table holds the corresponding
* code point for the target charset, or 0 if there is no equivalent code.
*/
-void
+int
local2local(const unsigned char *l,
unsigned char *p,
int len,
int src_encoding,
int dest_encoding,
- const unsigned char *tab)
+ const unsigned char *tab,
+ bool noError)
{
+ const unsigned char *start = l;
unsigned char c1,
c2;
@@ -41,7 +43,11 @@ local2local(const unsigned char *l,
{
c1 = *l;
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(src_encoding, (const char *) l, len);
+ }
if (!IS_HIGHBIT_SET(c1))
*p++ = c1;
else
@@ -50,13 +56,19 @@ local2local(const unsigned char *l,
if (c2)
*p++ = c2;
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(src_encoding, dest_encoding,
(const char *) l, len);
+ }
}
l++;
len--;
}
*p = '\0';
+
+ return l - start;
}
/*
@@ -67,17 +79,22 @@ local2local(const unsigned char *l,
* lc is the mule character set id for the local encoding
* encoding is the PG identifier for the local encoding
*/
-void
+int
latin2mic(const unsigned char *l, unsigned char *p, int len,
- int lc, int encoding)
+ int lc, int encoding, bool noError)
{
+ const unsigned char *start = l;
int c1;
while (len > 0)
{
c1 = *l;
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(encoding, (const char *) l, len);
+ }
if (IS_HIGHBIT_SET(c1))
*p++ = lc;
*p++ = c1;
@@ -85,6 +102,8 @@ latin2mic(const unsigned char *l, unsigned char *p, int len,
len--;
}
*p = '\0';
+
+ return l - start;
}
/*
@@ -95,17 +114,22 @@ latin2mic(const unsigned char *l, unsigned char *p, int len,
* lc is the mule character set id for the local encoding
* encoding is the PG identifier for the local encoding
*/
-void
+int
mic2latin(const unsigned char *mic, unsigned char *p, int len,
- int lc, int encoding)
+ int lc, int encoding, bool noError)
{
+ const unsigned char *start = mic;
int c1;
while (len > 0)
{
c1 = *mic;
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL, (const char *) mic, len);
+ }
if (!IS_HIGHBIT_SET(c1))
{
/* easy for ASCII */
@@ -118,17 +142,27 @@ mic2latin(const unsigned char *mic, unsigned char *p, int len,
int l = pg_mule_mblen(mic);
if (len < l)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL, (const char *) mic,
len);
+ }
if (l != 2 || c1 != lc || !IS_HIGHBIT_SET(mic[1]))
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, encoding,
(const char *) mic, len);
+ }
*p++ = mic[1];
mic += 2;
len -= 2;
}
}
*p = '\0';
+
+ return mic - start;
}
@@ -144,14 +178,16 @@ mic2latin(const unsigned char *mic, unsigned char *p, int len,
* starting from 128 (0x80). each entry in the table holds the corresponding
* code point for the mule encoding, or 0 if there is no equivalent code.
*/
-void
+int
latin2mic_with_table(const unsigned char *l,
unsigned char *p,
int len,
int lc,
int encoding,
- const unsigned char *tab)
+ const unsigned char *tab,
+ bool noError)
{
+ const unsigned char *start = l;
unsigned char c1,
c2;
@@ -159,7 +195,11 @@ latin2mic_with_table(const unsigned char *l,
{
c1 = *l;
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(encoding, (const char *) l, len);
+ }
if (!IS_HIGHBIT_SET(c1))
*p++ = c1;
else
@@ -171,13 +211,19 @@ latin2mic_with_table(const unsigned char *l,
*p++ = c2;
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(encoding, PG_MULE_INTERNAL,
(const char *) l, len);
+ }
}
l++;
len--;
}
*p = '\0';
+
+ return l - start;
}
/*
@@ -192,14 +238,16 @@ latin2mic_with_table(const unsigned char *l,
* starting from 128 (0x80). each entry in the table holds the corresponding
* code point for the local charset, or 0 if there is no equivalent code.
*/
-void
+int
mic2latin_with_table(const unsigned char *mic,
unsigned char *p,
int len,
int lc,
int encoding,
- const unsigned char *tab)
+ const unsigned char *tab,
+ bool noError)
{
+ const unsigned char *start = mic;
unsigned char c1,
c2;
@@ -207,7 +255,11 @@ mic2latin_with_table(const unsigned char *mic,
{
c1 = *mic;
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL, (const char *) mic, len);
+ }
if (!IS_HIGHBIT_SET(c1))
{
/* easy for ASCII */
@@ -220,11 +272,17 @@ mic2latin_with_table(const unsigned char *mic,
int l = pg_mule_mblen(mic);
if (len < l)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL, (const char *) mic,
len);
+ }
if (l != 2 || c1 != lc || !IS_HIGHBIT_SET(mic[1]) ||
(c2 = tab[mic[1] - HIGHBIT]) == 0)
{
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, encoding,
(const char *) mic, len);
break; /* keep compiler quiet */
@@ -235,6 +293,8 @@ mic2latin_with_table(const unsigned char *mic,
}
}
*p = '\0';
+
+ return mic - start;
}
/*
@@ -425,17 +485,19 @@ pg_mb_radix_conv(const pg_mb_radix_tree *rt,
*
* See pg_wchar.h for more details about the data structures used here.
*/
-void
+int
UtfToLocal(const unsigned char *utf, int len,
unsigned char *iso,
const pg_mb_radix_tree *map,
const pg_utf_to_local_combined *cmap, int cmapsize,
utf_local_conversion_func conv_func,
- int encoding)
+ int encoding, bool noError)
{
uint32 iutf;
int l;
const pg_utf_to_local_combined *cp;
+ const unsigned char *start = utf;
+ const unsigned char *cur = utf;
if (!PG_VALID_ENCODING(encoding))
ereport(ERROR,
@@ -449,6 +511,8 @@ UtfToLocal(const unsigned char *utf, int len,
unsigned char b3 = 0;
unsigned char b4 = 0;
+ cur = utf;
+
/* "break" cases all represent errors */
if (*utf == '\0')
break;
@@ -584,15 +648,19 @@ UtfToLocal(const unsigned char *utf, int len,
}
/* failed to translate this character */
+ if (noError)
+ break;
report_untranslatable_char(PG_UTF8, encoding,
(const char *) (utf - l), len);
}
/* if we broke out of loop early, must be invalid input */
- if (len > 0)
+ if (len > 0 && !noError)
report_invalid_encoding(PG_UTF8, (const char *) utf, len);
*iso = '\0';
+
+ return cur - start;
}
/*
@@ -616,18 +684,24 @@ UtfToLocal(const unsigned char *utf, int len,
* (if provided) is applied. An error is raised if no match is found.
*
* See pg_wchar.h for more details about the data structures used here.
+ *
+ * Returns the number of input bytes consumed. If noError is true, this can
+ * be less than 'len'.
*/
-void
+int
LocalToUtf(const unsigned char *iso, int len,
unsigned char *utf,
const pg_mb_radix_tree *map,
const pg_local_to_utf_combined *cmap, int cmapsize,
utf_local_conversion_func conv_func,
- int encoding)
+ int encoding,
+ bool noError)
{
uint32 iiso;
int l;
const pg_local_to_utf_combined *cp;
+ const unsigned char *start = iso;
+ const unsigned char *cur = iso;
if (!PG_VALID_ENCODING(encoding))
ereport(ERROR,
@@ -641,6 +715,8 @@ LocalToUtf(const unsigned char *iso, int len,
unsigned char b3 = 0;
unsigned char b4 = 0;
+ cur = iso;
+
/* "break" cases all represent errors */
if (*iso == '\0')
break;
@@ -723,13 +799,17 @@ LocalToUtf(const unsigned char *iso, int len,
}
/* failed to translate this character */
+ if (noError)
+ break;
report_untranslatable_char(encoding, PG_UTF8,
(const char *) (iso - l), len);
}
/* if we broke out of loop early, must be invalid input */
- if (len > 0)
+ if (len > 0 && !noError)
report_invalid_encoding(encoding, (const char *) iso, len);
*utf = '\0';
+
+ return cur - start;
}
diff --git a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
index 376b48ca611..986c0c0c37d 100644
--- a/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
@@ -44,8 +44,11 @@ PG_FUNCTION_INFO_V1(win866_to_iso);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -306,12 +309,14 @@ koi8r_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_KOI8R, PG_MULE_INTERNAL);
- latin2mic(src, dest, len, LC_KOI8_R, PG_KOI8R);
+ converted = latin2mic(src, dest, len, LC_KOI8_R, PG_KOI8R, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -320,12 +325,14 @@ mic_to_koi8r(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_KOI8R);
- mic2latin(src, dest, len, LC_KOI8_R, PG_KOI8R);
+ converted = mic2latin(src, dest, len, LC_KOI8_R, PG_KOI8R, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -334,12 +341,14 @@ iso_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_ISO_8859_5, PG_MULE_INTERNAL);
- latin2mic_with_table(src, dest, len, LC_KOI8_R, PG_ISO_8859_5, iso2koi);
+ converted = latin2mic_with_table(src, dest, len, LC_KOI8_R, PG_ISO_8859_5, iso2koi, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -348,12 +357,14 @@ mic_to_iso(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_ISO_8859_5);
- mic2latin_with_table(src, dest, len, LC_KOI8_R, PG_ISO_8859_5, koi2iso);
+ converted = mic2latin_with_table(src, dest, len, LC_KOI8_R, PG_ISO_8859_5, koi2iso, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -362,12 +373,14 @@ win1251_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN1251, PG_MULE_INTERNAL);
- latin2mic_with_table(src, dest, len, LC_KOI8_R, PG_WIN1251, win12512koi);
+ converted = latin2mic_with_table(src, dest, len, LC_KOI8_R, PG_WIN1251, win12512koi, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -376,12 +389,14 @@ mic_to_win1251(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_WIN1251);
- mic2latin_with_table(src, dest, len, LC_KOI8_R, PG_WIN1251, koi2win1251);
+ converted = mic2latin_with_table(src, dest, len, LC_KOI8_R, PG_WIN1251, koi2win1251, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -390,12 +405,14 @@ win866_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN866, PG_MULE_INTERNAL);
- latin2mic_with_table(src, dest, len, LC_KOI8_R, PG_WIN866, win8662koi);
+ converted = latin2mic_with_table(src, dest, len, LC_KOI8_R, PG_WIN866, win8662koi, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -404,12 +421,14 @@ mic_to_win866(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_WIN866);
- mic2latin_with_table(src, dest, len, LC_KOI8_R, PG_WIN866, koi2win866);
+ converted = mic2latin_with_table(src, dest, len, LC_KOI8_R, PG_WIN866, koi2win866, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -418,12 +437,14 @@ koi8r_to_win1251(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_KOI8R, PG_WIN1251);
- local2local(src, dest, len, PG_KOI8R, PG_WIN1251, koi2win1251);
+ converted = local2local(src, dest, len, PG_KOI8R, PG_WIN1251, koi2win1251, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -432,12 +453,14 @@ win1251_to_koi8r(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN1251, PG_KOI8R);
- local2local(src, dest, len, PG_WIN1251, PG_KOI8R, win12512koi);
+ converted = local2local(src, dest, len, PG_WIN1251, PG_KOI8R, win12512koi, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -446,12 +469,14 @@ koi8r_to_win866(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_KOI8R, PG_WIN866);
- local2local(src, dest, len, PG_KOI8R, PG_WIN866, koi2win866);
+ converted = local2local(src, dest, len, PG_KOI8R, PG_WIN866, koi2win866, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -460,12 +485,14 @@ win866_to_koi8r(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN866, PG_KOI8R);
- local2local(src, dest, len, PG_WIN866, PG_KOI8R, win8662koi);
+ converted = local2local(src, dest, len, PG_WIN866, PG_KOI8R, win8662koi, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -474,12 +501,14 @@ win866_to_win1251(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN866, PG_WIN1251);
- local2local(src, dest, len, PG_WIN866, PG_WIN1251, win8662win1251);
+ converted = local2local(src, dest, len, PG_WIN866, PG_WIN1251, win8662win1251, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -488,12 +517,14 @@ win1251_to_win866(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN1251, PG_WIN866);
- local2local(src, dest, len, PG_WIN1251, PG_WIN866, win12512win866);
+ converted = local2local(src, dest, len, PG_WIN1251, PG_WIN866, win12512win866, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -502,12 +533,14 @@ iso_to_koi8r(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_ISO_8859_5, PG_KOI8R);
- local2local(src, dest, len, PG_ISO_8859_5, PG_KOI8R, iso2koi);
+ converted = local2local(src, dest, len, PG_ISO_8859_5, PG_KOI8R, iso2koi, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -516,12 +549,14 @@ koi8r_to_iso(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_KOI8R, PG_ISO_8859_5);
- local2local(src, dest, len, PG_KOI8R, PG_ISO_8859_5, koi2iso);
+ converted = local2local(src, dest, len, PG_KOI8R, PG_ISO_8859_5, koi2iso, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -530,12 +565,14 @@ iso_to_win1251(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_ISO_8859_5, PG_WIN1251);
- local2local(src, dest, len, PG_ISO_8859_5, PG_WIN1251, iso2win1251);
+ converted = local2local(src, dest, len, PG_ISO_8859_5, PG_WIN1251, iso2win1251, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -544,12 +581,14 @@ win1251_to_iso(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN1251, PG_ISO_8859_5);
- local2local(src, dest, len, PG_WIN1251, PG_ISO_8859_5, win12512iso);
+ converted = local2local(src, dest, len, PG_WIN1251, PG_ISO_8859_5, win12512iso, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -558,12 +597,14 @@ iso_to_win866(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_ISO_8859_5, PG_WIN866);
- local2local(src, dest, len, PG_ISO_8859_5, PG_WIN866, iso2win866);
+ converted = local2local(src, dest, len, PG_ISO_8859_5, PG_WIN866, iso2win866, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -572,10 +613,12 @@ win866_to_iso(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN866, PG_ISO_8859_5);
- local2local(src, dest, len, PG_WIN866, PG_ISO_8859_5, win8662iso);
+ converted = local2local(src, dest, len, PG_WIN866, PG_ISO_8859_5, win8662iso, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
index 3628e690aa1..40f231c12dc 100644
--- a/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/euc2004_sjis2004/euc2004_sjis2004.c
@@ -19,8 +19,8 @@ PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(euc_jis_2004_to_shift_jis_2004);
PG_FUNCTION_INFO_V1(shift_jis_2004_to_euc_jis_2004);
-static void euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len);
-static void shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len);
+static int euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len, bool noError);
/* ----------
* conv_proc(
@@ -28,8 +28,11 @@ static void shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -39,12 +42,14 @@ euc_jis_2004_to_shift_jis_2004(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_JIS_2004, PG_SHIFT_JIS_2004);
- euc_jis_20042shift_jis_2004(src, dest, len);
+ converted = euc_jis_20042shift_jis_2004(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -53,20 +58,23 @@ shift_jis_2004_to_euc_jis_2004(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_SHIFT_JIS_2004, PG_EUC_JIS_2004);
- shift_jis_20042euc_jis_2004(src, dest, len);
+ converted = shift_jis_20042euc_jis_2004(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
/*
* EUC_JIS_2004 -> SHIFT_JIS_2004
*/
-static void
-euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
int c1,
ku,
ten;
@@ -79,8 +87,12 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
@@ -90,8 +102,12 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
l = pg_encoding_verifymbchar(PG_EUC_JIS_2004, (const char *) euc, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
if (c1 == SS2 && l == 2) /* JIS X 0201 kana? */
{
@@ -121,8 +137,12 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
*p++ = (ku + 0x19b) >> 1;
}
else
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
}
if (ku % 2)
@@ -132,8 +152,12 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
else if (ten >= 64 && ten <= 94)
*p++ = ten + 0x40;
else
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
}
else
*p++ = ten + 0x9e;
@@ -149,8 +173,12 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
else if (ku >= 63 && ku <= 94)
*p++ = (ku + 0x181) >> 1;
else
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
if (ku % 2)
{
@@ -159,20 +187,30 @@ euc_jis_20042shift_jis_2004(const unsigned char *euc, unsigned char *p, int len)
else if (ten >= 64 && ten <= 94)
*p++ = ten + 0x40;
else
- report_invalid_encoding(PG_EUC_JIS_2004,
+ {
+ if (noError)
+ break;
+ report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
}
else
*p++ = ten + 0x9e;
}
else
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JIS_2004,
(const char *) euc, len);
+ }
euc += l;
len -= l;
}
*p = '\0';
+
+ return euc - start;
}
/*
@@ -212,9 +250,10 @@ get_ten(int b, int *ku)
* SHIFT_JIS_2004 ---> EUC_JIS_2004
*/
-static void
-shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len)
+static int
+shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = sjis;
int c1;
int ku,
ten,
@@ -230,8 +269,12 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
(const char *) sjis, len);
+ }
*p++ = c1;
sjis++;
len--;
@@ -241,8 +284,12 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
l = pg_encoding_verifymbchar(PG_SHIFT_JIS_2004, (const char *) sjis, len);
if (l < 0 || l > len)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
(const char *) sjis, len);
+ }
if (c1 >= 0xa1 && c1 <= 0xdf && l == 1)
{
@@ -266,8 +313,12 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
ku = (c1 << 1) - 0x100;
ten = get_ten(c2, &kubun);
if (ten < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
(const char *) sjis, len);
+ }
ku -= kubun;
}
else if (c1 >= 0xe0 && c1 <= 0xef) /* plane 1 62ku-94ku */
@@ -275,9 +326,12 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
ku = (c1 << 1) - 0x180;
ten = get_ten(c2, &kubun);
if (ten < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
-
(const char *) sjis, len);
+ }
ku -= kubun;
}
else if (c1 >= 0xf0 && c1 <= 0xf3) /* plane 2
@@ -286,8 +340,12 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
plane = 2;
ten = get_ten(c2, &kubun);
if (ten < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
(const char *) sjis, len);
+ }
switch (c1)
{
case 0xf0:
@@ -309,16 +367,24 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
plane = 2;
ten = get_ten(c2, &kubun);
if (ten < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
(const char *) sjis, len);
+ }
if (c1 == 0xf4 && kubun == 1)
ku = 15;
else
ku = (c1 << 1) - 0x19a - kubun;
}
else
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SHIFT_JIS_2004,
(const char *) sjis, len);
+ }
if (plane == 2)
*p++ = SS3;
@@ -330,4 +396,6 @@ shift_jis_20042euc_jis_2004(const unsigned char *sjis, unsigned char *p, int len
len -= l;
}
*p = '\0';
+
+ return sjis - start;
}
diff --git a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
index 59c6c3bb129..ad9ebac39b1 100644
--- a/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_cn_and_mic/euc_cn_and_mic.c
@@ -26,13 +26,16 @@ PG_FUNCTION_INFO_V1(mic_to_euc_cn);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
-static void euc_cn2mic(const unsigned char *euc, unsigned char *p, int len);
-static void mic2euc_cn(const unsigned char *mic, unsigned char *p, int len);
+static int euc_cn2mic(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int mic2euc_cn(const unsigned char *mic, unsigned char *p, int len, bool noError);
Datum
euc_cn_to_mic(PG_FUNCTION_ARGS)
@@ -40,12 +43,14 @@ euc_cn_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_CN, PG_MULE_INTERNAL);
- euc_cn2mic(src, dest, len);
+ converted = euc_cn2mic(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -54,20 +59,23 @@ mic_to_euc_cn(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_EUC_CN);
- mic2euc_cn(src, dest, len);
+ converted = mic2euc_cn(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
/*
* EUC_CN ---> MIC
*/
-static void
-euc_cn2mic(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_cn2mic(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
int c1;
while (len > 0)
@@ -76,7 +84,11 @@ euc_cn2mic(const unsigned char *euc, unsigned char *p, int len)
if (IS_HIGHBIT_SET(c1))
{
if (len < 2 || !IS_HIGHBIT_SET(euc[1]))
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_CN, (const char *) euc, len);
+ }
*p++ = LC_GB2312_80;
*p++ = c1;
*p++ = euc[1];
@@ -86,21 +98,28 @@ euc_cn2mic(const unsigned char *euc, unsigned char *p, int len)
else
{ /* should be ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_CN, (const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
}
}
*p = '\0';
+
+ return euc - start;
}
/*
* MIC ---> EUC_CN
*/
-static void
-mic2euc_cn(const unsigned char *mic, unsigned char *p, int len)
+static int
+mic2euc_cn(const unsigned char *mic, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = mic;
int c1;
while (len > 0)
@@ -109,11 +128,19 @@ mic2euc_cn(const unsigned char *mic, unsigned char *p, int len)
if (IS_HIGHBIT_SET(c1))
{
if (c1 != LC_GB2312_80)
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_EUC_CN,
(const char *) mic, len);
+ }
if (len < 3 || !IS_HIGHBIT_SET(mic[1]) || !IS_HIGHBIT_SET(mic[2]))
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
mic++;
*p++ = *mic++;
*p++ = *mic++;
@@ -122,12 +149,18 @@ mic2euc_cn(const unsigned char *mic, unsigned char *p, int len)
else
{ /* should be ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
*p++ = c1;
mic++;
len--;
}
}
*p = '\0';
+
+ return mic - start;
}
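
An aside that is not part of the patch: the pattern applied to every conversion routine is the same, so here is a standalone sketch of it, modelled on euc_cn2mic above. Everything prefixed with toy_ or TOY_ is hypothetical and exists only so this compiles without the backend headers. In particular, toy_report_invalid() merely stands in for report_invalid_encoding(), and the leading-byte value is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_LC_GB2312_80	0x91	/* assumed leading byte, illustration only */
#define TOY_HIGHBIT(c)		((c) & 0x80)

/* Hypothetical stand-in for report_invalid_encoding(); does not return. */
static void
toy_report_invalid(const unsigned char *s, int len)
{
	fprintf(stderr, "invalid byte sequence starting with 0x%02x (%d bytes left)\n",
			s[0], len);
	exit(1);
}

/*
 * Convert as much of the input as possible.  With noError = true, stop
 * quietly at the first invalid or incomplete character; otherwise report
 * it.  Either way, the return value is the number of input bytes consumed.
 */
static int
toy_euc_cn2mic(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
	const unsigned char *start = euc;

	assert(len >= 0);
	while (len > 0)
	{
		int			c1 = *euc;

		if (TOY_HIGHBIT(c1))
		{
			/* two-byte character: both bytes must have the high bit set */
			if (len < 2 || !TOY_HIGHBIT(euc[1]))
			{
				if (noError)
					break;
				toy_report_invalid(euc, len);
			}
			*p++ = TOY_LC_GB2312_80;
			*p++ = c1;
			*p++ = euc[1];
			euc += 2;
			len -= 2;
		}
		else
		{
			/* should be ASCII; an embedded NUL is not allowed */
			if (c1 == 0)
			{
				if (noError)
					break;
				toy_report_invalid(euc, len);
			}
			*p++ = c1;
			euc++;
			len--;
		}
	}
	*p = '\0';

	return (int) (euc - start);
}
```

Given the four input bytes 'a', 0xb0, 0xa1, 0xb0 and noError = true, the sketch consumes the first three and leaves the trailing 0xb0 for the caller, since it might be the first half of a character that continues in the next chunk.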
diff --git a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
index ea05436596d..81064cb6e98 100644
--- a/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/euc_jp_and_sjis.c
@@ -42,17 +42,20 @@ PG_FUNCTION_INFO_V1(mic_to_sjis);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
-static void sjis2mic(const unsigned char *sjis, unsigned char *p, int len);
-static void mic2sjis(const unsigned char *mic, unsigned char *p, int len);
-static void euc_jp2mic(const unsigned char *euc, unsigned char *p, int len);
-static void mic2euc_jp(const unsigned char *mic, unsigned char *p, int len);
-static void euc_jp2sjis(const unsigned char *mic, unsigned char *p, int len);
-static void sjis2euc_jp(const unsigned char *mic, unsigned char *p, int len);
+static int sjis2mic(const unsigned char *sjis, unsigned char *p, int len, bool noError);
+static int mic2sjis(const unsigned char *mic, unsigned char *p, int len, bool noError);
+static int euc_jp2mic(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int mic2euc_jp(const unsigned char *mic, unsigned char *p, int len, bool noError);
+static int euc_jp2sjis(const unsigned char *mic, unsigned char *p, int len, bool noError);
+static int sjis2euc_jp(const unsigned char *mic, unsigned char *p, int len, bool noError);
Datum
euc_jp_to_sjis(PG_FUNCTION_ARGS)
@@ -60,12 +63,14 @@ euc_jp_to_sjis(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_JP, PG_SJIS);
- euc_jp2sjis(src, dest, len);
+ converted = euc_jp2sjis(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -74,12 +79,14 @@ sjis_to_euc_jp(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_SJIS, PG_EUC_JP);
- sjis2euc_jp(src, dest, len);
+ converted = sjis2euc_jp(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -88,12 +95,14 @@ euc_jp_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_JP, PG_MULE_INTERNAL);
- euc_jp2mic(src, dest, len);
+ converted = euc_jp2mic(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -102,12 +111,14 @@ mic_to_euc_jp(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_EUC_JP);
- mic2euc_jp(src, dest, len);
+ converted = mic2euc_jp(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -116,12 +127,14 @@ sjis_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_SJIS, PG_MULE_INTERNAL);
- sjis2mic(src, dest, len);
+ converted = sjis2mic(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -130,20 +143,23 @@ mic_to_sjis(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_SJIS);
- mic2sjis(src, dest, len);
+ converted = mic2sjis(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
/*
* SJIS ---> MIC
*/
-static void
-sjis2mic(const unsigned char *sjis, unsigned char *p, int len)
+static int
+sjis2mic(const unsigned char *sjis, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = sjis;
int c1,
c2,
i,
@@ -167,7 +183,11 @@ sjis2mic(const unsigned char *sjis, unsigned char *p, int len)
* JIS X0208, X0212, user defined extended characters
*/
if (len < 2 || !ISSJISHEAD(c1) || !ISSJISTAIL(sjis[1]))
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SJIS, (const char *) sjis, len);
+ }
c2 = sjis[1];
k = (c1 << 8) + c2;
if (k >= 0xed40 && k < 0xf040)
@@ -257,21 +277,28 @@ sjis2mic(const unsigned char *sjis, unsigned char *p, int len)
else
{ /* should be ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SJIS, (const char *) sjis, len);
+ }
*p++ = c1;
sjis++;
len--;
}
}
*p = '\0';
+
+ return sjis - start;
}
/*
* MIC ---> SJIS
*/
-static void
-mic2sjis(const unsigned char *mic, unsigned char *p, int len)
+static int
+mic2sjis(const unsigned char *mic, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = mic;
int c1,
c2,
k,
@@ -284,8 +311,12 @@ mic2sjis(const unsigned char *mic, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
*p++ = c1;
mic++;
len--;
@@ -293,8 +324,12 @@ mic2sjis(const unsigned char *mic, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
if (c1 == LC_JISX0201K)
*p++ = mic[1];
else if (c1 == LC_JISX0208)
@@ -350,20 +385,27 @@ mic2sjis(const unsigned char *mic, unsigned char *p, int len)
}
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_SJIS,
(const char *) mic, len);
+ }
mic += l;
len -= l;
}
*p = '\0';
+
+ return mic - start;
}
/*
* EUC_JP ---> MIC
*/
-static void
-euc_jp2mic(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_jp2mic(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
int c1;
int l;
@@ -374,8 +416,12 @@ euc_jp2mic(const unsigned char *euc, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JP,
(const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
@@ -383,8 +429,12 @@ euc_jp2mic(const unsigned char *euc, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_EUC_JP, (const char *) euc, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JP,
(const char *) euc, len);
+ }
if (c1 == SS2)
{ /* 1 byte kana? */
*p++ = LC_JISX0201K;
@@ -406,14 +456,17 @@ euc_jp2mic(const unsigned char *euc, unsigned char *p, int len)
len -= l;
}
*p = '\0';
+
+ return euc - start;
}
/*
* MIC ---> EUC_JP
*/
-static void
-mic2euc_jp(const unsigned char *mic, unsigned char *p, int len)
+static int
+mic2euc_jp(const unsigned char *mic, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = mic;
int c1;
int l;
@@ -424,8 +477,12 @@ mic2euc_jp(const unsigned char *mic, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
*p++ = c1;
mic++;
len--;
@@ -433,8 +490,12 @@ mic2euc_jp(const unsigned char *mic, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
if (c1 == LC_JISX0201K)
{
*p++ = SS2;
@@ -452,20 +513,27 @@ mic2euc_jp(const unsigned char *mic, unsigned char *p, int len)
*p++ = mic[2];
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_EUC_JP,
(const char *) mic, len);
+ }
mic += l;
len -= l;
}
*p = '\0';
+
+ return mic - start;
}
/*
* EUC_JP -> SJIS
*/
-static void
-euc_jp2sjis(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_jp2sjis(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
int c1,
c2,
k;
@@ -478,8 +546,12 @@ euc_jp2sjis(const unsigned char *euc, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JP,
(const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
@@ -487,8 +559,12 @@ euc_jp2sjis(const unsigned char *euc, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_EUC_JP, (const char *) euc, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_JP,
(const char *) euc, len);
+ }
if (c1 == SS2)
{
/* hankaku kana? */
@@ -551,14 +627,17 @@ euc_jp2sjis(const unsigned char *euc, unsigned char *p, int len)
len -= l;
}
*p = '\0';
+
+ return euc - start;
}
/*
* SJIS ---> EUC_JP
*/
-static void
-sjis2euc_jp(const unsigned char *sjis, unsigned char *p, int len)
+static int
+sjis2euc_jp(const unsigned char *sjis, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = sjis;
int c1,
c2,
i,
@@ -573,8 +652,12 @@ sjis2euc_jp(const unsigned char *sjis, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SJIS,
(const char *) sjis, len);
+ }
*p++ = c1;
sjis++;
len--;
@@ -582,8 +665,12 @@ sjis2euc_jp(const unsigned char *sjis, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_SJIS, (const char *) sjis, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_SJIS,
(const char *) sjis, len);
+ }
if (c1 >= 0xa1 && c1 <= 0xdf)
{
/* JIS X0201 (1 byte kana) */
@@ -680,4 +767,6 @@ sjis2euc_jp(const unsigned char *sjis, unsigned char *p, int len)
len -= l;
}
*p = '\0';
+
+ return sjis - start;
}
diff --git a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
index 600c5cbc5cd..5a44262834a 100644
--- a/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/euc_kr_and_mic/euc_kr_and_mic.c
@@ -26,13 +26,16 @@ PG_FUNCTION_INFO_V1(mic_to_euc_kr);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
-static void euc_kr2mic(const unsigned char *euc, unsigned char *p, int len);
-static void mic2euc_kr(const unsigned char *mic, unsigned char *p, int len);
+static int euc_kr2mic(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int mic2euc_kr(const unsigned char *mic, unsigned char *p, int len, bool noError);
Datum
euc_kr_to_mic(PG_FUNCTION_ARGS)
@@ -40,12 +43,14 @@ euc_kr_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_KR, PG_MULE_INTERNAL);
- euc_kr2mic(src, dest, len);
+ converted = euc_kr2mic(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -54,20 +59,23 @@ mic_to_euc_kr(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_EUC_KR);
- mic2euc_kr(src, dest, len);
+ converted = mic2euc_kr(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
/*
* EUC_KR ---> MIC
*/
-static void
-euc_kr2mic(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_kr2mic(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
int c1;
int l;
@@ -78,8 +86,12 @@ euc_kr2mic(const unsigned char *euc, unsigned char *p, int len)
{
l = pg_encoding_verifymbchar(PG_EUC_KR, (const char *) euc, len);
if (l != 2)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_KR,
(const char *) euc, len);
+ }
*p++ = LC_KS5601;
*p++ = c1;
*p++ = euc[1];
@@ -89,22 +101,29 @@ euc_kr2mic(const unsigned char *euc, unsigned char *p, int len)
else
{ /* should be ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_KR,
(const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
}
}
*p = '\0';
+
+ return euc - start;
}
/*
* MIC ---> EUC_KR
*/
-static void
-mic2euc_kr(const unsigned char *mic, unsigned char *p, int len)
+static int
+mic2euc_kr(const unsigned char *mic, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = mic;
int c1;
int l;
@@ -115,8 +134,12 @@ mic2euc_kr(const unsigned char *mic, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
*p++ = c1;
mic++;
len--;
@@ -124,18 +147,28 @@ mic2euc_kr(const unsigned char *mic, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
if (c1 == LC_KS5601)
{
*p++ = mic[1];
*p++ = mic[2];
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_EUC_KR,
(const char *) mic, len);
+ }
mic += l;
len -= l;
}
*p = '\0';
+
+ return mic - start;
}
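
Another aside, again not part of the patch: the reason for returning the number of bytes converted is that a caller can then feed the routine arbitrary chunks. Below is a rough sketch of such a caller, under loud assumptions: toy_convert() is a made-up identity "conversion" for a toy encoding (ASCII is one byte, anything else is two bytes with the high bit set) that always behaves as if noError were true, and CHUNK is kept tiny on purpose so that characters straddle chunk boundaries. The real COPY code will look different.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4					/* tiny on purpose; must be >= longest character */

/* Hypothetical converter: copies valid characters, returns bytes consumed. */
static int
toy_convert(const unsigned char *src, unsigned char *dest, int len)
{
	const unsigned char *start = src;

	while (len > 0)
	{
		if (*src & 0x80)
		{
			if (len < 2 || !(src[1] & 0x80))
				break;			/* invalid, or incomplete at end of chunk */
			*dest++ = src[0];
			*dest++ = src[1];
			src += 2;
			len -= 2;
		}
		else
		{
			if (*src == 0)
				break;
			*dest++ = *src++;
			len--;
		}
	}
	*dest = '\0';
	return (int) (src - start);
}

/*
 * Convert 'input' in CHUNK-sized pieces, carrying unconverted trailing
 * bytes over to the next round.  Returns the output length, or -1 if the
 * input is invalid or ends in the middle of a character.
 */
static int
toy_stream_convert(const unsigned char *input, int inputlen, unsigned char *out)
{
	unsigned char raw[CHUNK];
	unsigned char conv[CHUNK + 1];
	int			nraw = 0;		/* unconverted bytes currently in raw[] */
	int			pos = 0;		/* read position in input */
	int			outlen = 0;

	assert(inputlen >= 0);
	for (;;)
	{
		int			nread = inputlen - pos;
		int			n;
		int			nconv;

		/* top up the raw buffer behind any leftover bytes */
		if (nread > CHUNK - nraw)
			nread = CHUNK - nraw;
		memcpy(raw + nraw, input + pos, (size_t) nread);
		pos += nread;
		nraw += nread;
		if (nraw == 0)
			break;				/* everything consumed */

		n = toy_convert(raw, conv, nraw);
		if (n == 0)
			return -1;			/* no progress: genuinely bad input */

		nconv = (int) strlen((const char *) conv);
		memcpy(out + outlen, conv, (size_t) nconv);
		outlen += nconv;

		/* keep the possibly-incomplete tail for the next round */
		nraw -= n;
		memmove(raw, raw + n, (size_t) nraw);
	}
	out[outlen] = '\0';
	return outlen;
}
```

Because the buffer is always topped up to CHUNK bytes unless the input is exhausted, a round that converts nothing cannot be blamed on a chunk boundary, so the sketch treats it as an error. That is the one place where a real caller would switch back to the error-throwing mode to get a proper message.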
diff --git a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
index 7794d7ef8bf..bf87335c6a0 100644
--- a/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
@@ -32,17 +32,20 @@ PG_FUNCTION_INFO_V1(mic_to_big5);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
-static void euc_tw2big5(const unsigned char *euc, unsigned char *p, int len);
-static void big52euc_tw(const unsigned char *euc, unsigned char *p, int len);
-static void big52mic(const unsigned char *big5, unsigned char *p, int len);
-static void mic2big5(const unsigned char *mic, unsigned char *p, int len);
-static void euc_tw2mic(const unsigned char *euc, unsigned char *p, int len);
-static void mic2euc_tw(const unsigned char *mic, unsigned char *p, int len);
+static int euc_tw2big5(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int big52euc_tw(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int big52mic(const unsigned char *big5, unsigned char *p, int len, bool noError);
+static int mic2big5(const unsigned char *mic, unsigned char *p, int len, bool noError);
+static int euc_tw2mic(const unsigned char *euc, unsigned char *p, int len, bool noError);
+static int mic2euc_tw(const unsigned char *mic, unsigned char *p, int len, bool noError);
Datum
euc_tw_to_big5(PG_FUNCTION_ARGS)
@@ -50,12 +53,14 @@ euc_tw_to_big5(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_TW, PG_BIG5);
- euc_tw2big5(src, dest, len);
+ converted = euc_tw2big5(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -64,12 +69,14 @@ big5_to_euc_tw(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_BIG5, PG_EUC_TW);
- big52euc_tw(src, dest, len);
+ converted = big52euc_tw(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -78,12 +85,14 @@ euc_tw_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_TW, PG_MULE_INTERNAL);
- euc_tw2mic(src, dest, len);
+ converted = euc_tw2mic(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -92,12 +101,14 @@ mic_to_euc_tw(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_EUC_TW);
- mic2euc_tw(src, dest, len);
+ converted = mic2euc_tw(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -106,12 +117,14 @@ big5_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_BIG5, PG_MULE_INTERNAL);
- big52mic(src, dest, len);
+ converted = big52mic(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -120,21 +133,24 @@ mic_to_big5(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_BIG5);
- mic2big5(src, dest, len);
+ converted = mic2big5(src, dest, len, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
/*
* EUC_TW ---> Big5
*/
-static void
-euc_tw2big5(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_tw2big5(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
unsigned char c1;
unsigned short big5buf,
cnsBuf;
@@ -149,8 +165,12 @@ euc_tw2big5(const unsigned char *euc, unsigned char *p, int len)
/* Verify and decode the next EUC_TW input character */
l = pg_encoding_verifymbchar(PG_EUC_TW, (const char *) euc, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_TW,
(const char *) euc, len);
+ }
if (c1 == SS2)
{
c1 = euc[1]; /* plane No. */
@@ -171,8 +191,12 @@ euc_tw2big5(const unsigned char *euc, unsigned char *p, int len)
/* Write it out in Big5 */
big5buf = CNStoBIG5(cnsBuf, lc);
if (big5buf == 0)
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_EUC_TW, PG_BIG5,
(const char *) euc, len);
+ }
*p++ = (big5buf >> 8) & 0x00ff;
*p++ = big5buf & 0x00ff;
@@ -182,23 +206,30 @@ euc_tw2big5(const unsigned char *euc, unsigned char *p, int len)
else
{ /* should be ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_TW,
(const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
}
}
*p = '\0';
+
+ return euc - start;
}
/*
* Big5 ---> EUC_TW
*/
-static void
-big52euc_tw(const unsigned char *big5, unsigned char *p, int len)
+static int
+big52euc_tw(const unsigned char *big5, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = big5;
unsigned short c1;
unsigned short big5buf,
cnsBuf;
@@ -213,8 +244,12 @@ big52euc_tw(const unsigned char *big5, unsigned char *p, int len)
{
l = pg_encoding_verifymbchar(PG_BIG5, (const char *) big5, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_BIG5,
(const char *) big5, len);
+ }
big5buf = (c1 << 8) | big5[1];
cnsBuf = BIG5toCNS(big5buf, &lc);
@@ -238,8 +273,12 @@ big52euc_tw(const unsigned char *big5, unsigned char *p, int len)
*p++ = cnsBuf & 0x00ff;
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_BIG5, PG_EUC_TW,
(const char *) big5, len);
+ }
big5 += l;
len -= l;
@@ -257,15 +296,18 @@ big52euc_tw(const unsigned char *big5, unsigned char *p, int len)
}
}
*p = '\0';
+
+ return big5 - start;
}
/*
* EUC_TW ---> MIC
*/
-static void
-euc_tw2mic(const unsigned char *euc, unsigned char *p, int len)
+static int
+euc_tw2mic(const unsigned char *euc, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = euc;
int c1;
int l;
@@ -276,8 +318,12 @@ euc_tw2mic(const unsigned char *euc, unsigned char *p, int len)
{
l = pg_encoding_verifymbchar(PG_EUC_TW, (const char *) euc, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_TW,
(const char *) euc, len);
+ }
if (c1 == SS2)
{
c1 = euc[1]; /* plane No. */
@@ -306,22 +352,29 @@ euc_tw2mic(const unsigned char *euc, unsigned char *p, int len)
else
{ /* should be ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_EUC_TW,
(const char *) euc, len);
+ }
*p++ = c1;
euc++;
len--;
}
}
*p = '\0';
+
+ return euc - start;
}
/*
* MIC ---> EUC_TW
*/
-static void
-mic2euc_tw(const unsigned char *mic, unsigned char *p, int len)
+static int
+mic2euc_tw(const unsigned char *mic, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = mic;
int c1;
int l;
@@ -332,8 +385,12 @@ mic2euc_tw(const unsigned char *mic, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
*p++ = c1;
mic++;
len--;
@@ -341,8 +398,12 @@ mic2euc_tw(const unsigned char *mic, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
if (c1 == LC_CNS11643_1)
{
*p++ = mic[1];
@@ -364,20 +425,27 @@ mic2euc_tw(const unsigned char *mic, unsigned char *p, int len)
*p++ = mic[3];
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_EUC_TW,
(const char *) mic, len);
+ }
mic += l;
len -= l;
}
*p = '\0';
+
+ return mic - start;
}
/*
* Big5 ---> MIC
*/
-static void
-big52mic(const unsigned char *big5, unsigned char *p, int len)
+static int
+big52mic(const unsigned char *big5, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = big5;
unsigned short c1;
unsigned short big5buf,
cnsBuf;
@@ -391,8 +459,12 @@ big52mic(const unsigned char *big5, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_BIG5,
(const char *) big5, len);
+ }
*p++ = c1;
big5++;
len--;
@@ -400,8 +472,12 @@ big52mic(const unsigned char *big5, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_BIG5, (const char *) big5, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_BIG5,
(const char *) big5, len);
+ }
big5buf = (c1 << 8) | big5[1];
cnsBuf = BIG5toCNS(big5buf, &lc);
if (lc != 0)
@@ -414,20 +490,27 @@ big52mic(const unsigned char *big5, unsigned char *p, int len)
*p++ = cnsBuf & 0x00ff;
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_BIG5, PG_MULE_INTERNAL,
(const char *) big5, len);
+ }
big5 += l;
len -= l;
}
*p = '\0';
+
+ return big5 - start;
}
/*
* MIC ---> Big5
*/
-static void
-mic2big5(const unsigned char *mic, unsigned char *p, int len)
+static int
+mic2big5(const unsigned char *mic, unsigned char *p, int len, bool noError)
{
+ const unsigned char *start = mic;
unsigned short c1;
unsigned short big5buf,
cnsBuf;
@@ -440,8 +523,12 @@ mic2big5(const unsigned char *mic, unsigned char *p, int len)
{
/* ASCII */
if (c1 == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
*p++ = c1;
mic++;
len--;
@@ -449,8 +536,12 @@ mic2big5(const unsigned char *mic, unsigned char *p, int len)
}
l = pg_encoding_verifymbchar(PG_MULE_INTERNAL, (const char *) mic, len);
if (l < 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_MULE_INTERNAL,
(const char *) mic, len);
+ }
if (c1 == LC_CNS11643_1 || c1 == LC_CNS11643_2 || c1 == LCPRV2_B)
{
if (c1 == LCPRV2_B)
@@ -464,16 +555,26 @@ mic2big5(const unsigned char *mic, unsigned char *p, int len)
}
big5buf = CNStoBIG5(cnsBuf, c1);
if (big5buf == 0)
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_BIG5,
(const char *) mic, len);
+ }
*p++ = (big5buf >> 8) & 0x00ff;
*p++ = big5buf & 0x00ff;
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_MULE_INTERNAL, PG_BIG5,
(const char *) mic, len);
+ }
mic += l;
len -= l;
}
*p = '\0';
+
+ return mic - start;
}
diff --git a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
index f424f881459..8752dcc09ac 100644
--- a/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
+++ b/src/backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c
@@ -30,8 +30,11 @@ PG_FUNCTION_INFO_V1(win1250_to_latin2);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -82,12 +85,14 @@ latin2_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN2, PG_MULE_INTERNAL);
- latin2mic(src, dest, len, LC_ISO8859_2, PG_LATIN2);
+ converted = latin2mic(src, dest, len, LC_ISO8859_2, PG_LATIN2, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -96,12 +101,14 @@ mic_to_latin2(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_LATIN2);
- mic2latin(src, dest, len, LC_ISO8859_2, PG_LATIN2);
+ converted = mic2latin(src, dest, len, LC_ISO8859_2, PG_LATIN2, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -110,13 +117,15 @@ win1250_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN1250, PG_MULE_INTERNAL);
- latin2mic_with_table(src, dest, len, LC_ISO8859_2, PG_WIN1250,
- win1250_2_iso88592);
+ converted = latin2mic_with_table(src, dest, len, LC_ISO8859_2, PG_WIN1250,
+ win1250_2_iso88592, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -125,13 +134,15 @@ mic_to_win1250(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_WIN1250);
- mic2latin_with_table(src, dest, len, LC_ISO8859_2, PG_WIN1250,
- iso88592_2_win1250);
+ converted = mic2latin_with_table(src, dest, len, LC_ISO8859_2, PG_WIN1250,
+ iso88592_2_win1250, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -140,12 +151,15 @@ latin2_to_win1250(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN2, PG_WIN1250);
- local2local(src, dest, len, PG_LATIN2, PG_WIN1250, iso88592_2_win1250);
+ converted = local2local(src, dest, len, PG_LATIN2, PG_WIN1250,
+ iso88592_2_win1250, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -154,10 +168,13 @@ win1250_to_latin2(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_WIN1250, PG_LATIN2);
- local2local(src, dest, len, PG_WIN1250, PG_LATIN2, win1250_2_iso88592);
+ converted = local2local(src, dest, len, PG_WIN1250, PG_LATIN2,
+ win1250_2_iso88592, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
index a358a707c11..431971c40cb 100644
--- a/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
+++ b/src/backend/utils/mb/conversion_procs/latin_and_mic/latin_and_mic.c
@@ -30,8 +30,11 @@ PG_FUNCTION_INFO_V1(mic_to_latin4);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -42,12 +45,14 @@ latin1_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN1, PG_MULE_INTERNAL);
- latin2mic(src, dest, len, LC_ISO8859_1, PG_LATIN1);
+ converted = latin2mic(src, dest, len, LC_ISO8859_1, PG_LATIN1, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,12 +61,14 @@ mic_to_latin1(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_LATIN1);
- mic2latin(src, dest, len, LC_ISO8859_1, PG_LATIN1);
+ converted = mic2latin(src, dest, len, LC_ISO8859_1, PG_LATIN1, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -70,12 +77,14 @@ latin3_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN3, PG_MULE_INTERNAL);
- latin2mic(src, dest, len, LC_ISO8859_3, PG_LATIN3);
+ converted = latin2mic(src, dest, len, LC_ISO8859_3, PG_LATIN3, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -84,12 +93,14 @@ mic_to_latin3(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_LATIN3);
- mic2latin(src, dest, len, LC_ISO8859_3, PG_LATIN3);
+ converted = mic2latin(src, dest, len, LC_ISO8859_3, PG_LATIN3, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -98,12 +109,14 @@ latin4_to_mic(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN4, PG_MULE_INTERNAL);
- latin2mic(src, dest, len, LC_ISO8859_4, PG_LATIN4);
+ converted = latin2mic(src, dest, len, LC_ISO8859_4, PG_LATIN4, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -112,10 +125,12 @@ mic_to_latin4(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_MULE_INTERNAL, PG_LATIN4);
- mic2latin(src, dest, len, LC_ISO8859_4, PG_LATIN4);
+ converted = mic2latin(src, dest, len, LC_ISO8859_4, PG_LATIN4, noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
index 75ed49ac54e..e45c7718945 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_big5/utf8_and_big5.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_big5);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ big5_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_BIG5, PG_UTF8);
- LocalToUtf(src, len, dest,
- &big5_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_BIG5);
+ converted = LocalToUtf(src, len, dest,
+ &big5_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_BIG5,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_big5(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_BIG5);
- UtfToLocal(src, len, dest,
- &big5_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_BIG5);
+ converted = UtfToLocal(src, len, dest,
+ &big5_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_BIG5,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
index 90ad316111a..e8303f38b68 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/utf8_and_cyrillic.c
@@ -33,8 +33,11 @@ PG_FUNCTION_INFO_V1(koi8u_to_utf8);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -44,16 +47,19 @@ utf8_to_koi8r(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_KOI8R);
- UtfToLocal(src, len, dest,
- &koi8r_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_KOI8R);
+ converted = UtfToLocal(src, len, dest,
+ &koi8r_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_KOI8R,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -62,16 +68,19 @@ koi8r_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_KOI8R, PG_UTF8);
- LocalToUtf(src, len, dest,
- &koi8r_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_KOI8R);
+ converted = LocalToUtf(src, len, dest,
+ &koi8r_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_KOI8R,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -80,16 +89,19 @@ utf8_to_koi8u(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_KOI8U);
- UtfToLocal(src, len, dest,
- &koi8u_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_KOI8U);
+ converted = UtfToLocal(src, len, dest,
+ &koi8u_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_KOI8U,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -98,14 +110,17 @@ koi8u_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_KOI8U, PG_UTF8);
- LocalToUtf(src, len, dest,
- &koi8u_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_KOI8U);
+ converted = LocalToUtf(src, len, dest,
+ &koi8u_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_KOI8U,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
index 018312489cb..d2d9c44b3ff 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc2004/utf8_and_euc2004.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_euc_jis_2004);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ euc_jis_2004_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_JIS_2004, PG_UTF8);
- LocalToUtf(src, len, dest,
- &euc_jis_2004_to_unicode_tree,
- LUmapEUC_JIS_2004_combined, lengthof(LUmapEUC_JIS_2004_combined),
- NULL,
- PG_EUC_JIS_2004);
+ converted = LocalToUtf(src, len, dest,
+ &euc_jis_2004_to_unicode_tree,
+ LUmapEUC_JIS_2004_combined, lengthof(LUmapEUC_JIS_2004_combined),
+ NULL,
+ PG_EUC_JIS_2004,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_euc_jis_2004(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_EUC_JIS_2004);
- UtfToLocal(src, len, dest,
- &euc_jis_2004_from_unicode_tree,
- ULmapEUC_JIS_2004_combined, lengthof(ULmapEUC_JIS_2004_combined),
- NULL,
- PG_EUC_JIS_2004);
+ converted = UtfToLocal(src, len, dest,
+ &euc_jis_2004_from_unicode_tree,
+ ULmapEUC_JIS_2004_combined, lengthof(ULmapEUC_JIS_2004_combined),
+ NULL,
+ PG_EUC_JIS_2004,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
index 62182a9ba8b..9892db0d102 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/utf8_and_euc_cn.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_euc_cn);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ euc_cn_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_CN, PG_UTF8);
- LocalToUtf(src, len, dest,
- &euc_cn_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_CN);
+ converted = LocalToUtf(src, len, dest,
+ &euc_cn_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_CN,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_euc_cn(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_EUC_CN);
- UtfToLocal(src, len, dest,
- &euc_cn_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_CN);
+ converted = UtfToLocal(src, len, dest,
+ &euc_cn_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_CN,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
index dc5abb5dfd4..88ea32b74ba 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/utf8_and_euc_jp.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_euc_jp);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ euc_jp_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_JP, PG_UTF8);
- LocalToUtf(src, len, dest,
- &euc_jp_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_JP);
+ converted = LocalToUtf(src, len, dest,
+ &euc_jp_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_JP,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_euc_jp(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_EUC_JP);
- UtfToLocal(src, len, dest,
- &euc_jp_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_JP);
+ converted = UtfToLocal(src, len, dest,
+ &euc_jp_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_JP,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
index 088a38d8390..11dee117f47 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/utf8_and_euc_kr.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_euc_kr);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ euc_kr_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_KR, PG_UTF8);
- LocalToUtf(src, len, dest,
- &euc_kr_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_KR);
+ converted = LocalToUtf(src, len, dest,
+ &euc_kr_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_KR,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_euc_kr(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_EUC_KR);
- UtfToLocal(src, len, dest,
- &euc_kr_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_KR);
+ converted = UtfToLocal(src, len, dest,
+ &euc_kr_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_KR,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
index a9fe94f88b8..29c03512819 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/utf8_and_euc_tw.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_euc_tw);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ euc_tw_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_EUC_TW, PG_UTF8);
- LocalToUtf(src, len, dest,
- &euc_tw_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_TW);
+ converted = LocalToUtf(src, len, dest,
+ &euc_tw_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_TW,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_euc_tw(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_EUC_TW);
- UtfToLocal(src, len, dest,
- &euc_tw_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_EUC_TW);
+ converted = UtfToLocal(src, len, dest,
+ &euc_tw_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_EUC_TW,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
index 96909b58859..72677fa6d40 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gb18030/utf8_and_gb18030.c
@@ -183,8 +183,11 @@ conv_utf8_to_18030(uint32 code)
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -193,16 +196,19 @@ gb18030_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_GB18030, PG_UTF8);
- LocalToUtf(src, len, dest,
- &gb18030_to_unicode_tree,
- NULL, 0,
- conv_18030_to_utf8,
- PG_GB18030);
+ converted = LocalToUtf(src, len, dest,
+ &gb18030_to_unicode_tree,
+ NULL, 0,
+ conv_18030_to_utf8,
+ PG_GB18030,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -211,14 +217,17 @@ utf8_to_gb18030(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_GB18030);
- UtfToLocal(src, len, dest,
- &gb18030_from_unicode_tree,
- NULL, 0,
- conv_utf8_to_18030,
- PG_GB18030);
+ converted = UtfToLocal(src, len, dest,
+ &gb18030_from_unicode_tree,
+ NULL, 0,
+ conv_utf8_to_18030,
+ PG_GB18030,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
index 78bbcd3ce7d..057bc65e521 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_gbk/utf8_and_gbk.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_gbk);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ gbk_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_GBK, PG_UTF8);
- LocalToUtf(src, len, dest,
- &gbk_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_GBK);
+ converted = LocalToUtf(src, len, dest,
+ &gbk_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_GBK,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_gbk(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_GBK);
- UtfToLocal(src, len, dest,
- &gbk_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_GBK);
+ converted = UtfToLocal(src, len, dest,
+ &gbk_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_GBK,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
index 348524f4a2c..d16b6fe31d8 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
@@ -52,8 +52,11 @@ PG_FUNCTION_INFO_V1(utf8_to_iso8859);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -100,6 +103,7 @@ iso8859_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
int i;
CHECK_ENCODING_CONVERSION_ARGS(-1, PG_UTF8);
@@ -108,12 +112,15 @@ iso8859_to_utf8(PG_FUNCTION_ARGS)
{
if (encoding == maps[i].encoding)
{
- LocalToUtf(src, len, dest,
- maps[i].map1,
- NULL, 0,
- NULL,
- encoding);
- PG_RETURN_VOID();
+ int converted;
+
+ converted = LocalToUtf(src, len, dest,
+ maps[i].map1,
+ NULL, 0,
+ NULL,
+ encoding,
+ noError);
+ PG_RETURN_INT32(converted);
}
}
@@ -122,7 +129,7 @@ iso8859_to_utf8(PG_FUNCTION_ARGS)
errmsg("unexpected encoding ID %d for ISO 8859 character sets",
encoding)));
- PG_RETURN_VOID();
+ PG_RETURN_INT32(0);
}
Datum
@@ -132,6 +139,7 @@ utf8_to_iso8859(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
int i;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, -1);
@@ -140,12 +148,15 @@ utf8_to_iso8859(PG_FUNCTION_ARGS)
{
if (encoding == maps[i].encoding)
{
- UtfToLocal(src, len, dest,
- maps[i].map2,
- NULL, 0,
- NULL,
- encoding);
- PG_RETURN_VOID();
+ int converted;
+
+ converted = UtfToLocal(src, len, dest,
+ maps[i].map2,
+ NULL, 0,
+ NULL,
+ encoding,
+ noError);
+ PG_RETURN_INT32(converted);
}
}
@@ -154,5 +165,5 @@ utf8_to_iso8859(PG_FUNCTION_ARGS)
errmsg("unexpected encoding ID %d for ISO 8859 character sets",
encoding)));
- PG_RETURN_VOID();
+ PG_RETURN_INT32(0);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
index 2cdca9f780d..0bc08829657 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
@@ -26,8 +26,11 @@ PG_FUNCTION_INFO_V1(utf8_to_iso8859_1);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -37,6 +40,8 @@ iso8859_1_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ unsigned char *start = src;
unsigned short c;
CHECK_ENCODING_CONVERSION_ARGS(PG_LATIN1, PG_UTF8);
@@ -45,7 +50,11 @@ iso8859_1_to_utf8(PG_FUNCTION_ARGS)
{
c = *src;
if (c == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_LATIN1, (const char *) src, len);
+ }
if (!IS_HIGHBIT_SET(c))
*dest++ = c;
else
@@ -58,7 +67,7 @@ iso8859_1_to_utf8(PG_FUNCTION_ARGS)
}
*dest = '\0';
- PG_RETURN_VOID();
+ PG_RETURN_INT32(src - start);
}
Datum
@@ -67,6 +76,8 @@ utf8_to_iso8859_1(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ unsigned char *start = src;
unsigned short c,
c1;
@@ -76,7 +87,11 @@ utf8_to_iso8859_1(PG_FUNCTION_ARGS)
{
c = *src;
if (c == 0)
+ {
+ if (noError)
+ break;
report_invalid_encoding(PG_UTF8, (const char *) src, len);
+ }
/* fast path for ASCII-subset characters */
if (!IS_HIGHBIT_SET(c))
{
@@ -102,11 +117,15 @@ utf8_to_iso8859_1(PG_FUNCTION_ARGS)
len -= 2;
}
else
+ {
+ if (noError)
+ break;
report_untranslatable_char(PG_UTF8, PG_LATIN1,
(const char *) src, len);
+ }
}
}
*dest = '\0';
- PG_RETURN_VOID();
+ PG_RETURN_INT32(src - start);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
index e09a7c8e41e..a760ab54ab6 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_johab/utf8_and_johab.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_johab);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ johab_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_JOHAB, PG_UTF8);
- LocalToUtf(src, len, dest,
- &johab_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_JOHAB);
+ converted = LocalToUtf(src, len, dest,
+ &johab_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_JOHAB,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_johab(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_JOHAB);
- UtfToLocal(src, len, dest,
- &johab_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_JOHAB);
+ converted = UtfToLocal(src, len, dest,
+ &johab_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_JOHAB,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
index c56fa80a4bb..23892790730 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis/utf8_and_sjis.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_sjis);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ sjis_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_SJIS, PG_UTF8);
- LocalToUtf(src, len, dest,
- &sjis_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_SJIS);
+ converted = LocalToUtf(src, len, dest,
+ &sjis_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_SJIS,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_sjis(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_SJIS);
- UtfToLocal(src, len, dest,
- &sjis_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_SJIS);
+ converted = UtfToLocal(src, len, dest,
+ &sjis_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_SJIS,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
index 458500998d4..94930659347 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/utf8_and_sjis2004.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_shift_jis_2004);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ shift_jis_2004_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_SHIFT_JIS_2004, PG_UTF8);
- LocalToUtf(src, len, dest,
- &shift_jis_2004_to_unicode_tree,
- LUmapSHIFT_JIS_2004_combined, lengthof(LUmapSHIFT_JIS_2004_combined),
- NULL,
- PG_SHIFT_JIS_2004);
+ converted = LocalToUtf(src, len, dest,
+ &shift_jis_2004_to_unicode_tree,
+ LUmapSHIFT_JIS_2004_combined, lengthof(LUmapSHIFT_JIS_2004_combined),
+ NULL,
+ PG_SHIFT_JIS_2004,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_shift_jis_2004(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_SHIFT_JIS_2004);
- UtfToLocal(src, len, dest,
- &shift_jis_2004_from_unicode_tree,
- ULmapSHIFT_JIS_2004_combined, lengthof(ULmapSHIFT_JIS_2004_combined),
- NULL,
- PG_SHIFT_JIS_2004);
+ converted = UtfToLocal(src, len, dest,
+ &shift_jis_2004_from_unicode_tree,
+ ULmapSHIFT_JIS_2004_combined, lengthof(ULmapSHIFT_JIS_2004_combined),
+ NULL,
+ PG_SHIFT_JIS_2004,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
index 3226ed03258..dfdc0dbfa2f 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_uhc/utf8_and_uhc.c
@@ -28,8 +28,11 @@ PG_FUNCTION_INFO_V1(utf8_to_uhc);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
Datum
@@ -38,16 +41,19 @@ uhc_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UHC, PG_UTF8);
- LocalToUtf(src, len, dest,
- &uhc_to_unicode_tree,
- NULL, 0,
- NULL,
- PG_UHC);
+ converted = LocalToUtf(src, len, dest,
+ &uhc_to_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_UHC,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
Datum
@@ -56,14 +62,17 @@ utf8_to_uhc(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
+ int converted;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, PG_UHC);
- UtfToLocal(src, len, dest,
- &uhc_from_unicode_tree,
- NULL, 0,
- NULL,
- PG_UHC);
+ converted = UtfToLocal(src, len, dest,
+ &uhc_from_unicode_tree,
+ NULL, 0,
+ NULL,
+ PG_UHC,
+ noError);
- PG_RETURN_VOID();
+ PG_RETURN_INT32(converted);
}
diff --git a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
index 1a0074d063c..8f046280029 100644
--- a/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
+++ b/src/backend/utils/mb/conversion_procs/utf8_and_win/utf8_and_win.c
@@ -48,8 +48,11 @@ PG_FUNCTION_INFO_V1(utf8_to_win);
* INTEGER, -- destination encoding id
* CSTRING, -- source string (null terminated C string)
* CSTRING, -- destination string (null terminated C string)
- * INTEGER -- source string length
- * ) returns VOID;
+ * INTEGER, -- source string length
+ * BOOL -- if true, don't throw an error if conversion fails
+ * ) returns INTEGER;
+ *
+ * Returns the number of bytes successfully converted.
* ----------
*/
@@ -81,6 +84,7 @@ win_to_utf8(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
int i;
CHECK_ENCODING_CONVERSION_ARGS(-1, PG_UTF8);
@@ -89,12 +93,15 @@ win_to_utf8(PG_FUNCTION_ARGS)
{
if (encoding == maps[i].encoding)
{
- LocalToUtf(src, len, dest,
- maps[i].map1,
- NULL, 0,
- NULL,
- encoding);
- PG_RETURN_VOID();
+ int converted;
+
+ converted = LocalToUtf(src, len, dest,
+ maps[i].map1,
+ NULL, 0,
+ NULL,
+ encoding,
+ noError);
+ PG_RETURN_INT32(converted);
}
}
@@ -103,7 +110,7 @@ win_to_utf8(PG_FUNCTION_ARGS)
errmsg("unexpected encoding ID %d for WIN character sets",
encoding)));
- PG_RETURN_VOID();
+ PG_RETURN_INT32(0);
}
Datum
@@ -113,6 +120,7 @@ utf8_to_win(PG_FUNCTION_ARGS)
unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
int len = PG_GETARG_INT32(4);
+ bool noError = PG_GETARG_BOOL(5);
int i;
CHECK_ENCODING_CONVERSION_ARGS(PG_UTF8, -1);
@@ -121,12 +129,15 @@ utf8_to_win(PG_FUNCTION_ARGS)
{
if (encoding == maps[i].encoding)
{
- UtfToLocal(src, len, dest,
- maps[i].map2,
- NULL, 0,
- NULL,
- encoding);
- PG_RETURN_VOID();
+ int converted;
+
+ converted = UtfToLocal(src, len, dest,
+ maps[i].map2,
+ NULL, 0,
+ NULL,
+ encoding,
+ noError);
+ PG_RETURN_INT32(converted);
}
}
@@ -135,5 +146,5 @@ utf8_to_win(PG_FUNCTION_ARGS)
errmsg("unexpected encoding ID %d for WIN character sets",
encoding)));
- PG_RETURN_VOID();
+ PG_RETURN_INT32(0);
}
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67d1c4fc19f..a585e3a6f1e 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -406,12 +406,13 @@ pg_do_encoding_conversion(unsigned char *src, int len,
MemoryContextAllocHuge(CurrentMemoryContext,
(Size) len * MAX_CONVERSION_GROWTH + 1);
- OidFunctionCall5(proc,
- Int32GetDatum(src_encoding),
- Int32GetDatum(dest_encoding),
- CStringGetDatum(src),
- CStringGetDatum(result),
- Int32GetDatum(len));
+ (void) OidFunctionCall6(proc,
+ Int32GetDatum(src_encoding),
+ Int32GetDatum(dest_encoding),
+ CStringGetDatum(src),
+ CStringGetDatum(result),
+ Int32GetDatum(len),
+ BoolGetDatum(false));
/*
* If the result is large, it's worth repalloc'ing to release any extra
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e6c7b070f64..1dd5558b078 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10409,388 +10409,388 @@
# conversion functions
{ oid => '4302',
descr => 'internal conversion function for KOI8R to MULE_INTERNAL',
- proname => 'koi8r_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'koi8r_to_mic',
+ proname => 'koi8r_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'koi8r_to_mic',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4303',
descr => 'internal conversion function for MULE_INTERNAL to KOI8R',
- proname => 'mic_to_koi8r', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_koi8r',
+ proname => 'mic_to_koi8r', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_koi8r',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4304',
descr => 'internal conversion function for ISO-8859-5 to MULE_INTERNAL',
- proname => 'iso_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'iso_to_mic',
+ proname => 'iso_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'iso_to_mic',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4305',
descr => 'internal conversion function for MULE_INTERNAL to ISO-8859-5',
- proname => 'mic_to_iso', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_iso',
+ proname => 'mic_to_iso', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_iso',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4306',
descr => 'internal conversion function for WIN1251 to MULE_INTERNAL',
- proname => 'win1251_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win1251_to_mic',
+ proname => 'win1251_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win1251_to_mic',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4307',
descr => 'internal conversion function for MULE_INTERNAL to WIN1251',
- proname => 'mic_to_win1251', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_win1251',
+ proname => 'mic_to_win1251', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_win1251',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4308',
descr => 'internal conversion function for WIN866 to MULE_INTERNAL',
- proname => 'win866_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win866_to_mic',
+ proname => 'win866_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win866_to_mic',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4309',
descr => 'internal conversion function for MULE_INTERNAL to WIN866',
- proname => 'mic_to_win866', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_win866',
+ proname => 'mic_to_win866', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_win866',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4310', descr => 'internal conversion function for KOI8R to WIN1251',
- proname => 'koi8r_to_win1251', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'koi8r_to_win1251', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'koi8r_to_win1251', probin => '$libdir/cyrillic_and_mic' },
{ oid => '4311', descr => 'internal conversion function for WIN1251 to KOI8R',
- proname => 'win1251_to_koi8r', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'win1251_to_koi8r', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'win1251_to_koi8r', probin => '$libdir/cyrillic_and_mic' },
{ oid => '4312', descr => 'internal conversion function for KOI8R to WIN866',
- proname => 'koi8r_to_win866', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'koi8r_to_win866',
+ proname => 'koi8r_to_win866', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'koi8r_to_win866',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4313', descr => 'internal conversion function for WIN866 to KOI8R',
- proname => 'win866_to_koi8r', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win866_to_koi8r',
+ proname => 'win866_to_koi8r', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win866_to_koi8r',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4314',
descr => 'internal conversion function for WIN866 to WIN1251',
- proname => 'win866_to_win1251', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'win866_to_win1251', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'win866_to_win1251', probin => '$libdir/cyrillic_and_mic' },
{ oid => '4315',
descr => 'internal conversion function for WIN1251 to WIN866',
- proname => 'win1251_to_win866', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'win1251_to_win866', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'win1251_to_win866', probin => '$libdir/cyrillic_and_mic' },
{ oid => '4316',
descr => 'internal conversion function for ISO-8859-5 to KOI8R',
- proname => 'iso_to_koi8r', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'iso_to_koi8r',
+ proname => 'iso_to_koi8r', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'iso_to_koi8r',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4317',
descr => 'internal conversion function for KOI8R to ISO-8859-5',
- proname => 'koi8r_to_iso', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'koi8r_to_iso',
+ proname => 'koi8r_to_iso', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'koi8r_to_iso',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4318',
descr => 'internal conversion function for ISO-8859-5 to WIN1251',
- proname => 'iso_to_win1251', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'iso_to_win1251',
+ proname => 'iso_to_win1251', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'iso_to_win1251',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4319',
descr => 'internal conversion function for WIN1251 to ISO-8859-5',
- proname => 'win1251_to_iso', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win1251_to_iso',
+ proname => 'win1251_to_iso', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win1251_to_iso',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4320',
descr => 'internal conversion function for ISO-8859-5 to WIN866',
- proname => 'iso_to_win866', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'iso_to_win866',
+ proname => 'iso_to_win866', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'iso_to_win866',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4321',
descr => 'internal conversion function for WIN866 to ISO-8859-5',
- proname => 'win866_to_iso', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win866_to_iso',
+ proname => 'win866_to_iso', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win866_to_iso',
probin => '$libdir/cyrillic_and_mic' },
{ oid => '4322',
descr => 'internal conversion function for EUC_CN to MULE_INTERNAL',
- proname => 'euc_cn_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_cn_to_mic',
+ proname => 'euc_cn_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_cn_to_mic',
probin => '$libdir/euc_cn_and_mic' },
{ oid => '4323',
descr => 'internal conversion function for MULE_INTERNAL to EUC_CN',
- proname => 'mic_to_euc_cn', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_euc_cn',
+ proname => 'mic_to_euc_cn', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_euc_cn',
probin => '$libdir/euc_cn_and_mic' },
{ oid => '4324', descr => 'internal conversion function for EUC_JP to SJIS',
- proname => 'euc_jp_to_sjis', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_jp_to_sjis',
+ proname => 'euc_jp_to_sjis', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_jp_to_sjis',
probin => '$libdir/euc_jp_and_sjis' },
{ oid => '4325', descr => 'internal conversion function for SJIS to EUC_JP',
- proname => 'sjis_to_euc_jp', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'sjis_to_euc_jp',
+ proname => 'sjis_to_euc_jp', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'sjis_to_euc_jp',
probin => '$libdir/euc_jp_and_sjis' },
{ oid => '4326',
descr => 'internal conversion function for EUC_JP to MULE_INTERNAL',
- proname => 'euc_jp_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_jp_to_mic',
+ proname => 'euc_jp_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_jp_to_mic',
probin => '$libdir/euc_jp_and_sjis' },
{ oid => '4327',
descr => 'internal conversion function for SJIS to MULE_INTERNAL',
- proname => 'sjis_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'sjis_to_mic',
+ proname => 'sjis_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'sjis_to_mic',
probin => '$libdir/euc_jp_and_sjis' },
{ oid => '4328',
descr => 'internal conversion function for MULE_INTERNAL to EUC_JP',
- proname => 'mic_to_euc_jp', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_euc_jp',
+ proname => 'mic_to_euc_jp', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_euc_jp',
probin => '$libdir/euc_jp_and_sjis' },
{ oid => '4329',
descr => 'internal conversion function for MULE_INTERNAL to SJIS',
- proname => 'mic_to_sjis', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_sjis',
+ proname => 'mic_to_sjis', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_sjis',
probin => '$libdir/euc_jp_and_sjis' },
{ oid => '4330',
descr => 'internal conversion function for EUC_KR to MULE_INTERNAL',
- proname => 'euc_kr_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_kr_to_mic',
+ proname => 'euc_kr_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_kr_to_mic',
probin => '$libdir/euc_kr_and_mic' },
{ oid => '4331',
descr => 'internal conversion function for MULE_INTERNAL to EUC_KR',
- proname => 'mic_to_euc_kr', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_euc_kr',
+ proname => 'mic_to_euc_kr', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_euc_kr',
probin => '$libdir/euc_kr_and_mic' },
{ oid => '4332', descr => 'internal conversion function for EUC_TW to BIG5',
- proname => 'euc_tw_to_big5', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_tw_to_big5',
+ proname => 'euc_tw_to_big5', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_tw_to_big5',
probin => '$libdir/euc_tw_and_big5' },
{ oid => '4333', descr => 'internal conversion function for BIG5 to EUC_TW',
- proname => 'big5_to_euc_tw', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'big5_to_euc_tw',
+ proname => 'big5_to_euc_tw', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'big5_to_euc_tw',
probin => '$libdir/euc_tw_and_big5' },
{ oid => '4334',
descr => 'internal conversion function for EUC_TW to MULE_INTERNAL',
- proname => 'euc_tw_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_tw_to_mic',
+ proname => 'euc_tw_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_tw_to_mic',
probin => '$libdir/euc_tw_and_big5' },
{ oid => '4335',
descr => 'internal conversion function for BIG5 to MULE_INTERNAL',
- proname => 'big5_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'big5_to_mic',
+ proname => 'big5_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'big5_to_mic',
probin => '$libdir/euc_tw_and_big5' },
{ oid => '4336',
descr => 'internal conversion function for MULE_INTERNAL to EUC_TW',
- proname => 'mic_to_euc_tw', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_euc_tw',
+ proname => 'mic_to_euc_tw', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_euc_tw',
probin => '$libdir/euc_tw_and_big5' },
{ oid => '4337',
descr => 'internal conversion function for MULE_INTERNAL to BIG5',
- proname => 'mic_to_big5', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_big5',
+ proname => 'mic_to_big5', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_big5',
probin => '$libdir/euc_tw_and_big5' },
{ oid => '4338',
descr => 'internal conversion function for LATIN2 to MULE_INTERNAL',
- proname => 'latin2_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'latin2_to_mic',
+ proname => 'latin2_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'latin2_to_mic',
probin => '$libdir/latin2_and_win1250' },
{ oid => '4339',
descr => 'internal conversion function for MULE_INTERNAL to LATIN2',
- proname => 'mic_to_latin2', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_latin2',
+ proname => 'mic_to_latin2', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_latin2',
probin => '$libdir/latin2_and_win1250' },
{ oid => '4340',
descr => 'internal conversion function for WIN1250 to MULE_INTERNAL',
- proname => 'win1250_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win1250_to_mic',
+ proname => 'win1250_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win1250_to_mic',
probin => '$libdir/latin2_and_win1250' },
{ oid => '4341',
descr => 'internal conversion function for MULE_INTERNAL to WIN1250',
- proname => 'mic_to_win1250', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_win1250',
+ proname => 'mic_to_win1250', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_win1250',
probin => '$libdir/latin2_and_win1250' },
{ oid => '4342',
descr => 'internal conversion function for LATIN2 to WIN1250',
- proname => 'latin2_to_win1250', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'latin2_to_win1250', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'latin2_to_win1250', probin => '$libdir/latin2_and_win1250' },
{ oid => '4343',
descr => 'internal conversion function for WIN1250 to LATIN2',
- proname => 'win1250_to_latin2', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'win1250_to_latin2', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'win1250_to_latin2', probin => '$libdir/latin2_and_win1250' },
{ oid => '4344',
descr => 'internal conversion function for LATIN1 to MULE_INTERNAL',
- proname => 'latin1_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'latin1_to_mic',
+ proname => 'latin1_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'latin1_to_mic',
probin => '$libdir/latin_and_mic' },
{ oid => '4345',
descr => 'internal conversion function for MULE_INTERNAL to LATIN1',
- proname => 'mic_to_latin1', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_latin1',
+ proname => 'mic_to_latin1', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_latin1',
probin => '$libdir/latin_and_mic' },
{ oid => '4346',
descr => 'internal conversion function for LATIN3 to MULE_INTERNAL',
- proname => 'latin3_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'latin3_to_mic',
+ proname => 'latin3_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'latin3_to_mic',
probin => '$libdir/latin_and_mic' },
{ oid => '4347',
descr => 'internal conversion function for MULE_INTERNAL to LATIN3',
- proname => 'mic_to_latin3', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_latin3',
+ proname => 'mic_to_latin3', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_latin3',
probin => '$libdir/latin_and_mic' },
{ oid => '4348',
descr => 'internal conversion function for LATIN4 to MULE_INTERNAL',
- proname => 'latin4_to_mic', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'latin4_to_mic',
+ proname => 'latin4_to_mic', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'latin4_to_mic',
probin => '$libdir/latin_and_mic' },
{ oid => '4349',
descr => 'internal conversion function for MULE_INTERNAL to LATIN4',
- proname => 'mic_to_latin4', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'mic_to_latin4',
+ proname => 'mic_to_latin4', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'mic_to_latin4',
probin => '$libdir/latin_and_mic' },
{ oid => '4352', descr => 'internal conversion function for BIG5 to UTF8',
- proname => 'big5_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'big5_to_utf8',
+ proname => 'big5_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'big5_to_utf8',
probin => '$libdir/utf8_and_big5' },
{ oid => '4353', descr => 'internal conversion function for UTF8 to BIG5',
- proname => 'utf8_to_big5', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_big5',
+ proname => 'utf8_to_big5', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_big5',
probin => '$libdir/utf8_and_big5' },
{ oid => '4354', descr => 'internal conversion function for UTF8 to KOI8R',
- proname => 'utf8_to_koi8r', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_koi8r',
+ proname => 'utf8_to_koi8r', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_koi8r',
probin => '$libdir/utf8_and_cyrillic' },
{ oid => '4355', descr => 'internal conversion function for KOI8R to UTF8',
- proname => 'koi8r_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'koi8r_to_utf8',
+ proname => 'koi8r_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'koi8r_to_utf8',
probin => '$libdir/utf8_and_cyrillic' },
{ oid => '4356', descr => 'internal conversion function for UTF8 to KOI8U',
- proname => 'utf8_to_koi8u', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_koi8u',
+ proname => 'utf8_to_koi8u', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_koi8u',
probin => '$libdir/utf8_and_cyrillic' },
{ oid => '4357', descr => 'internal conversion function for KOI8U to UTF8',
- proname => 'koi8u_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'koi8u_to_utf8',
+ proname => 'koi8u_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'koi8u_to_utf8',
probin => '$libdir/utf8_and_cyrillic' },
{ oid => '4358', descr => 'internal conversion function for UTF8 to WIN',
- proname => 'utf8_to_win', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_win',
+ proname => 'utf8_to_win', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_win',
probin => '$libdir/utf8_and_win' },
{ oid => '4359', descr => 'internal conversion function for WIN to UTF8',
- proname => 'win_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'win_to_utf8',
+ proname => 'win_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'win_to_utf8',
probin => '$libdir/utf8_and_win' },
{ oid => '4360', descr => 'internal conversion function for EUC_CN to UTF8',
- proname => 'euc_cn_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_cn_to_utf8',
+ proname => 'euc_cn_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_cn_to_utf8',
probin => '$libdir/utf8_and_euc_cn' },
{ oid => '4361', descr => 'internal conversion function for UTF8 to EUC_CN',
- proname => 'utf8_to_euc_cn', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_euc_cn',
+ proname => 'utf8_to_euc_cn', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_euc_cn',
probin => '$libdir/utf8_and_euc_cn' },
{ oid => '4362', descr => 'internal conversion function for EUC_JP to UTF8',
- proname => 'euc_jp_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_jp_to_utf8',
+ proname => 'euc_jp_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_jp_to_utf8',
probin => '$libdir/utf8_and_euc_jp' },
{ oid => '4363', descr => 'internal conversion function for UTF8 to EUC_JP',
- proname => 'utf8_to_euc_jp', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_euc_jp',
+ proname => 'utf8_to_euc_jp', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_euc_jp',
probin => '$libdir/utf8_and_euc_jp' },
{ oid => '4364', descr => 'internal conversion function for EUC_KR to UTF8',
- proname => 'euc_kr_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_kr_to_utf8',
+ proname => 'euc_kr_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_kr_to_utf8',
probin => '$libdir/utf8_and_euc_kr' },
{ oid => '4365', descr => 'internal conversion function for UTF8 to EUC_KR',
- proname => 'utf8_to_euc_kr', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_euc_kr',
+ proname => 'utf8_to_euc_kr', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_euc_kr',
probin => '$libdir/utf8_and_euc_kr' },
{ oid => '4366', descr => 'internal conversion function for EUC_TW to UTF8',
- proname => 'euc_tw_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'euc_tw_to_utf8',
+ proname => 'euc_tw_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'euc_tw_to_utf8',
probin => '$libdir/utf8_and_euc_tw' },
{ oid => '4367', descr => 'internal conversion function for UTF8 to EUC_TW',
- proname => 'utf8_to_euc_tw', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_euc_tw',
+ proname => 'utf8_to_euc_tw', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_euc_tw',
probin => '$libdir/utf8_and_euc_tw' },
{ oid => '4368', descr => 'internal conversion function for GB18030 to UTF8',
- proname => 'gb18030_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'gb18030_to_utf8',
+ proname => 'gb18030_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'gb18030_to_utf8',
probin => '$libdir/utf8_and_gb18030' },
{ oid => '4369', descr => 'internal conversion function for UTF8 to GB18030',
- proname => 'utf8_to_gb18030', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_gb18030',
+ proname => 'utf8_to_gb18030', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_gb18030',
probin => '$libdir/utf8_and_gb18030' },
{ oid => '4370', descr => 'internal conversion function for GBK to UTF8',
- proname => 'gbk_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'gbk_to_utf8',
+ proname => 'gbk_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'gbk_to_utf8',
probin => '$libdir/utf8_and_gbk' },
{ oid => '4371', descr => 'internal conversion function for UTF8 to GBK',
- proname => 'utf8_to_gbk', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_gbk',
+ proname => 'utf8_to_gbk', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_gbk',
probin => '$libdir/utf8_and_gbk' },
{ oid => '4372',
descr => 'internal conversion function for UTF8 to ISO-8859 2-16',
- proname => 'utf8_to_iso8859', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_iso8859',
+ proname => 'utf8_to_iso8859', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_iso8859',
probin => '$libdir/utf8_and_iso8859' },
{ oid => '4373',
descr => 'internal conversion function for ISO-8859 2-16 to UTF8',
- proname => 'iso8859_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'iso8859_to_utf8',
+ proname => 'iso8859_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'iso8859_to_utf8',
probin => '$libdir/utf8_and_iso8859' },
{ oid => '4374', descr => 'internal conversion function for LATIN1 to UTF8',
- proname => 'iso8859_1_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'iso8859_1_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'iso8859_1_to_utf8', probin => '$libdir/utf8_and_iso8859_1' },
{ oid => '4375', descr => 'internal conversion function for UTF8 to LATIN1',
- proname => 'utf8_to_iso8859_1', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'utf8_to_iso8859_1', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'utf8_to_iso8859_1', probin => '$libdir/utf8_and_iso8859_1' },
{ oid => '4376', descr => 'internal conversion function for JOHAB to UTF8',
- proname => 'johab_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'johab_to_utf8',
+ proname => 'johab_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'johab_to_utf8',
probin => '$libdir/utf8_and_johab' },
{ oid => '4377', descr => 'internal conversion function for UTF8 to JOHAB',
- proname => 'utf8_to_johab', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_johab',
+ proname => 'utf8_to_johab', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_johab',
probin => '$libdir/utf8_and_johab' },
{ oid => '4378', descr => 'internal conversion function for SJIS to UTF8',
- proname => 'sjis_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'sjis_to_utf8',
+ proname => 'sjis_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'sjis_to_utf8',
probin => '$libdir/utf8_and_sjis' },
{ oid => '4379', descr => 'internal conversion function for UTF8 to SJIS',
- proname => 'utf8_to_sjis', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_sjis',
+ proname => 'utf8_to_sjis', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_sjis',
probin => '$libdir/utf8_and_sjis' },
{ oid => '4380', descr => 'internal conversion function for UHC to UTF8',
- proname => 'uhc_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'uhc_to_utf8',
+ proname => 'uhc_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'uhc_to_utf8',
probin => '$libdir/utf8_and_uhc' },
{ oid => '4381', descr => 'internal conversion function for UTF8 to UHC',
- proname => 'utf8_to_uhc', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4', prosrc => 'utf8_to_uhc',
+ proname => 'utf8_to_uhc', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool', prosrc => 'utf8_to_uhc',
probin => '$libdir/utf8_and_uhc' },
{ oid => '4382',
descr => 'internal conversion function for EUC_JIS_2004 to UTF8',
- proname => 'euc_jis_2004_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'euc_jis_2004_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'euc_jis_2004_to_utf8', probin => '$libdir/utf8_and_euc2004' },
{ oid => '4383',
descr => 'internal conversion function for UTF8 to EUC_JIS_2004',
- proname => 'utf8_to_euc_jis_2004', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'utf8_to_euc_jis_2004', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'utf8_to_euc_jis_2004', probin => '$libdir/utf8_and_euc2004' },
{ oid => '4384',
descr => 'internal conversion function for SHIFT_JIS_2004 to UTF8',
- proname => 'shift_jis_2004_to_utf8', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'shift_jis_2004_to_utf8', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'shift_jis_2004_to_utf8', probin => '$libdir/utf8_and_sjis2004' },
{ oid => '4385',
descr => 'internal conversion function for UTF8 to SHIFT_JIS_2004',
- proname => 'utf8_to_shift_jis_2004', prolang => 'c', prorettype => 'void',
- proargtypes => 'int4 int4 cstring internal int4',
+ proname => 'utf8_to_shift_jis_2004', prolang => 'c', prorettype => 'int4',
+ proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'utf8_to_shift_jis_2004', probin => '$libdir/utf8_and_sjis2004' },
{ oid => '4386',
descr => 'internal conversion function for EUC_JIS_2004 to SHIFT_JIS_2004',
proname => 'euc_jis_2004_to_shift_jis_2004', prolang => 'c',
- prorettype => 'void', proargtypes => 'int4 int4 cstring internal int4',
+ prorettype => 'int4', proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'euc_jis_2004_to_shift_jis_2004',
probin => '$libdir/euc2004_sjis2004' },
{ oid => '4387',
descr => 'internal conversion function for SHIFT_JIS_2004 to EUC_JIS_2004',
proname => 'shift_jis_2004_to_euc_jis_2004', prolang => 'c',
- prorettype => 'void', proargtypes => 'int4 int4 cstring internal int4',
+ prorettype => 'int4', proargtypes => 'int4 int4 cstring internal int4 bool',
prosrc => 'shift_jis_2004_to_euc_jis_2004',
probin => '$libdir/euc2004_sjis2004' },
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 549f2dd045d..4529a15a6ba 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -627,18 +627,18 @@ extern void pg_unicode_to_server(pg_wchar c, unsigned char *s);
extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
-extern void UtfToLocal(const unsigned char *utf, int len,
- unsigned char *iso,
- const pg_mb_radix_tree *map,
- const pg_utf_to_local_combined *cmap, int cmapsize,
- utf_local_conversion_func conv_func,
- int encoding);
-extern void LocalToUtf(const unsigned char *iso, int len,
- unsigned char *utf,
- const pg_mb_radix_tree *map,
- const pg_local_to_utf_combined *cmap, int cmapsize,
- utf_local_conversion_func conv_func,
- int encoding);
+extern int UtfToLocal(const unsigned char *utf, int len,
+ unsigned char *iso,
+ const pg_mb_radix_tree *map,
+ const pg_utf_to_local_combined *cmap, int cmapsize,
+ utf_local_conversion_func conv_func,
+ int encoding, bool noError);
+extern int LocalToUtf(const unsigned char *iso, int len,
+ unsigned char *utf,
+ const pg_mb_radix_tree *map,
+ const pg_local_to_utf_combined *cmap, int cmapsize,
+ utf_local_conversion_func conv_func,
+ int encoding, bool noError);
extern bool pg_verifymbstr(const char *mbstr, int len, bool noError);
extern bool pg_verify_mbstr(int encoding, const char *mbstr, int len,
@@ -656,18 +656,19 @@ extern void report_invalid_encoding(int encoding, const char *mbstr, int len) pg
extern void report_untranslatable_char(int src_encoding, int dest_encoding,
const char *mbstr, int len) pg_attribute_noreturn();
-extern void local2local(const unsigned char *l, unsigned char *p, int len,
- int src_encoding, int dest_encoding, const unsigned char *tab);
-extern void latin2mic(const unsigned char *l, unsigned char *p, int len,
- int lc, int encoding);
-extern void mic2latin(const unsigned char *mic, unsigned char *p, int len,
- int lc, int encoding);
-extern void latin2mic_with_table(const unsigned char *l, unsigned char *p,
- int len, int lc, int encoding,
- const unsigned char *tab);
-extern void mic2latin_with_table(const unsigned char *mic, unsigned char *p,
- int len, int lc, int encoding,
- const unsigned char *tab);
+extern int local2local(const unsigned char *l, unsigned char *p, int len,
+ int src_encoding, int dest_encoding, const unsigned char *tab,
+ bool noError);
+extern int latin2mic(const unsigned char *l, unsigned char *p, int len,
+ int lc, int encoding, bool noError);
+extern int mic2latin(const unsigned char *mic, unsigned char *p, int len,
+ int lc, int encoding, bool noError);
+extern int latin2mic_with_table(const unsigned char *l, unsigned char *p,
+ int len, int lc, int encoding,
+ const unsigned char *tab, bool noError);
+extern int mic2latin_with_table(const unsigned char *mic, unsigned char *p,
+ int len, int lc, int encoding,
+ const unsigned char *tab, bool noError);
#ifdef WIN32
extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 507b474b1bb..e4aab19ddaa 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1035,13 +1035,14 @@ WHERE p1.conproc = 0 OR
SELECT p.oid, p.proname, c.oid, c.conname
FROM pg_proc p, pg_conversion c
WHERE p.oid = c.conproc AND
- (p.prorettype != 'void'::regtype OR p.proretset OR
- p.pronargs != 5 OR
+ (p.prorettype != 'int4'::regtype OR p.proretset OR
+ p.pronargs != 6 OR
p.proargtypes[0] != 'int4'::regtype OR
p.proargtypes[1] != 'int4'::regtype OR
p.proargtypes[2] != 'cstring'::regtype OR
p.proargtypes[3] != 'internal'::regtype OR
- p.proargtypes[4] != 'int4'::regtype);
+ p.proargtypes[4] != 'int4'::regtype OR
+ p.proargtypes[5] != 'bool'::regtype);
oid | proname | oid | conname
-----+---------+-----+---------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 4189a5a4e09..90a735ea5c8 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -542,13 +542,14 @@ WHERE p1.conproc = 0 OR
SELECT p.oid, p.proname, c.oid, c.conname
FROM pg_proc p, pg_conversion c
WHERE p.oid = c.conproc AND
- (p.prorettype != 'void'::regtype OR p.proretset OR
- p.pronargs != 5 OR
+ (p.prorettype != 'int4'::regtype OR p.proretset OR
+ p.pronargs != 6 OR
p.proargtypes[0] != 'int4'::regtype OR
p.proargtypes[1] != 'int4'::regtype OR
p.proargtypes[2] != 'cstring'::regtype OR
p.proargtypes[3] != 'internal'::regtype OR
- p.proargtypes[4] != 'int4'::regtype);
+ p.proargtypes[4] != 'int4'::regtype OR
+ p.proargtypes[5] != 'bool'::regtype);
-- Check for conprocs that don't perform the specific conversion that
-- pg_conversion alleges they do, by trying to invoke each conversion
--
2.20.1
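For readers following the signature change in the patch above (conversion procs now return int4 and take a trailing bool), here is a minimal, self-contained sketch of the calling pattern it enables. It is not PostgreSQL code: toy_utf8_to_latin1(), convert_in_chunks() and the TOY_* constants are hypothetical names made up for illustration, and exit() stands in for ereport(ERROR). The sketch assumes a converter that returns the number of input bytes it consumed and, with noError set, stops quietly at the first incomplete or invalid sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_CHAR_LEN 2		/* longest input character this toy accepts */
#define TOY_CHUNK_MAX 16		/* largest chunk size the driver supports */

/*
 * Toy UTF-8 -> LATIN1 converter (hypothetical, not utf8_to_iso8859_1).
 * Converts up to 'len' bytes of 'src' into 'dst', NUL-terminates 'dst', and
 * returns the number of input bytes consumed.  It stops at a zero byte or at
 * an incomplete, invalid or untranslatable sequence.  With noError = false
 * it exits instead, standing in for ereport(ERROR).
 */
static int
toy_utf8_to_latin1(const unsigned char *src, int len,
				   unsigned char *dst, bool noError)
{
	const unsigned char *start = src;

	while (len > 0)
	{
		if (src[0] == 0)
			break;				/* embedded zero bytes are rejected */
		if (src[0] < 0x80)
		{
			*dst++ = src[0];	/* plain ASCII */
			src++;
			len--;
		}
		else if (len >= 2 && (src[0] == 0xC2 || src[0] == 0xC3) &&
				 (src[1] & 0xC0) == 0x80)
		{
			/* U+0080..U+00FF: two input bytes, one output byte */
			*dst++ = (unsigned char) (((src[0] & 0x03) << 6) | (src[1] & 0x3F));
			src += 2;
			len -= 2;
		}
		else
			break;				/* incomplete or unconvertible */
	}
	*dst = '\0';

	if (len > 0 && !noError)
	{
		fprintf(stderr, "invalid or incomplete input\n");
		exit(1);
	}
	return (int) (src - start);
}

/*
 * Feed 'input' through the converter in chunks that ignore character
 * boundaries, carrying unconverted trailing bytes over to the next round.
 * Returns the number of output bytes, or -1 if some input can never be
 * converted.  'out' must have room for inputlen + 1 bytes.
 */
static int
convert_in_chunks(const unsigned char *input, int inputlen, int chunksize,
				  unsigned char *out)
{
	unsigned char buf[TOY_CHUNK_MAX + TOY_MAX_CHAR_LEN];
	int			buflen = 0;
	int			inpos = 0;
	int			outlen = 0;

	assert(chunksize > 0 && chunksize <= TOY_CHUNK_MAX);

	out[0] = '\0';
	while (inpos < inputlen)
	{
		int			n = inputlen - inpos;
		int			consumed;

		if (n > chunksize)
			n = chunksize;
		memcpy(buf + buflen, input + inpos, n);
		buflen += n;
		inpos += n;

		consumed = toy_utf8_to_latin1(buf, buflen, out + outlen, true);
		outlen += (int) strlen((char *) out + outlen);

		/* keep the unconverted tail for the next round */
		buflen -= consumed;
		memmove(buf, buf + consumed, buflen);

		/* a full character's worth of leftover, or leftover at EOF, is an error */
		if (buflen >= TOY_MAX_CHAR_LEN || (buflen > 0 && inpos == inputlen))
			return -1;
	}
	return outlen;
}
```

The caller, not the converter, decides when leftover bytes are an error: a short tail is expected in the middle of the input, but not at the end of it. That is the property that lets a caller load raw input without caring about character boundaries.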
0005-Do-COPY-FROM-encoding-conversion-verification-in-lar.patch (text/x-patch; charset=UTF-8)
From 2b658ed275f269d502baa7b6159b6d2020486169 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 16 Dec 2020 10:41:49 +0200
Subject: [PATCH 5/5] Do COPY FROM encoding conversion/verification in larger
chunks.
NOTE: This changes behavior in one corner-case: if client and server
encodings are the same single-byte encoding (e.g. latin1), previously the
input would not be checked for zero bytes ('\0'). Any fields containing
zero bytes would be truncated at the zero. But if encoding conversion was
needed, the conversion routine would throw an error on the zero. After
this commit, the input is always checked for zeros.
---
src/backend/commands/copyfrom.c | 41 +++--
src/backend/commands/copyfromparse.c | 213 +++++++++++++++++------
src/backend/utils/mb/mbutils.c | 55 ++++++
src/include/commands/copyfrom_internal.h | 27 ++-
src/include/mb/pg_wchar.h | 6 +
5 files changed, 262 insertions(+), 80 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1b14e9a6eb0..6c72167342b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -23,6 +23,7 @@
#include "access/tableam.h"
#include "access/xact.h"
#include "access/xlog.h"
+#include "catalog/namespace.h"
#include "commands/copy.h"
#include "commands/copyfrom_internal.h"
#include "commands/trigger.h"
@@ -147,15 +148,9 @@ CopyFromErrorCallback(void *arg)
/*
* Error is relevant to a particular line.
*
- * If line_buf still contains the correct line, and it's already
- * transcoded, print it. If it's still in a foreign encoding, it's
- * quite likely that the error is precisely a failure to do
- * encoding conversion (ie, bad data). We dare not try to convert
- * it, and at present there's no way to regurgitate it without
- * conversion. So we have to punt and just report the line number.
+ * If line_buf still contains the correct line, print it.
*/
- if (cstate->line_buf_valid &&
- (cstate->line_buf_converted || !cstate->need_transcoding))
+ if (cstate->line_buf_valid)
{
char *lineval;
@@ -1301,15 +1296,22 @@ BeginCopyFrom(ParseState *pstate,
cstate->file_encoding = cstate->opts.file_encoding;
/*
- * Set up encoding conversion info. Even if the file and server encodings
- * are the same, we must apply pg_any_to_server() to validate data in
- * multibyte encodings.
+ * Set up encoding conversion info. If the file and server encodings are
+ * the same, no conversion is required but we must still validate that the
+ * data is valid for the encoding.
*/
- cstate->need_transcoding =
- (cstate->file_encoding != GetDatabaseEncoding() ||
- pg_database_encoding_max_length() > 1);
- /* See Multibyte encoding comment above */
- cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
+ if (cstate->file_encoding == GetDatabaseEncoding() ||
+ cstate->file_encoding == PG_SQL_ASCII ||
+ GetDatabaseEncoding() == PG_SQL_ASCII)
+ {
+ cstate->need_transcoding = false;
+ }
+ else
+ {
+ cstate->need_transcoding = true;
+ cstate->conversion_proc = FindDefaultConversionProc(cstate->file_encoding,
+ GetDatabaseEncoding());
+ }
cstate->copy_src = COPY_FILE; /* default */
@@ -1338,7 +1340,12 @@ BeginCopyFrom(ParseState *pstate,
if (!cstate->opts.binary)
{
initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
+
+ if (cstate->need_transcoding)
+ {
+ cstate->conversion_buf = palloc(CONVERSION_BUF_SIZE + 1);
+ cstate->conversion_buf_index = cstate->conversion_buf_len = 0;
+ }
}
/* Assign range table, we'll need it in CopyFrom. */
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 34ed3cfcd5b..ffeebed43e3 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -116,7 +116,8 @@ static int CopyGetData(CopyFromState cstate, void *databuf,
int minread, int maxread);
static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
-static bool CopyLoadRawBuf(CopyFromState cstate);
+static bool CopyLoadRawBufText(CopyFromState cstate);
+static bool CopyLoadRawBufBinary(CopyFromState cstate);
static int CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes);
void
@@ -357,6 +358,65 @@ CopyGetInt16(CopyFromState cstate, int16 *val)
return true;
}
+/*
+ * Convert input data from 'conversion_buf', writing it into
+ * 'raw_buf'.
+ *
+ * 'conversion_buf' mustn't be empty.
+ */
+static void
+CopyConvertBuf(CopyFromState cstate)
+{
+ int convertedbytes;
+ int srclen;
+ int dstlen;
+
+ Assert(cstate->raw_buf_index == 0);
+
+ srclen = cstate->conversion_buf_len - cstate->conversion_buf_index;
+ dstlen = RAW_BUF_SIZE - cstate->raw_buf_len + 1;
+
+ /*
+ * Do the conversion. This might stop short, if there is an invalid byte
+ * sequence in the input. We'll convert as much as we can in that case.
+ *
+ * Note: Even if we hit an invalid byte sequence, we don't report the error
+ * until all the valid bytes have been consumed. The input might contain
+ * an end-of-input marker (\.), and we don't want to report an error if
+ * the invalid byte sequence is after the end-of-input marker. We might
+ * still convert extra data after the end-of-input marker if it's valid
+ * for the encoding, but that's harmless.
+ */
+ convertedbytes = pg_do_encoding_conversion_buf(cstate->conversion_proc,
+ cstate->file_encoding,
+ GetDatabaseEncoding(),
+ (unsigned char *) cstate->conversion_buf + cstate->conversion_buf_index,
+ srclen,
+ (unsigned char *) cstate->raw_buf + cstate->raw_buf_len,
+ dstlen,
+ true);
+ if (convertedbytes == 0)
+ {
+ /*
+ * No more valid input in the buffer, and we have hit an invalid byte sequence.
+ * Let the conversion function throw the error.
+ */
+ convertedbytes = pg_do_encoding_conversion_buf(cstate->conversion_proc,
+ cstate->file_encoding,
+ GetDatabaseEncoding(),
+ (unsigned char *) cstate->conversion_buf + cstate->conversion_buf_index,
+ srclen,
+ (unsigned char *) cstate->raw_buf + cstate->raw_buf_len,
+ dstlen,
+ false);
+ /* pg_do_encoding_conversion_buf should've reported the error */
+ Assert(convertedbytes == 0);
+ elog(ERROR, "conversion error");
+ }
+ cstate->conversion_buf_index += convertedbytes;
+ cstate->raw_buf_len += strlen(cstate->raw_buf + cstate->raw_buf_len);
+ cstate->valid_raw_buf_len = cstate->raw_buf_len;
+}
/*
* CopyLoadRawBuf loads some more data into raw_buf
@@ -368,7 +428,90 @@ CopyGetInt16(CopyFromState cstate, int16 *val)
* when a multibyte character crosses a bufferload boundary.
*/
static bool
-CopyLoadRawBuf(CopyFromState cstate)
+CopyLoadRawBufText(CopyFromState cstate)
+{
+ int nbytes = RAW_BUF_BYTES(cstate);
+ int inbytes;
+
+ /* Copy down the unprocessed data if any. */
+ if (nbytes > 0)
+ {
+ memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
+ nbytes);
+ }
+ cstate->raw_buf_index = 0;
+ cstate->raw_buf_len = nbytes;
+
+ if (cstate->need_transcoding)
+ {
+ for (;;)
+ {
+ /* If we still have a good amount of unconverted data left, convert it. */
+ nbytes = cstate->conversion_buf_len - cstate->conversion_buf_index;
+ if (nbytes >= MAX_CONVERSION_GROWTH)
+ {
+ CopyConvertBuf(cstate);
+ return true;
+ }
+
+ /* Load more raw bytes to the conversion buffer */
+ if (nbytes > 0 && cstate->conversion_buf_index > 0)
+ {
+ memmove(cstate->conversion_buf, cstate->conversion_buf + cstate->conversion_buf_index,
+ nbytes);
+ }
+ cstate->conversion_buf_index = 0;
+ cstate->conversion_buf_len = nbytes;
+ inbytes = CopyGetData(cstate, cstate->conversion_buf + cstate->conversion_buf_len,
+ 1, CONVERSION_BUF_SIZE - cstate->conversion_buf_len);
+ cstate->conversion_buf_len += inbytes;
+
+ if (inbytes == 0)
+ {
+ /* Hit EOF. If we have any unconverted bytes left, convert them now */
+ if (cstate->conversion_buf_index < cstate->conversion_buf_len)
+ {
+ CopyConvertBuf(cstate);
+ return true;
+ }
+
+ /* truly hit EOF */
+ cstate->valid_raw_buf_len = 0;
+ return false;
+ }
+ }
+ }
+ else
+ {
+ /*
+ * No encoding conversion required. But we still need to verify that the input is
+ * valid.
+ *
+ * XXX: for single-byte encoding, the verification only needs to check that the
+ * input doesn't contain any zero bytes. Could we skip that altogether?
+ */
+ int validbytes;
+
+ inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
+ 1, RAW_BUF_SIZE - nbytes);
+ nbytes += inbytes;
+ cstate->raw_buf[nbytes] = '\0';
+ cstate->raw_buf_len = nbytes;
+
+ validbytes = pg_encoding_verifymbstr(cstate->file_encoding, cstate->raw_buf, nbytes);
+ if (validbytes == 0 && nbytes > 0)
+ {
+ report_invalid_encoding(cstate->file_encoding, cstate->raw_buf, nbytes);
+ }
+
+ cstate->valid_raw_buf_len = validbytes;
+ }
+
+ return (inbytes > 0);
+}
+
+static bool
+CopyLoadRawBufBinary(CopyFromState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
@@ -384,6 +527,7 @@ CopyLoadRawBuf(CopyFromState cstate)
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
cstate->raw_buf_len = nbytes;
+
return (inbytes > 0);
}
@@ -419,7 +563,7 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
/* Load more data if buffer is empty. */
if (RAW_BUF_BYTES(cstate) == 0)
{
- if (!CopyLoadRawBuf(cstate))
+ if (!CopyLoadRawBufBinary(cstate))
break; /* EOF */
}
@@ -695,9 +839,6 @@ CopyReadLine(CopyFromState cstate)
resetStringInfo(&cstate->line_buf);
cstate->line_buf_valid = true;
- /* Mark that encoding conversion hasn't occurred yet */
- cstate->line_buf_converted = false;
-
/* Parse data and transfer into line_buf */
result = CopyReadLineText(cstate);
@@ -710,10 +851,13 @@ CopyReadLine(CopyFromState cstate)
*/
if (cstate->copy_src == COPY_NEW_FE)
{
+ int inbytes;
+
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
- } while (CopyLoadRawBuf(cstate));
+ inbytes = CopyGetData(cstate, cstate->raw_buf,
+ 1, RAW_BUF_SIZE);
+ } while (inbytes > 0);
}
}
else
@@ -750,26 +894,6 @@ CopyReadLine(CopyFromState cstate)
}
}
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
-
return result;
}
@@ -785,7 +909,6 @@ CopyReadLineText(CopyFromState cstate)
bool need_data = false;
bool hit_eof = false;
bool result = false;
- char mblen_str[2];
/* CSV variables */
bool first_char_in_line = true;
@@ -803,8 +926,6 @@ CopyReadLineText(CopyFromState cstate)
escapec = '\0';
}
- mblen_str[1] = '\0';
-
/*
* The objective of this loop is to transfer the entire next input line
* into line_buf. Hence, we only care for detecting newlines (\r and/or
@@ -828,7 +949,7 @@ CopyReadLineText(CopyFromState cstate)
*/
copy_raw_buf = cstate->raw_buf;
raw_buf_ptr = cstate->raw_buf_index;
- copy_buf_len = cstate->raw_buf_len;
+ copy_buf_len = cstate->valid_raw_buf_len;
for (;;)
{
@@ -853,10 +974,10 @@ CopyReadLineText(CopyFromState cstate)
* Try to read some more data. This will certainly reset
* raw_buf_index to zero, and raw_buf_ptr must go with it.
*/
- if (!CopyLoadRawBuf(cstate))
+ if (!CopyLoadRawBufText(cstate))
hit_eof = true;
raw_buf_ptr = 0;
- copy_buf_len = cstate->raw_buf_len;
+ copy_buf_len = cstate->valid_raw_buf_len;
/*
* If we are completely out of data, break out of the loop,
@@ -1102,30 +1223,6 @@ CopyReadLineText(CopyFromState cstate)
* value, while in non-CSV mode, \. cannot be a data value.
*/
not_end_of_copy:
-
- /*
- * Process all bytes of a multi-byte character as a group.
- *
- * We only support multi-byte sequences where the first byte has the
- * high-bit set, so as an optimization we can avoid this block
- * entirely if it is not set.
- */
- if (cstate->encoding_embeds_ascii && IS_HIGHBIT_SET(c))
- {
- int mblen;
-
- /*
- * It is enough to look at the first byte in all our encodings, to
- * get the length. (GB18030 is a bit special, but still works for
- * our purposes; see comment in pg_gb18030_mblen())
- */
- mblen_str[0] = c;
- mblen = pg_encoding_mblen(cstate->file_encoding, mblen_str);
-
- IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1);
- IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
- raw_buf_ptr += mblen - 1;
- }
first_char_in_line = false;
} /* end of outer loop */
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index a585e3a6f1e..8c8b56cc2c9 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -436,6 +436,61 @@ pg_do_encoding_conversion(unsigned char *src, int len,
return result;
}
+/*
+ * Convert src string to another encoding.
+ *
+ * This function has a different API than the other conversion functions.
+ * The caller should've looked up the conversion function using
+ * FindDefaultConversionProc(). Unlike the other functions, the converted
+ * result is not palloc'd. It is written to a caller-supplied buffer instead.
+ *
+ * src_encoding - encoding to convert from
+ * dest_encoding - encoding to convert to
+ * src, srclen - input buffer and its length in bytes
+ * dest, destlen - destination buffer and its size in bytes
+ *
+ * The output is null-terminated.
+ *
+ * If destlen < srclen * MAX_CONVERSION_GROWTH + 1, the converted output
+ * wouldn't necessarily fit in the output buffer, and the function will not
+ * convert the whole input.
+ *
+ * TODO: It would be nice to also return the number of bytes written to the
+ * caller, to avoid a call to strlen().
+ */
+int
+pg_do_encoding_conversion_buf(Oid proc,
+ int src_encoding,
+ int dest_encoding,
+ unsigned char *src, int srclen,
+ unsigned char *dest, int destlen,
+ bool noError)
+{
+ Datum result;
+
+ /*
+ * If the destination buffer is not large enough to hold the
+ * result in the worst case, limit the input size passed to
+ * the conversion function.
+ *
+ * TODO: It would perhaps be more efficient to pass the destination
+ * buffer size to the conversion function, so that if the conversion
+ * expands less than the worst case, it could continue to fill up the
+ * whole buffer.
+ */
+ if ((Size) srclen >= ((destlen - 1) / (Size) MAX_CONVERSION_GROWTH))
+ srclen = ((destlen - 1) / (Size) MAX_CONVERSION_GROWTH);
+
+ result = OidFunctionCall6(proc,
+ Int32GetDatum(src_encoding),
+ Int32GetDatum(dest_encoding),
+ CStringGetDatum(src),
+ CStringGetDatum(dest),
+ Int32GetDatum(srclen),
+ BoolGetDatum(noError));
+ return DatumGetInt32(result);
+}
+
/*
* Convert string to encoding encoding_name. The source
* encoding is the DB encoding.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c15ea803c32..4366aa253cd 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -77,7 +77,7 @@ typedef struct CopyFromStateData
EolType eol_type; /* EOL type of input */
int file_encoding; /* file or remote side's character encoding */
bool need_transcoding; /* file encoding diff from server? */
- bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+ Oid conversion_proc;
/* parameters from the COPY command */
Relation rel; /* relation to copy from */
@@ -139,23 +139,40 @@ typedef struct CopyFromStateData
* line_buf is not used.)
*/
StringInfoData line_buf;
- bool line_buf_converted; /* converted to server encoding? */
bool line_buf_valid; /* contains the row being processed? */
/*
- * Finally, raw_buf holds raw data read from the data source (file or
- * client connection). In text mode, CopyReadLine parses this data
+ * conversion_buf holds raw input data read from the data source (file or
+ * client connection), not yet converted to the database encoding.
+ *
+ * If the encoding conversion is not required, the input data is read
+ * directly into 'raw_buf', and conversion_buf is not used.
+ */
+#define CONVERSION_BUF_SIZE 65536 /* we palloc CONVERSION_BUF_SIZE+1 bytes */
+ char *conversion_buf;
+ int conversion_buf_index;
+ int conversion_buf_len;
+
+ /*
+ * raw_buf holds input data, already converted to database encoding.
+ *
+ * In text mode, CopyReadLine parses this data
* sufficiently to locate line boundaries, then transfers the data to
- * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
+ * line_buf. In binary mode, CopyReadBinaryData fetches
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
+ *
+ * XXX: 'raw_buf' is a bit of a misnomer, since the data in 'conversion_buf'
+ * is more raw than this.
*/
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int valid_raw_buf_len;
/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
} CopyFromStateData;
extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 4529a15a6ba..c8f323a474a 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -616,6 +616,12 @@ extern int pg_bind_textdomain_codeset(const char *domainname);
extern unsigned char *pg_do_encoding_conversion(unsigned char *src, int len,
int src_encoding,
int dest_encoding);
+extern int pg_do_encoding_conversion_buf(Oid proc,
+ int src_encoding,
+ int dest_encoding,
+ unsigned char *src, int srclen,
+ unsigned char *dst, int dstlen,
+ bool noError);
extern char *pg_client_to_server(const char *s, int len);
extern char *pg_server_to_client(const char *s, int len);
--
2.20.1
On Wed, Dec 16, 2020 at 02:17:58PM +0200, Heikki Linnakangas wrote:
> I've been looking at the COPY FROM parsing code, trying to refactor it so
> that the parallel COPY would be easier to implement. I haven't touched
> parallelism itself, just looking for ways to smoothen the way. And for ways
> to speed up COPY in general.
Yes, this makes a lot of sense. Glad you are looking into this.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
One of the patches in this patch set is worth calling out separately:
0003-Add-direct-conversion-routines-between-EUC_TW-and-Bi.patch. Per
commit message:
Add direct conversion routines between EUC_TW and Big5.
Conversions between EUC_TW and Big5 were previously implemented by
converting the whole input to MIC first, and then from MIC to the
target encoding. Implement functions to convert directly between the
two.
The reason to do this now is that the next patch will change the
conversion function signature so that if the input is
invalid, we convert as much as we can and return the number of bytes
successfully converted. That's not possible if we use an intermediary
format, because if an error happens in the intermediary -> final
conversion, we lose track of the location of the invalid character in
the original input. Avoiding the intermediate step should be faster
too.
This patch is fairly independent of the others. It could be reviewed and
applied separately.
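
To illustrate the "convert as much as we can and return the number of bytes
successfully converted" contract from the commit message, here is a minimal
standalone sketch in C. It is not PostgreSQL code: toy_utf8_to_latin1() and its
no_error flag are hypothetical names made up for this example, and the
conversion (UTF-8 to ISO-8859-1) is chosen only because it fits in a few lines.
The point is that the return value is an offset into the original input, which
is exactly what gets lost when a second conversion step fails on an
intermediate buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Toy converter from UTF-8 to ISO-8859-1 (hypothetical, for illustration).
 *
 * Converts as many complete characters as possible and returns the number
 * of input bytes consumed.  *dstlen receives the number of output bytes
 * written.  dst must have room for at least srclen bytes.
 *
 * With no_error == false, an unconvertible or incomplete sequence is
 * reported and the program exits (a stand-in for throwing an error).  With
 * no_error == true, the function simply stops in front of that sequence.
 */
static size_t
toy_utf8_to_latin1(const unsigned char *src, size_t srclen,
                   unsigned char *dst, size_t *dstlen, bool no_error)
{
    size_t in = 0;
    size_t out = 0;

    assert(src != NULL && dst != NULL && dstlen != NULL);

    while (in < srclen)
    {
        unsigned char c = src[in];

        if (c < 0x80)
        {
            /* ASCII is copied through unchanged */
            dst[out++] = c;
            in += 1;
        }
        else if ((c == 0xC2 || c == 0xC3) &&
                 in + 1 < srclen &&
                 (src[in + 1] & 0xC0) == 0x80)
        {
            /* U+0080..U+00FF: two UTF-8 bytes become one Latin-1 byte */
            dst[out++] = (unsigned char) (((c & 0x03) << 6) |
                                          (src[in + 1] & 0x3F));
            in += 2;
        }
        else
        {
            /* invalid, untranslatable, or incomplete at end of input */
            if (!no_error)
            {
                fprintf(stderr, "cannot convert byte 0x%02x at offset %zu\n",
                        c, in);
                exit(1);
            }
            break;
        }
    }

    *dstlen = out;
    return in;
}
```

A caller that streams data would append more input after the unconsumed tail
and call the function again; if a call consumes nothing even though more input
cannot help, it can repeat the call with no_error set to false to get the error
reported at the right offset.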
In order to verify that the new code is correct, I wrote some helper
plpgsql functions to generate all valid EUC_TW and Big5 byte sequences
that encode one character, and tested converting each of them. Then I
compared the results with an unpatched server, to check that the new
code performs the same conversion. This is perhaps overkill, but since
it's pretty straightforward to enumerate all the input characters, we
might as well do it.
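
The helpers themselves are plpgsql and are in the attachment, not shown here.
As a rough idea of what such an enumeration looks like, the following C sketch
walks every two-byte sequence whose bytes both lie in 0xA1..0xFE, which is the
usual layout of the two-byte code set in EUC-style encodings. That range is an
assumption of this sketch: it says nothing about which sequences are actually
assigned in EUC_TW, and it ignores the longer SS2/SS3 forms. The names
enumerate_euc_two_byte() and remember_last() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Callback invoked once per candidate byte sequence. */
typedef void (*seq_callback) (const unsigned char *seq, size_t len, void *arg);

/*
 * Enumerate all two-byte sequences with both bytes in 0xA1..0xFE and pass
 * each one to 'cb' (if not NULL).  Returns the number of sequences visited.
 */
static size_t
enumerate_euc_two_byte(seq_callback cb, void *arg)
{
    size_t count = 0;

    for (int b1 = 0xA1; b1 <= 0xFE; b1++)
    {
        for (int b2 = 0xA1; b2 <= 0xFE; b2++)
        {
            unsigned char seq[2];

            seq[0] = (unsigned char) b1;
            seq[1] = (unsigned char) b2;
            if (cb != NULL)
                cb(seq, 2, arg);
            count++;
        }
    }
    return count;
}

/* Example callback: remember the most recent sequence in a 2-byte buffer. */
static void
remember_last(const unsigned char *seq, size_t len, void *arg)
{
    unsigned char *last = (unsigned char *) arg;

    assert(len == 2);
    last[0] = seq[0];
    last[1] = seq[1];
}
```

In a real test the callback would feed each sequence to the conversion under
test and record either the converted bytes or the error, so that the output of
a patched and an unpatched server can be diffed.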
For the sake of completeness, I wrote similar helpers for all the other
encodings and conversions, except for UTF-8: there are too many formally
valid code points for that to be feasible. This does test round-trip
conversions of all codepoints from all the other encodings to UTF-8 and
back, though, so there's pretty good coverage of UTF-8 too.
This test suite is probably too large to add to the source tree, but for
the sake of the archives, I'm attaching it here. The first patch adds
the test suite, including the expected output of each conversion. The
second patch contains expected output changes for the above patch to add
direct conversions between EUC_TW and Big5. It affected the error
messages for some byte sequences that cannot be converted. For example,
on unpatched master:
postgres=# select convert('\xfdcc', 'euc_tw', 'big5');
ERROR: character with byte sequence 0x95 0xfd 0xcc in encoding
"MULE_INTERNAL" has no equivalent in encoding "BIG5"
With the patch:
postgres=# select convert('\xfdcc', 'euc_tw', 'big5');
ERROR: character with byte sequence 0xfd 0xcc in encoding "EUC_TW" has
no equivalent in encoding "BIG5"
The old message talked about "MULE_INTERNAL", which exposes the
implementation detail that we used it as an intermediate in the
conversion. That can be confusing to a user; the new message makes more
sense. So that's also nice.
- Heikki
Attachments:
0001-Add-conversion-test-suite.patch.bz2 (application/x-bzip)