latin1 unicode conversion errors

Started by Kris Jurka almost 20 years ago, 3 messages
#1 Kris Jurka
books@ejurka.com

Why is latin1 special in its conversion from unconvertible unicode data?
Other latin character sets add a warning, but latin1 errors out.

jurka=# create database utf8 with encoding ='utf8';
CREATE DATABASE
jurka=# \c utf8
You are now connected to database "utf8".
utf8=# create table t(a text);
CREATE TABLE
utf8=# insert into t values ('\346\231\243');
INSERT 0 1
utf8=# set client_encoding = 'latin2';
SET
utf8=# select * from t;
WARNING: ignoring unconvertible UTF-8 character 0xe699a3
a
---

(1 row)

utf8=# set client_encoding = 'latin1';
SET
utf8=# select * from t;
ERROR: could not convert UTF8 character 0x00e6 to ISO8859-1

Kris Jurka
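
For reference, the octal escapes in the insert above are the three bytes 0xe6 0x99 0xa3, which is why the latin2 warning prints 0xe699a3. The latin1 error prints 0x00e6, which matches only the first byte of that sequence. A minimal sketch of decoding the sequence by hand, assuming plain C with no PostgreSQL headers; `utf8_decode` and `fits_latin1` are hypothetical helper names for illustration, not functions from the PostgreSQL source:

```c
#include <stddef.h>

/*
 * Decode one UTF-8 sequence of 1 to 3 bytes starting at s.
 * Stores the number of bytes consumed in *used and returns the code
 * point, or returns -1 (with *used = 0) if the input is truncated,
 * malformed, or a 4-byte sequence.
 * Simplification: overlong encodings are not rejected.
 */
static long
utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    if (len >= 1 && s[0] < 0x80)
    {
        *used = 1;
        return s[0];
    }
    if (len >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80)
    {
        *used = 2;
        return ((long) (s[0] & 0x1f) << 6) | (s[1] & 0x3f);
    }
    if (len >= 3 && (s[0] & 0xf0) == 0xe0 &&
        (s[1] & 0xc0) == 0x80 && (s[2] & 0xc0) == 0x80)
    {
        *used = 3;
        return ((long) (s[0] & 0x0f) << 12) |
               ((long) (s[1] & 0x3f) << 6) |
               (s[2] & 0x3f);
    }
    *used = 0;
    return -1;
}

/* ISO 8859-1 covers exactly the code points U+0000 through U+00FF. */
static int
fits_latin1(long cp)
{
    return cp >= 0 && cp <= 0xff;
}
```

Calling `utf8_decode` on the bytes 0xe6 0x99 0xa3 yields the code point U+6663, which `fits_latin1` rejects, so there is no ISO 8859-1 byte to send to the client.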

#2 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Kris Jurka (#1)
Re: latin1 unicode conversion errors

My guess is that it was coded by someone different and needs to be made
consistent.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#3 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Kris Jurka (#1)
1 attachment(s)
Re: latin1 unicode conversion errors

OK, yeah, it is inconsistent. I changed it to throw a warning instead.
Only patched to 8.2 because it is a behavior change.


Attachments:

/bjm/diff (text/plain)
Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c,v
retrieving revision 1.13
diff -c -c -r1.13 utf8_and_iso8859_1.c
*** src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c	25 Dec 2005 02:14:18 -0000	1.13
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c	12 Feb 2006 20:59:36 -0000
***************
*** 84,91 ****
  			len -= 2;
  		}
  		else if ((c & 0xe0) == 0xe0)
! 			elog(ERROR, "could not convert UTF8 character 0x%04x to ISO8859-1",
! 				 c);
  		else
  		{
  			*dest++ = c;
--- 84,93 ----
  			len -= 2;
  		}
  		else if ((c & 0xe0) == 0xe0)
! 			ereport(WARNING,
! 					(errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
! 					 errmsg("ignoring unconvertible UTF-8 character 0x%04x",
! 							c)));
  		else
  		{
  			*dest++ = c;
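
The hunk above only shows the branch that reports the problem, not how the rest of the unconvertible sequence is consumed. As a rough standalone illustration of "warn and skip" behavior, here is a sketch assuming plain C with `stderr` in place of `ereport`; `utf8_to_latin1_lossy` is a hypothetical name, and its way of skipping the whole sequence is this sketch's own choice, not something confirmed by the patch:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Convert UTF-8 to ISO 8859-1, dropping anything that does not fit.
 * dest must have room for at least len bytes.
 * Returns the number of bytes written; *skipped counts dropped characters.
 */
static size_t
utf8_to_latin1_lossy(const unsigned char *src, size_t len,
                     unsigned char *dest, size_t *skipped)
{
    size_t out = 0;

    *skipped = 0;
    while (len > 0)
    {
        unsigned char c = *src;
        size_t seqlen = 1;

        if (c < 0x80)
        {
            /* ASCII maps to itself */
            dest[out++] = c;
        }
        else if ((c & 0xfe) == 0xc2 && len >= 2 && (src[1] & 0xc0) == 0x80)
        {
            /* lead byte 0xc2 or 0xc3: code points U+0080 through U+00FF */
            dest[out++] = (unsigned char) (((c & 0x03) << 6) | (src[1] & 0x3f));
            seqlen = 2;
        }
        else
        {
            /* skip this byte plus any continuation bytes that follow it */
            while (seqlen < len && (src[seqlen] & 0xc0) == 0x80)
                seqlen++;
            fprintf(stderr,
                    "WARNING: ignoring unconvertible UTF-8 character starting with 0x%02x\n",
                    c);
            (*skipped)++;
        }
        src += seqlen;
        len -= seqlen;
    }
    return out;
}
```

Fed the bytes for "a", "é" (0xc3 0xa9), the character from the test case (0xe6 0x99 0xa3), and "b", this writes three ISO 8859-1 bytes, emits one warning, and keeps going, which is the behavior the other latin conversions in the thread already show.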