Strange UTF-8 behaviour
Hi there all.
I am quite new to Postgres, so forgive me if this question seems
obvious.

I have created a database with the UTF-8 encoding (createdb cassa
--encoding=UTF-8).
Then I ran the following tests:

cassa=> create table test(id varchar(5));
cassa=> insert into test values ('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR: value too long for type character varying(5)

But if I try

cassa=> select '#' || id || '#' from test;
?column?
----------
#12345#
#123è#
(2 rows)

so, apparently the chars are stored the right way (#123è#), but when
trying the query the è char is parsed as 2 chars.

The database server version is 7.3.4 on a RedHat 9 machine.

Any clue?

Tia
Marco

--
Ever noticed how fast windows run ? neither did I
My guess is that something in the chain of getting the data into the
database is measuring BYTES, not CHARACTERS.
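To make the bytes-versus-characters distinction concrete, here is a small Python sketch. Python is not used anywhere in this thread and serves only as an illustration; the helper name fits_varchar is hypothetical, and nothing here claims to show what PostgreSQL 7.3.4 does internally.

```python
# Illustration only: compare character length with UTF-8 byte length.
# fits_varchar is a hypothetical helper, not a PostgreSQL function.

def fits_varchar(value: str, limit: int, count_bytes: bool = False) -> bool:
    """Return True if value fits in `limit` units (characters or UTF-8 bytes)."""
    size = len(value.encode("utf-8")) if count_bytes else len(value)
    return size <= limit


if __name__ == "__main__":
    for value in ("12345", "123è", "1234è"):
        chars = len(value)
        octets = len(value.encode("utf-8"))
        print(f"{value!r}: {chars} characters, {octets} UTF-8 bytes")
```

Counted in characters, '1234è' fits a limit of 5; counted in UTF-8 bytes it needs 6, which would match the "value too long" error if some layer is counting bytes.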
"Marco Ferretti" <marco.ferretti@jrc.it> wrote:
</quote--------------------------------------->
<snip>
I have created a database with the UTF-8 encoding (createdb cassa
--encoding=UTF-8) .
Then I have made the following tests :
cassa=> create table test(id varchar(5));
cassa=> insert into test values ('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR: value too long for type character varying(5)
<snip>
so, apparently the chars are stored the right way ( #123è#) but when
trying the query the è char is parsed as 2 chars ....
The database server version is 7.3.4 on a RedHat 9 machine ...
Any clue ?
</quote--------------------------------------->
On Thu, Sep 16, 2004 at 06:10:13PM +0200, Marco Ferretti wrote:
I am quite new to Postgres, so forgive me if this question seems
obvious.
I have created a database with the UTF-8 encoding (createdb cassa
--encoding=UTF-8) .
Then I have made the following tests :
FWIW, I can't reproduce this using 7.3.6. Is there anything special
about your 'e' character, or is it a plain 'e'?
$ createdb test --encoding=UTF-8
CREATE DATABASE
COMMENT
$ psql test
Welcome to psql 7.3.6, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
test=# create table test (id char(5));
CREATE TABLE
test=# insert into test values ('1234e');
INSERT 16993 1
test=# create table test2 (id varchar(5));
CREATE TABLE
test=# insert into test2 values ('1234e');
INSERT 16996 1
test=# insert into test2 values ('123e');
INSERT 16997 1
test=# select '#' || id || '#', length(id) from test2;
?column? | length
----------+--------
#1234e# | 5
#123e# | 4
(2 rows)
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
Hi Alvaro,
FWIW, I can't reproduce this using 7.3.6. Is there anything special
about your 'e' character, or is it a plain 'e'?
Maybe you didn't get the email correctly. It was an e with a grave
accent, just like this:
� (UTF-8 encoded)
I just checked on PG 7.4.3 / NetBSD, with these results:
egrave=# CREATE TABLE test (data varchar(5));
CREATE
egrave=# show server_encoding ;
client_encoding
-----------------
UNICODE
(1 row)
egrave=# show client_encoding ; -- don't know why it is set to unicode
client_encoding
-----------------
UNICODE
(1 row)
egrave=# INSERT INTO test VALUES ('1234è');
egrave'# '\r
Query buffer reset (cleared).
egrave=# set client_encoding = 'ISO8859-1';
SET
egrave=# show client_encoding ;
client_encoding
-----------------
ISO8859-1
(1 row)
egrave=# INSERT INTO test VALUES ('1234è');
INSERT 25340 1
egrave=# SELECT * FROM test;
data
------
1234è
(1 row)
Everything seems to work when the client encoding is set up correctly.
Try checking your client and server encodings.
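As a rough picture of why the declared client encoding matters, here is a Python sketch. It assumes the terminal sends ISO 8859-1 bytes; the variable names are illustrative, and the conversion shown is Python's own, not PostgreSQL's code.

```python
# Illustration only: the same text as bytes in two encodings.
text = "1234è"

latin1_bytes = text.encode("iso8859-1")  # what a Latin-1 terminal would send
utf8_bytes = text.encode("utf-8")        # the UTF-8 form of the same text

print(latin1_bytes)  # b'1234\xe8'
print(utf8_bytes)    # b'1234\xc3\xa8'

# If Latin-1 bytes are declared to be UTF-8, they are not valid input:
# 0xE8 announces a three-byte sequence that never arrives.
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

# With the correct source encoding declared, the conversion is lossless.
converted = latin1_bytes.decode("iso8859-1").encode("utf-8")
print(valid_as_utf8, converted == utf8_bytes)  # False True
```

A lead byte that swallows the characters after it may also be why psql was still waiting for a closing quote in the first INSERT attempt, though that is an inference from the prompt rather than something the session states.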
I've also double checked with:
egrave=# SET client_encoding = 'ISO8859-2';
SET
egrave=# SELECT * FROM test;
WARNING: ignoring unconvertible UTF-8 character 0xc3a8
data
------
1234
(1 row)
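The warning names the byte pair 0xc3a8. As a hedged illustration in Python (its built-in codecs are assumed here to agree with PostgreSQL's conversion tables for this one character), those two bytes are the UTF-8 form of è, and ISO 8859-2 has no code point for it:

```python
# Illustration only, using Python's built-in codecs.
raw = bytes([0xC3, 0xA8])
char = raw.decode("utf-8")
print(char == "\u00e8")  # True: LATIN SMALL LETTER E WITH GRAVE

# ISO 8859-2 cannot represent that character.
try:
    char.encode("iso8859-2")
    representable = True
except UnicodeEncodeError:
    representable = False

# Dropping unconvertible characters gives the same shortened value
# as the row returned in the session.
lossy = "1234è".encode("iso8859-2", errors="ignore")
print(representable, lossy)  # False b'1234'
```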
Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Hi,
è (UTF-8 encoded)
Sorry, I actually forgot to switch encoding :)
I just hope the last part of the email was readable.
Ciao ciao
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Thanks to all you guys! You really helped.

marco