Another encoding issue
Hi all,
Here's another interesting encoding issue. I cannot recall having seen it
on the lists.
---
[swm@laptop build7]$ bin/createdb -E LATIN1 test
CREATE DATABASE
[swm@laptop build7]$ cat break.sh
dat=`echo -en "\245\241"`
echo "create table test (d text);"
echo "insert into test values('$dat');"
[swm@laptop build7]$ sh break.sh | bin/psql test
CREATE TABLE
INSERT 0 1
[swm@laptop build7]$ bin/createdb -T test test2
CREATE DATABASE
[swm@laptop build7]$ bin/createdb -T test -E UTF-8 test2
CREATE DATABASE
[swm@laptop build7]$ bin/pg_dump -C test2 > test2.dmp
[swm@laptop build7]$ bin/dropdb test2
DROP DATABASE
[swm@laptop build7]$ bin/psql template1 -f test2.dmp
SET
SET
SET
CREATE DATABASE
ALTER DATABASE
You are now connected to database "test2".
[...]
CREATE TABLE
ALTER TABLE
psql:test2.dmp:345: ERROR: invalid UTF-8 byte sequence detected near byte 0xa5
CONTEXT: COPY test, line 1, column d: " "
[...]
---
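Until createdb() is a lot more sophisticated, we cannot translate
characters between encodings: it copies the template database as-is, so
the LATIN1 bytes from test end up, unconverted, in a database that claims
to be UTF-8. I don't think this is a huge issue, though, as most people
are only going to be creating empty databases anyway. Still, it probably
needs documenting.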
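To see the failure at the byte level without a running server, here is a small sketch that uses iconv(1) and od(1) instead of PostgreSQL itself. It assumes a POSIX shell and an iconv that rejects malformed input (GNU iconv does). It shows that the two bytes break.sh inserts are valid LATIN1 but not valid UTF-8, and what they would have to become after a real LATIN1-to-UTF-8 conversion. The helper names raw_bytes, hex and is_utf8 are made up for this example.

```shell
#!/bin/sh
# Reproduce the byte-level problem from the transcript without a server.

# The two bytes break.sh inserts (octal \245\241 = hex a5 a1). In LATIN1
# they are U+00A5 (yen sign) and U+00A1 (inverted exclamation mark).
# printf is used because `echo -en` is not portable across sh implementations.
raw_bytes() { printf '\245\241'; }

# Print stdin as a compact hex string, e.g. "a5a1".
hex() { od -An -tx1 | tr -d ' \n'; }

# Succeeds only if stdin is well-formed UTF-8.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

echo "raw bytes:       $(raw_bytes | hex)"

# 0xa5 is a UTF-8 continuation byte with no lead byte before it, which is
# why the UTF-8 database rejects the row during COPY.
if raw_bytes | is_utf8; then
    echo "raw bytes are valid UTF-8"
else
    echo "raw bytes are NOT valid UTF-8"
fi

# A real conversion turns each LATIN1 byte into a two-byte UTF-8 sequence.
echo "LATIN1 -> UTF-8: $(raw_bytes | iconv -f LATIN1 -t UTF-8 | hex)"
```

The last line reports c2a5c2a1, i.e. U+00A5 and U+00A1 encoded as UTF-8. That is the conversion createdb() would have to perform on every row, and it does not.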
Thoughts?
Thanks,
Gavin
If we're bringing up odd encoding issues, why not talk about the mystery
encoding of the shared catalogs? :)
Basically, which database you're logged into when you alter a shared
catalog determines the encoding the new object's name is stored in
there.
For one thing, this makes it impossible for us in phpPgAdmin to display
a list of databases when some database names are in EUC, some are in
UTF-8 and some are in LATIN5...
I bring it up because I notice that in MySQL 5, at least, all system
object names (in our case that'd be all strings in the shared catalogs)
are stored in UTF-8, always.
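A hypothetical illustration of the shared-catalog problem, again using only iconv(1) and od(1) rather than a real catalog. The name "café" comes out as different bytes depending on whether it was written from a UTF-8 or a LATIN1 session, and a client that assumes UTF-8 cannot decode the LATIN1 copy. The function names are invented for the example, and it assumes (as described above) that nothing records which encoding each name was written in.

```shell
#!/bin/sh
# Illustration only: one database name stored as different bytes depending
# on the encoding of the session that created it. "café" is spelled with
# octal escapes so this file's own encoding does not matter.

# Print stdin as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# Succeeds only if stdin is well-formed UTF-8.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

name_utf8()   { printf 'caf\303\251'; }  # e-acute as UTF-8: c3 a9
name_latin1() { printf 'caf\351'; }      # e-acute as LATIN1: e9

echo "written from a UTF-8 database:  $(name_utf8 | hex)"
echo "written from a LATIN1 database: $(name_latin1 | hex)"

# A client listing databases must pick one encoding for the whole list;
# whichever it picks, some names will not decode.
for f in name_utf8 name_latin1; do
    if "$f" | is_utf8; then
        echo "$f: decodes as UTF-8"
    else
        echo "$f: not valid UTF-8"
    fi
done

# Storing every name in a single encoding (UTF-8, as in MySQL 5) makes the
# two byte strings identical again.
echo "LATIN1 name converted to UTF-8: $(name_latin1 | iconv -f LATIN1 -t UTF-8 | hex)"
```

After conversion both spellings are 636166c3a9, which is why a single fixed encoding for shared-catalog strings would let a tool like phpPgAdmin display them all.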
Chris
Gavin Sherry <swm@linuxworld.com.au> writes:
> Here's another interesting encoding issue. I cannot recall having seen it
> on the lists.
This problem has been mentioned before, e.g. here:
http://archives.postgresql.org/pgsql-hackers/2005-03/msg01004.php
(that whole thread is relevant to the problem).
But I agree it's not well documented.
regards, tom lane