Sequence of characters not supported by psql/pg_dump
Hello,
I use PostgreSQL 7.0.2 on linux.
The base was set with initdb -E UNICODE.
I have many Strings with accents (french language).
Some of them aren't supported by queries or pg_dump:
WORKING EXAMPLE:
DB=# select * from element_attribute where java_lang_string like 'Scholt�_s';
doc_id | tag_id | java_lang_string | type | java_lang_integer
--------+--------+------------------+------+-------------------
(0
rows)
NOT WORKING EXAMPLE (psql still waiting for characters to be entered):
DB=# select * from element_attribute where java_lang_string like 'Scholt�s';
DB'#
ERRORS IN FILE GENERATED BY PG_DUMP (AS A RESULT THE WHOLE TABLE WON'T BE
RESTORED):
2220 3 Faure java.lang.String \N
2221 3 Rosi�res java.lang.String \N
2222 3 Rosi�res java.lang.String \N
2223 3 Rosi�res java.lang.String \N
2224 3 Rosi�res java.lang.String \N
2225 3 Rosi�res java.lang.String \N
2226 3 Rosi�res java.lang.String \N
2227 3 Scholt�s
2228 3 Scholt�s
2229 3 Scholt�s
2230 3 Scholt�s
2231 3 Scholt�s
2232 3 Scholt�s
2233 3 Scholt�s
2234 3 Scholt�s
2235 3 Whirlpool java.lang.String \N
As you can see, the string ended by '�s' is quite lethal for Postgre.
The success of pg_dump depends on database content !
It seems that any string ending by an accent followed by less than 2 characters
is a problem.
Is it a known problem?
Is it a database configuration problem or a real bug in Postgre?
This problem is on my production database. I would appreciate If you could give
me an answer soon on that point.
Thanks in advance.
Samuel Minne
I use PostgreSQL 7.0.2 on linux.
The base was set with initdb -E UNICODE.I have many Strings with accents (french language).
Some of them aren't supported by queries or pg_dump:WORKING EXAMPLE:
DB=# select * from element_attribute where java_lang_string like 'Scholt�_s';
Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is
encoded in UTF-8? It's 2 bytes long and starting with 0xc...
--
Tatsuo Ishii
Le jeu, 11 jan 2001, Tatsuo Ishii a �crit :
I use PostgreSQL 7.0.2 on linux.
The base was set with initdb -E UNICODE.I have many Strings with accents (french language).
Some of them aren't supported by queries or pg_dump:WORKING EXAMPLE:
DB=# select * from element_attribute where java_lang_string like 'Scholt�_s';Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is
encoded in UTF-8? It's 2 bytes long and starting with 0xc...
--
Tatsuo Ishii
Sorry for the delay. We had very strong production constraint, and I had to let it go for a while.
Well, How can i know about the internal coding of this letter in the database ?
The texts where inserted from command line inserts. When inserted from JDBC (so it's supposed to be unicode),
I can't see any difference, even in the generated dump file.
Today I've seen a post about the same problem, that gives a solution to produce a working dump: pg_dump -d $dumpfile.
This command produce this kind of lines (you :
INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350');
I noticed that using \xxx notation i can handle special characters in queries from psql command line, too.
But it doesn't look like unicode coding (cf www.unicode.org/charts and LATIN1-Supplement), as I was expecting.
So, I have the following questions:
- What kind of code is this ?
- can I get the translation chart somewhere ?
- why isn't it UNICODE ?
- why do I have to use \xxx code (and not 0x..., or directly the special character like "�") from the psql command line,
although it is supposed to support UNICODE ?
- why don't we find this kind of characters in the dump file when the -d option is not set ?
(I assume this cause the restore to fail, and I think it could be considered as a bug).
Thank you for your help
Le jeu, 11 jan 2001, Tatsuo Ishii a ���crit :
I use PostgreSQL 7.0.2 on linux.
The base was set with initdb -E UNICODE.I have many Strings with accents (french language).
Some of them aren't supported by queries or pg_dump:WORKING EXAMPLE:
DB=# select * from element_attribute where java_lang_string like 'Scholt���_s';Are you sure that the letter (LATIN SMALL LETTER E WITH GRAVE) is
encoded in UTF-8? It's 2 bytes long and starting with 0xc...
--
Tatsuo IshiiSorry for the delay. We had very strong production constraint, and I had to let it go for a while.
Well, How can i know about the internal coding of this letter in the database ?
The texts where inserted from command line inserts. When inserted from JDBC (so it's supposed to be unicode),
I can't see any difference, even in the generated dump file.Today I've seen a post about the same problem, that gives a solution to produce a working dump: pg_dump -d $dumpfile.
This command produce this kind of lines (you :
INSERT INTO "element_texte" VALUES (634,'','Filtration du Plasma','\350');I noticed that using \xxx notation i can handle special characters in queries from psql command line, too.
But it doesn't look like unicode coding (cf www.unicode.org/charts and LATIN1-Supplement), as I was expecting.So, I have the following questions:
- What kind of code is this ?
Probably ISO 8859-1.
- can I get the translation chart somewhere ?
- why isn't it UNICODE ?
Because you didn't input as UTF-8.
- why do I have to use \xxx code (and not 0x..., or directly the special character like "���") from the psql command line,
although it is supposed to support UNICODE ?
- why don't we find this kind of characters in the dump file when the -d option is not set ?
(I assume this cause the restore to fail, and I think it could be considered as a bug).Thank you for your help
In the releases prior 7.1, you need to input UTF-8 explicitely. 7.1
has the ability that does automatic encoding conversion between ISO
8859-1 and UTF-8. That means, if you type in characters in ISO 8859-1,
PostgreSQL will convert it to UTF-8 then store into the database.
--
Tatsuo Ishii