Weird encoding behavior

Started by "Leonardo M. Ramé"about 17 years ago7 messagesgeneral
Jump to latest
#1"Leonardo M. Ramé"
martinrame@yahoo.com

Hi, I'm experiencing a weird behavior when storing latin characters to a
PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
working since September 2008, it wasn't updated nor replaced since its
first installation.

The weirdness of the problem is that sometimes the characters are stored
correctly, but sometimes doesn't (allways by the same program), the
field type is Varchar(30), and for example the text "NU�EZ" is stored as
"NU?EZ".

The data comes from an external application in an XML file (also
Latin1), then, a Delphi service parses the XML and create the
Insert/Update statements to store the data in the database. I'd try to
reproduce the bug by sending XML files with '�' to the service, but it
is stored correctly.

Also, there's a front end that allows users to see/edit the data in a
user friendlier way. Again, I checked by inserting records with '�'
using this front-end, and also are stored correctly.

Does anyone faced the same problem? any workaround?

Thanks in advance,
Leonardo

#2Martin Gainty
mgainty@hotmail.com
In reply to: "Leonardo M. Ramé" (#1)
Re: Weird encoding behavior

Hola

use Ñ for spanish N
http://webdesign.about.com/od/localization/l/blhtmlcodes-sp.htm

Saludos Cordiales desde EEUU!
Martin
______________________________________________
Disclaimer and confidentiality note
This message is confidential and may be privileged. If you are not the intended recipient, we kindly ask you to please inform the sender. Any unauthorised dissemination or copying hereof is prohibited. This message serves for information purposes only and shall not have any legally binding effect. Given that e-mails can easily be subject to manipulation, we can not accept any liability for the content provided.

From: martinrame@yahoo.com
Subject: [GENERAL] Weird encoding behavior
Date: Wed, 25 Mar 2009 15:52:55 -0300
To: pgsql-general@postgresql.org

Hi, I'm experiencing a weird behavior when storing latin characters to a
PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
working since September 2008, it wasn't updated nor replaced since its
first installation.

The weirdness of the problem is that sometimes the characters are stored
correctly, but sometimes doesn't (allways by the same program), the
field type is Varchar(30), and for example the text "NUÑEZ" is stored as
"NU?EZ".

The data comes from an external application in an XML file (also
Latin1), then, a Delphi service parses the XML and create the
Insert/Update statements to store the data in the database. I'd try to
reproduce the bug by sending XML files with 'Ñ' to the service, but it
is stored correctly.

Also, there's a front end that allows users to see/edit the data in a
user friendlier way. Again, I checked by inserting records with 'Ñ'
using this front-end, and also are stored correctly.

Does anyone faced the same problem? any workaround?

Thanks in advance,
Leonardo

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

_________________________________________________________________
Internet Explorer 8 – Now Available. Faster, safer, easier.
http://clk.atdmt.com/MRT/go/141323790/direct/01/

#3Laurenz Albe
laurenz.albe@cybertec.at
In reply to: "Leonardo M. Ramé" (#1)
Re: Weird encoding behavior

Leonardo M. Ramé wrote:

Hi, I'm experiencing a weird behavior when storing latin characters to a
PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
working since September 2008, it wasn't updated nor replaced since its
first installation.

The weirdness of the problem is that sometimes the characters are stored
correctly, but sometimes doesn't (allways by the same program), the
field type is Varchar(30), and for example the text "NUÑEZ" is stored as
"NU?EZ".

The data comes from an external application in an XML file (also
Latin1), then, a Delphi service parses the XML and create the
Insert/Update statements to store the data in the database. I'd try to
reproduce the bug by sending XML files with 'Ñ' to the service, but it
is stored correctly.

Also, there's a front end that allows users to see/edit the data in a
user friendlier way. Again, I checked by inserting records with 'Ñ'
using this front-end, and also are stored correctly.

Does anyone faced the same problem? any workaround?

Well, there *must* be one client that stores wrong data...

As a first step, can you find out the code point of the character that
is represented as "?" in your E-Mail?

Something like
SELECT ascii(substr('NU?EZ', 3, 1));
except that instead of the string literal you substitute the column
containing the bad value.

Yours,
Laurenz Albe

#4"Leonardo M. Ramé"
martinrame@yahoo.com
In reply to: Laurenz Albe (#3)
Re: Weird encoding behavior

Well, there *must* be one client that stores wrong data...

You are right, I'll ask someone in site to look at *each* client hunting
for the root of the problem. It must be a Windows Regional Settings or
something similar.

As a first step, can you find out the code point of the character that
is represented as "?" in your E-Mail?

Something like
SELECT ascii(substr('NU?EZ', 3, 1));
except that instead of the string literal you substitute the column
containing the bad value.

Yours,
Laurenz Albe

I did what you suggested, and it responds with a 63 when the string is
"NU?NEZ" and 209 when it's "NU�EZ".

Thanks,
Leonardo.

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: "Leonardo M. Ramé" (#1)
Re: Weird encoding behavior

"Leonardo M. Ram�" wrote:

Hi, I'm experiencing a weird behavior when storing latin characters to a
PostgreSQL 8.3.1.876 server. The database is Latin1 encoded, and it is
working since September 2008, it wasn't updated nor replaced since its
first installation.

The weirdness of the problem is that sometimes the characters are stored
correctly, but sometimes doesn't (allways by the same program), the
field type is Varchar(30), and for example the text "NU�EZ" is stored as
"NU?EZ".

You need to be sure that the client_encoding variable is set to whatever
the client is actually using. Some client programs may be using UTF8,
and thus they should do "SET client_encoding TO utf8" on connection
start; others may be using Win-1252, Latin1, or other encodings. If you
declare each correctly when the connection is established, the server
will do the right conversion for you automatically. If you fail to do
that you will have a mess of characters, just like you do now.

BTW it is very advisable to update to 8.3.7. It has a lot of bug fixes
since 8.3.1.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: "Leonardo M. Ramé" (#4)
Re: Weird encoding behavior

Leonardo M. Ramé wrote:

As a first step, can you find out the code point of the character that
is represented as "?" in your E-Mail?

Something like
SELECT ascii(substr('NU?EZ', 3, 1));
except that instead of the string literal you substitute the column
containing the bad value.

I did what you suggested, and it responds with a 63 when the string is
"NU?NEZ" and 209 when it's "NUÑEZ".

63 is indeed a question mark. Since such a conversion would not be
done by PostgreSQL, "something else" must convert Ñ to ?N *before*
it gets into PostgreSQL...

Yours,
Laurenz Albe

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#6)
Re: Weird encoding behavior

"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:

Leonardo M. Ram� wrote:

I did what you suggested, and it responds with a 63 when the string is
"NU?NEZ" and 209 when it's "NU�EZ".

63 is indeed a question mark. Since such a conversion would not be
done by PostgreSQL, "something else" must convert � to ?N *before*
it gets into PostgreSQL...

Yeah, I think this destroys the theory that it's due to a wrong choice
of client_encoding setting. What you'd be likely to get from that is
a "character can't be translated" kind of error, not silent substitution
of a question mark. The damage must be getting done on the client side.

regards, tom lane