ISO-8859-1 encoding not enforced?
Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
the database encoding?
Because people using this database can happily insert any old non-LATIN1
junk into the database, then when I export as XML, all XML validation
fails because the encoding is not correct.
If this is not expected behaviour, I will submit an example script
showing the problem...
Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
the database encoding?
AFAIK, there are no illegal characters in 8859-1, except \0 which we
do reject.
regards, tom lane
Tom Lane said:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if
that's the database encoding?
AFAIK, there are no illegal characters in 8859-1, except \0 which we do
reject.
Perhaps Chris is confusing ISO/IEC 8859-1 with ISO-8859-1 a.k.a. Latin-1.
According to Wikipedia:
"The IANA has approved ISO-8859-1 (note the extra hyphen), a superset of
ISO/IEC 8859-1, for use on the Internet. This character map, or character
set or code page, supplements the assignments made by ISO/IEC 8859-1,
mapping control characters to code values 00-1F, 7F, and 80-9F. It thus
provides for 256 characters via every possible 8-bit value.
[snip]
The name Latin-1 is an informal alias [for ISO-8859-1] unrecognized by ISO
or the IANA, but is perhaps meaningful in some computer software."
But let's not start accepting \0 ;-)
cheers
andrew
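The point about all 256 byte values being assigned can be checked with a minimal Python sketch. It assumes Python's built-in "latin-1" codec as a stand-in for the IANA ISO-8859-1 mapping quoted above; it is not PostgreSQL's validation code, and the zero-byte rule is only imitated by a hypothetical helper named here for illustration.

```python
# Sketch only: Python's "latin-1" codec stands in for the IANA
# ISO-8859-1 mapping. This is not PostgreSQL's validation logic.

def undecodable_latin1_bytes():
    """Return the byte values (0-255) that fail to decode as Latin-1."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode("latin-1")
        except UnicodeDecodeError:
            bad.append(value)
    return bad


def acceptable_in_latin1_database(raw: bytes) -> bool:
    """Hypothetical helper imitating the rule stated in the thread:
    any byte sequence is accepted unless it contains a zero byte."""
    return b"\x00" not in raw


if __name__ == "__main__":
    print(undecodable_latin1_bytes())
    print(acceptable_in_latin1_database(b"caf\xe9"))
    print(acceptable_in_latin1_database(b"a\x00b"))
```

Since every byte value decodes, a single-byte encoding like this gives the server nothing to reject apart from the zero byte, which is why "non-LATIN1 junk" cannot be detected at insert time.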
Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
the database encoding?
AFAIK, there are no illegal characters in 8859-1, except \0 which we
do reject.
Hmmm...
It turns out I was confused by the developer who reported this issue.
Basically, they have a requirement that they only want the parts of
LATIN1 that convert to single-byte UTF-8 (i.e. 7-bit ASCII).
Only about 8 of these high-bit characters existed in our database, so I
replaced them and put in a CHECK constraint on a few fields like this:
CHECK (description = convert(description, 'ISO-8859-1', 'UTF-8'))
Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)
Chris
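The CHECK constraint above relies on the fact that re-encoding from Latin-1 to UTF-8 leaves a string byte-for-byte unchanged only when every character is 7-bit ASCII. A rough Python sketch of the same test follows; the function names are hypothetical and the code only illustrates the idea outside the database.

```python
# Sketch only: mirrors the idea behind the CHECK constraint, not its
# actual execution inside PostgreSQL. Function names are hypothetical.

def survives_latin1_to_utf8_unchanged(text: str) -> bool:
    """True if the Latin-1 bytes of `text` equal its UTF-8 bytes,
    which holds only when every character is 7-bit ASCII."""
    try:
        latin1_bytes = text.encode("latin-1")
    except UnicodeEncodeError:
        # Character is outside Latin-1 altogether.
        return False
    return latin1_bytes == text.encode("utf-8")


def high_bit_characters(text: str) -> list:
    """List the distinct characters above 0x7F, in code point order,
    i.e. the ones that would need replacing."""
    return sorted({ch for ch in text if ord(ch) > 0x7F})


if __name__ == "__main__":
    for sample in ["plain ascii", "caf\u00e9"]:
        print(repr(sample), survives_latin1_to_utf8_unchanged(sample))
    print(high_bit_characters("na\u00efve caf\u00e9"))
```

Something like high_bit_characters could be run over exported rows to find the handful of offending characters before adding the constraint.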
On Wed, Apr 13, 2005 at 10:10:32AM +0800, Christopher Kings-Lynne wrote:
Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)
Given all the problems with unwanted recoding I've seen, I think such an
encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(
--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Amanece. (Ignacio Reyes)
El Cerro San Cristóbal me mira, cínicamente, con ojos de virgen"