ISO-8859-1 encoding not enforced?

Started by Christopher Kings-Lynne, almost 21 years ago, 6 messages
#1Christopher Kings-Lynne
chriskl@familyhealth.com.au

Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
the database encoding?

Because people using this database can happily insert any old non-LATIN1
junk into the database, then when I export as XML, all XML validation
fails because the encoding is not correct.

If this is not expected behaviour, I will submit an example script
showing the problem...

Chris

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#1)
Re: ISO-8859-1 encoding not enforced?

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
the database encoding?

AFAIK, there are no illegal characters in 8859-1, except \0 which we
do reject.

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: ISO-8859-1 encoding not enforced?

Tom Lane said:

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if
that's the database encoding?

AFAIK, there are no illegal characters in 8859-1, except \0 which we do
reject.

Perhaps Chris is confusing ISO/IEC 8859-1 with ISO-8859-1 a.k.a. Latin-1.

According to Wikipedia,

"The IANA has approved ISO-8859-1 (note the extra hyphen), a superset of
ISO/IEC 8859-1, for use on the Internet. This character map, or character
set or code page, supplements the assignments made by ISO/IEC 8859-1,
mapping control characters to code values 00-1F, 7F, and 80-9F. It thus
provides for 256 characters via every possible 8-bit value.
[snip]
The name Latin-1 is an informal alias [for ISO-8859-1] unrecognized by ISO
or the IANA, but is perhaps meaningful in some computer software."
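
A minimal sketch of the point above, in Python rather than inside the
database (an assumption made purely for illustration): because ISO-8859-1
assigns all 256 byte values, decoding arbitrary bytes as Latin-1 cannot
fail, so there is nothing for an encoding check to reject. The helper name
is_valid_utf8 is hypothetical and not part of PostgreSQL.

```python
# Every one of the 256 possible byte values is assigned in ISO-8859-1,
# so decoding arbitrary bytes as Latin-1 never raises an error.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# The decode is lossless: encoding back gives the original bytes.
round_trip = decoded.encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same bytes are not necessarily valid UTF-8, which is the kind of
# failure a strict consumer such as an XML validator would report.
latin1_e_acute = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'
print(is_valid_utf8(latin1_e_acute))
print(is_valid_utf8(b"cafe"))
```

In other words, "junk" bytes only become detectable once the data is
interpreted under a stricter encoding than the one the database was told
to use.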

But let's not start accepting \0 ;-)

cheers

andrew

#4Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#2)
Re: ISO-8859-1 encoding not enforced?

Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
the database encoding?

AFAIK, there are no illegal characters in 8859-1, except \0 which we
do reject.

Hmmm...

It turns out I was confused by the developer who reported this issue.
Basically they have a requirement that they only want the parts of
LATIN1 that can be converted to single-byte UTF-8 (i.e. 7-bit ASCII).

Only about 8 of these high bit characters existed in our database, so I
replaced them and put in a CHECK constraint on a few fields like this:

CHECK (description = convert(description, 'ISO-8859-1', 'UTF-8'))
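
As I read it, the constraint passes only when recoding from ISO-8859-1 to
UTF-8 leaves the value unchanged, which holds exactly when every character
is 7-bit ASCII. The following is a rough Python sketch of that test, not
the database's own implementation; the function name survives_recoding is
hypothetical.

```python
def survives_recoding(text: str) -> bool:
    """Hypothetical helper: True when the Latin-1 bytes of `text`
    are byte-for-byte identical to its UTF-8 bytes."""
    try:
        latin1_bytes = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # The text contains characters outside Latin-1 altogether.
        return False
    # Characters U+0080..U+00FF take one byte in Latin-1 but two in
    # UTF-8, so the two encodings agree only for pure 7-bit ASCII.
    return latin1_bytes == text.encode("utf-8")


print(survives_recoding("plain description"))
print(survives_recoding("caf\u00e9"))
```

Under these assumptions the check is equivalent to asking whether every
code point is below 128, which is what a dedicated 7-bit ASCII encoding
would enforce directly.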

Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)

Chris

#5Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Christopher Kings-Lynne (#4)
Re: ISO-8859-1 encoding not enforced?

On Wed, Apr 13, 2005 at 10:10:32AM +0800, Christopher Kings-Lynne wrote:

Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)

Given all the problems with unwanted recoding I've seen, I think such an
encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Amanece. (Ignacio Reyes)
El Cerro San Cristóbal me mira, cínicamente, con ojos de virgen"

#6Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Alvaro Herrera (#5)
Re: ISO-8859-1 encoding not enforced?

Given all the problems with unwanted recoding I've seen, I think such an
encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(

I agree, but that would be a nightmare of backwards compatibility :D

Chris