[9.1beta1] UTF-8/Regex Word-Character Definition excluding accented letters
PostgreSQL 9.1beta1, compiled by Visual C++ build 1500, 64-bit (EnterpriseDB
Install Executable)
CREATE DATABASE betatest
TEMPLATE template0
ENCODING 'UTF8'
LC_COLLATE 'C'
LC_CTYPE 'C';
[connect to database]
CREATE DOMAIN idcode AS text
NOT NULL CHECK (VALUE ~* '^\w[-:\w]*$')
;
SELECT 'AAAAAéaaaaa'::idcode; // -> SQL Error: ERROR: value for domain
idcode violates check constraint "idcode_check" (note the accented e
between all the As)
This is running just fine against a 9.0 install on the same machine. [\w]
is Unicode aware and server encoding is set (and confirmed via SHOW) to be
UTF8.
David J.
"David Johnston" <polobo@yahoo.com> writes:
PostgreSQL 9.1beta1, compiled by Visual C++ build 1500, 64-bit (EnterpriseDB
Install Executable)
CREATE DATABASE betatest
TEMPLATE template0
ENCODING 'UTF8'
LC_COLLATE 'C'
LC_CTYPE 'C';
CREATE DOMAIN idcode AS text
NOT NULL CHECK (VALUE ~* '^\w[-:\w]*$')
;
SELECT 'AAAAAéaaaaa'::idcode; // -> SQL Error: ERROR: value for domain
idcode violates check constraint "idcode_check" (note the accented "e"
between all the "A"s)
AFAICS that's correct behavior. C locale should not think that é is
a letter.
This is running just fine against a 9.0 install on the same machine.
We made some strides towards getting locale-sensitive stuff to work as
it "should" in 9.1. In particular, platform-specific creative
interpretations of what C locale means shouldn't happen anymore ...
regards, tom lane
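For anyone wanting to confirm which case they are in, here is a minimal sketch of a check, assuming a psql session connected to the database in question. The column alias is only illustrative, and the result under LC_CTYPE 'C' is what the explanation above implies rather than something verified here.

```sql
-- Show the character-classification locale of the current database.
SHOW lc_ctype;

-- Test a single accented letter against \w on its own.
-- Going by the reply above, 9.1 with LC_CTYPE 'C' should not treat
-- 'é' as a word character, so this is expected to come back false there.
SELECT 'é' ~ '^\w$' AS is_word_char;
```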
Got it. Changing LC_CTYPE to "English_United States.1252" restores the
correct behavior.
Thanks.
David J.
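A sketch of what that change might look like, assuming the database is recreated (LC_CTYPE is fixed when a database is created). The database name betatest_ctype is hypothetical; the locale name is the Windows one quoted above and will differ on other platforms.

```sql
-- Hypothetical database name; same settings as the original report
-- except for LC_CTYPE.
CREATE DATABASE betatest_ctype
  TEMPLATE template0
  ENCODING 'UTF8'
  LC_COLLATE 'C'
  LC_CTYPE 'English_United States.1252';

-- [connect to betatest_ctype], then recreate the domain unchanged:
CREATE DOMAIN idcode AS text
  NOT NULL CHECK (VALUE ~* '^\w[-:\w]*$');

-- The cast that failed under LC_CTYPE 'C' in the report:
SELECT 'AAAAAéaaaaa'::idcode;
```

Keeping LC_COLLATE at 'C' here is a choice carried over from the original CREATE DATABASE; only the character classification needs to change for \w.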
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, May 30, 2011 10:40 PM
To: David Johnston
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [9.1beta1] UTF-8/Regex Word-Character Definition
excluding accented letters
"David Johnston" <polobo@yahoo.com> writes:
PostgreSQL 9.1beta1, compiled by Visual C++ build 1500, 64-bit
(EnterpriseDB Install Executable)
CREATE DATABASE betatest
TEMPLATE template0
ENCODING 'UTF8'
LC_COLLATE 'C'
LC_CTYPE 'C';
CREATE DOMAIN idcode AS text
NOT NULL CHECK (VALUE ~* '^\w[-:\w]*$') ;
SELECT 'AAAAAéaaaaa'::idcode; // -> SQL Error: ERROR: value for
domain idcode violates check constraint "idcode_check" (note the accented e
between all the As)
AFAICS that's correct behavior. C locale should not think that é is a
letter.
This is running just fine against a 9.0 install on the same machine.
We made some strides towards getting locale-sensitive stuff to work as it
"should" in 9.1. In particular, platform-specific creative
interpretations of
what C locale means shouldn't happen anymore ...
regards, tom lane