C locale + unicode

Started by John Sidney-Woollett over 21 years ago, 5 messages, general
#1 John Sidney-Woollett
johnsw@wardbrook.com

Does anyone know if it's permitted to use the 'C' locale with a UNICODE
encoded database in 7.4.6? And will it work correctly?

Or do you have to use an en_XX.utf8 locale if you want to use unicode
encoding for your databases?

John Sidney-Woollett

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Sidney-Woollett (#1)
Re: C locale + unicode

John Sidney-Woollett <johnsw@wardbrook.com> writes:

> Does anyone know if it's permitted to use the 'C' locale with a UNICODE
> encoded database in 7.4.6?

Yes.

> And will it work correctly?

For suitably small values of "correctly", sure. Textual sort ordering
would be by byte values, which might be a bit unintuitive for Unicode
characters. And I don't think upper()/lower() would work very nicely
for characters outside the basic ASCII set. But AFAIR those are the
only gotchas. People in the Far East, who tend not to care about either
of those points, use 'C' locale with various multibyte character sets
all the time.

regards, tom lane
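
To make the sort-ordering point concrete, here is a minimal Python sketch of what ordering "by byte values" means for UTF-8 text. It is an illustration only, not PostgreSQL code; the function name `bytewise_sort` and the sample words are made up for the example.

```python
def bytewise_sort(strings):
    """Sort strings by their UTF-8 byte sequences (plain byte-value comparison)."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


# "\u00e9" is e-acute, which UTF-8 encodes as the two bytes 0xC3 0xA9.
words = ["zebra", "Zebra", "\u00e9clair", "apple"]

# Upper-case ASCII sorts before lower-case ASCII, and every non-ASCII
# character sorts after all ASCII characters.
print(bytewise_sort(words))  # ['Zebra', 'apple', 'zebra', 'éclair']
```

A language-aware collation would typically place "éclair" near the other "e" words and keep "zebra" and "Zebra" adjacent, which is the "unintuitive" part.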

#3 John Sidney-Woollett
johnsw@wardbrook.com
In reply to: Tom Lane (#2)
Re: C locale + unicode

Tom, thanks for the info.

Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded
databases? (They don't seem to work on chars beyond standard ASCII on my
7.4.6 db.) Is this a locale-specific or an encoding-specific issue?

Is there likely to be a significant difference in speed between a
database using a UTF-8 locale and the C locale (if you don't care about
the small issues you detailed)?

Thanks.

John Sidney-Woollett

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Sidney-Woollett (#3)
Re: C locale + unicode

John Sidney-Woollett <johnsw@wardbrook.com> writes:

> Do upper() and lower() only work correctly for postgres v8 UTF-8 encoded
> databases? (They don't seem to work on chars beyond standard ASCII on my
> 7.4.6 db.) Is this a locale-specific or an encoding-specific issue?

Before 8.0, they don't work on multibyte characters, period. In 8.0
they work according to your locale setting.
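
As a rough illustration of why case mapping is harder for multibyte characters, the Python sketch below contrasts an ASCII-only mapping applied to the UTF-8 bytes with a character-aware mapping. The helper `ascii_only_upper` is hypothetical, and the sketch does not claim to reproduce what any PostgreSQL release does internally.

```python
def ascii_only_upper(text):
    """Upper-case only the ASCII letters, operating on the UTF-8 bytes."""
    # bytes.upper() maps a-z to A-Z and leaves every other byte unchanged,
    # so the bytes of a multibyte character pass through untouched.
    return text.encode("utf-8").upper().decode("utf-8")


word = "\u00e9clair"  # "\u00e9" is e-acute

print(word.encode("utf-8")[:2].hex())  # c3a9 (one character, two bytes)
print(ascii_only_upper(word))          # éCLAIR
print(word.upper())                    # ÉCLAIR
```

The middle line shows the symptom described above: everything in the ASCII range changes case, while the accented character is left as it was.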

> Is there likely to be a significant difference in speed between a
> database using a UTF-8 locale and the C locale (if you don't care about
> the small issues you detailed)?

I'd expect the C locale to be materially faster for text sorting.
Don't have a number offhand.

regards, tom lane
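
Since no number is given, one way to get a feel for the difference is to time both kinds of comparison yourself. The Python sketch below is only a stand-in under several assumptions: it times Python's `locale.strxfrm` against a plain byte-wise key, not PostgreSQL's sort code, and `time_sorts`, the word length, and the alphabet are invented for the example. Results depend entirely on the platform and on the locale configured in the environment.

```python
import locale
import random
import timeit


def time_sorts(n_words=2000, repeats=3, seed=0):
    """Return (bytewise_seconds, locale_seconds) for sorting the same word list."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz\u00e9\u00e8\u00fc\u00f1\u00df"
    words = ["".join(rng.choice(alphabet) for _ in range(8))
             for _ in range(n_words)]

    # Use the environment's collation rules if they can be loaded;
    # otherwise fall back to "C". Note that this changes process-wide state.
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")

    # Byte-value ordering: compare the raw UTF-8 encodings.
    bytewise = min(timeit.repeat(
        lambda: sorted(words, key=lambda w: w.encode("utf-8")),
        number=1, repeat=repeats))

    # Locale-aware ordering: compare the locale's transformed sort keys.
    by_locale = min(timeit.repeat(
        lambda: sorted(words, key=locale.strxfrm),
        number=1, repeat=repeats))

    return bytewise, by_locale


if __name__ == "__main__":
    b, l = time_sorts()
    print(f"byte-wise: {b:.4f}s  locale-aware: {l:.4f}s")
```

If the environment's locale is already "C", the two timings mostly measure key-function overhead, so run it under a UTF-8 locale for a more meaningful comparison.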

#5 John Sidney-Woollett
johnsw@wardbrook.com
In reply to: Tom Lane (#4)
Re: C locale + unicode

Thanks for the info - to the point and much appreciated!

John Sidney-Woollett
