Default PostgreSQL server encoding - Change to unicode (utf8)

Started by Léa Massiot about 14 years ago (4 messages, general list)
#1 Léa Massiot
lmhelp1@orange.fr

Hello,

Thank you for reading my post.

When I run the command:

I get the following messages:

I would like the encoding of the cluster (and of the databases) to be Unicode (UTF8).

What can I do?
Can I set the default encoding I want for the whole PostgreSQL server
somewhere?

Thank you for helping and best regards.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Default-PostgreSQL-server-encoding-Change-to-unicode-utf8-tp5505985p5505985.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Léa Massiot (#1)
Re: Default PostgreSQL server encoding - Change to unicode (utf8)

On 02/22/2012 11:20 AM, Léa Massiot wrote:

Hello,

Thank you for reading my post.

When I run the command:

I get the following messages:

The messages?

I would like the encoding of the cluster (and of the databases) to be Unicode (UTF8).

What can I do?
Can I set the default encoding I want for the whole PostgreSQL server
somewhere?

A good place to start for your options is:
http://www.postgresql.org/docs/9.0/interactive/locale.html
http://www.postgresql.org/docs/9.0/interactive/multibyte.html

Thank you for helping and best regards.


--
Adrian Klaver
adrian.klaver@gmail.com

#3 Léa Massiot
lmhelp1@orange.fr
In reply to: Adrian Klaver (#2)
Re: Default PostgreSQL server encoding - Change to unicode (utf8)

Hello.
Thank you for your answer.
I used the <raw> and </raw> tags; that is probably why you couldn't
see the messages...
Thank you for the two links.
I read this in the second one: "On Windows, however, UTF-8 encoding can be
used with any locale." Still, I have some questions...

On Unix (Debian GNU Linux Squeeze):

=========================================================================================
psql_cmd> \l

   Name    |  Owner   | Encoding |  Collation  |    Ctype
-----------+----------+----------+-------------+-------------
 template1 | postgres | UTF8     | en_us.UTF-8 | en_us.UTF-8

=========================================================================================

On Windows (XP):

=========================================================================================
psql_cmd> \l

   Name    |  Owner   | Encoding |         Collation          |           Ctype
-----------+----------+----------+----------------------------+----------------------------
 template1 | postgres | UTF8     | English_United States.1252 | English_United States.1252

=========================================================================================

Question 1
Focusing on the "Collation" and "Ctype" columns:
does "English_United States.1252" have something to do with "Windows-1252"
("CP-1252")?
"CP-1252" is an 8-bit character encoding (so it can map codes to at most 2^8
characters).
How compatible is this with a "UTF8" "Encoding"?
For people testing PostgreSQL under Windows, is there any other, more
appropriate "Collation" that could be used to set a database collation?
There is no "locale -a" command available under Windows. Is there any
workaround?
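The 2^8 limit can be checked directly. The sketch below is in Python 3 (an assumption; the thread itself contains no code) and uses only the built-in "cp1252" and "utf-8" codecs; the helper names encode_report and cp1252_repertoire_size are hypothetical. Every CP-1252 character is a single byte, so 李 (U+674E) has no CP-1252 representation at all, while UTF-8 stores it in three bytes. As the documentation sentence quoted above says, a UTF8 database on Windows can still be used with this locale; the sketch only illustrates the size difference between the two encodings.

```python
from typing import Dict, Optional


def encode_report(text: str) -> Dict[str, Optional[bytes]]:
    """Encode `text` with CP-1252 and UTF-8; None means the codec cannot represent it."""
    report: Dict[str, Optional[bytes]] = {}
    for codec in ("cp1252", "utf-8"):
        try:
            report[codec] = text.encode(codec)
        except UnicodeEncodeError:
            # At least one character has no byte sequence in this codec.
            report[codec] = None
    return report


def cp1252_repertoire_size() -> int:
    """Count the byte values 0x00-0xFF that Python's cp1252 codec decodes to a character."""
    count = 0
    for b in range(256):
        try:
            bytes([b]).decode("cp1252")
            count += 1
        except UnicodeDecodeError:
            pass  # a few byte values are unassigned in Windows-1252
    return count


if __name__ == "__main__":
    print(encode_report("é"))   # {'cp1252': b'\xe9', 'utf-8': b'\xc3\xa9'}
    print(encode_report("李"))  # {'cp1252': None, 'utf-8': b'\xe6\x9d\x8e'}
    print(cp1252_repertoire_size())  # at most 256, since each character is one byte
```

The same string "Li 李" therefore cannot be stored in a CP-1252 (WIN1252) database, but fits in a UTF8 one.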

Question 2
Suppose I have a PostgreSQL table which has a VARCHAR column "text".
Suppose I want to insert the string "Li 李" which contains the Chinese
ideograph 李.
How can I do this with an "INSERT INTO" command?
I wish I could do something like:
INSERT INTO t (text) VALUES ('Li U+674E')
or
INSERT INTO t (text) VALUES ('Li \u674E')
How can I do this?
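One way to write such a literal is PostgreSQL's U&'...' Unicode-escape string constant. This is a minimal sketch, assuming the database encoding is UTF8 (as in the \l output above) and that standard_conforming_strings is on, which the U&'...' syntax requires (it is the default from PostgreSQL 9.1 on). The Python helper pg_unicode_literal is hypothetical and only spells out the escaping rules: printable ASCII stays as-is, a single quote is doubled, the backslash escape character is doubled, and every other character becomes \XXXX (or \+XXXXXX outside the Basic Multilingual Plane). For "Li 李" it yields U&'Li \674E'. The documentation also describes \uXXXX escapes inside E'...' escape strings, so E'Li \u674E' should work as well.

```python
def pg_unicode_literal(text: str) -> str:
    """Return `text` as a PostgreSQL U&'...' string constant (hypothetical helper).

    Assumes a UTF8 server encoding and standard_conforming_strings = on.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "'":
            parts.append("''")          # SQL doubles single quotes
        elif ch == "\\":
            parts.append("\\\\")        # the U& escape character itself must be doubled
        elif 0x20 <= cp < 0x7F:
            parts.append(ch)            # printable ASCII needs no escaping
        elif cp <= 0xFFFF:
            parts.append(f"\\{cp:04X}")   # 4-hex-digit form: \XXXX
        else:
            parts.append(f"\\+{cp:06X}")  # 6-hex-digit form: \+XXXXXX
    return "U&'" + "".join(parts) + "'"


if __name__ == "__main__":
    # Prints: INSERT INTO t (text) VALUES (U&'Li \674E');
    print(f"INSERT INTO t (text) VALUES ({pg_unicode_literal('Li 李')});")
```

If the client encoding matches the terminal, typing 李 directly inside an ordinary '...' literal is of course the simplest option; the escape form is mainly useful when the character cannot be typed.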

Thanks and best regards.
--
Léa

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Default-PostgreSQL-server-encoding-Change-to-unicode-utf8-tp5505985p5518720.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Léa Massiot (#3)
Re: Default PostgreSQL server encoding - Change to unicode (utf8)

On Monday, February 27, 2012 3:55:43 am Léa Massiot wrote:

Hello.
Thank you for your answer.

Thank you for the two links.
I read this in the second one: "On Windows, however, UTF-8 encoding can
be used with any locale." Still, I have some questions...

Question 1
Focusing on the "Collation" and "Ctype" columns:
does "English_United States.1252" have something to do with "Windows-1252"
("CP-1252")?
"CP-1252" is an 8-bit character encoding (so it can map codes to at most 2^8
characters).
How compatible is this with a "UTF8" "Encoding"?
For people testing PostgreSQL under Windows, is there any other, more
appropriate "Collation" that could be used to set a database collation?

This is answered in the first link I sent:

http://www.postgresql.org/docs/9.0/interactive/locale.html

"Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252,
but the principles are the same."

"LC_COLLATE  String sort order
LC_CTYPE    Character classification (What is a letter? Its upper-case equivalent?)"

So what is appropriate depends on which character sorting rules you want to follow. By the way,
both of these are fixed at database creation and cannot be changed.
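To see why the LC_COLLATE choice matters, here is a deliberately simplified Python comparison of two ordering rules (Python and both function names are assumptions for illustration). Code-point order matches what the C (POSIX) locale gives for UTF-8 text, because UTF-8 byte order equals code-point order. The case-insensitive rule is a hypothetical stand-in for a "dictionary-like" collation; it is not how en_US.UTF-8 or English_United States.1252 actually sort, since real locales also weigh accents, punctuation, and more.

```python
from typing import List


def c_locale_order(words: List[str]) -> List[str]:
    """Sort by UTF-8 bytes, i.e. by code point, as LC_COLLATE = 'C' does."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def casefold_order(words: List[str]) -> List[str]:
    """Hypothetical 'dictionary-like' rule: ignore case first, break ties by code point."""
    return sorted(words, key=lambda w: (w.casefold(), w))


if __name__ == "__main__":
    words = ["banana", "Apple", "apple", "Banana"]
    print(c_locale_order(words))  # ['Apple', 'Banana', 'apple', 'banana']
    print(casefold_order(words))  # ['Apple', 'apple', 'Banana', 'banana']
```

The same four values come back in a different order depending on the rule, which is why the collation has to be picked deliberately when the database is created.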

There is no "locale -a" command available under Windows. Is there any
workaround?

A little Googling found this. I am not a regular Windows user, so there may be
better options out there:

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/systeminfo.mspx?mfr=true

Thanks and best regards.
--
Léa

--
Adrian Klaver
adrian.klaver@gmail.com