Encoding problem with 7.4

Started by E.Rodichev about 22 years ago, 20 messages
#1 E.Rodichev
er@sai.msu.su

Hi,

I just noticed some incorrect behaviour for postgresql-7.4 related
to locale.

After installing 7.4 I created a database cluster completely from scratch
with a Cyrillic locale:

su postgres
export LC_CTYPE=ru_RU.KOI8-R
export LC_COLLATE=ru_RU.KOI8-R
/usr/local/pgsql/bin/initdb -D /db2/pgdata
/usr/local/pgsql/bin/createuser -d er

Then I switched back to my normal account. At this point I have:

/e:1>psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
(2 rows)

Then I created new db:

/e:2>createdb test
CREATE DATABASE
/e:3>psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
 test      | er       | SQL_ASCII   <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.
DB test is really in ru_RU.KOI8-R, not ASCII. I can create tables
with ASCII characters, and with non-ASCII (Cyrillic) ones as well,
and ORDER BY, SELECT upper(), etc. work in the ru_RU.KOI8-R locale.

After the first initdb, the behaviour is not affected by my LC_CTYPE and LC_COLLATE
settings. I may set

export LC_CTYPE=ru_RU.KOI8-R
export LC_COLLATE=ru_RU.KOI8-R

or

export LC_CTYPE=C
export LC_COLLATE=C

but ORDER BY and SELECT upper() still work in the Cyrillic locale.

As far as I can see, there are two points here:

1. Reporting the encoding as SQL_ASCII is incorrect: all databases are in KOI8,
not in SQL_ASCII;

2. More generally, this kind of fixed locale behaviour is not very
convenient. A more natural approach would be the following: the user gets
the database encoding as specified at the moment createdb is issued.
That way it would be possible to have different databases with
different encodings.

Best regards,
Evgeny Rodichev

_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er@sai.msu.su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: E.Rodichev (#1)
Re: Encoding problem with 7.4

E.Rodichev writes:

I just noticed some incorrect behaviour for postgresql-7.4 related
to locale.

Maybe you should first read the documentation to understand how it
actually works.

--
Peter Eisentraut peter_e@gmx.net

In reply to: E.Rodichev (#1)
Re: Encoding problem with 7.4

On Thursday 27 November 2003 20:56, E.Rodichev wrote:

After installing 7.4 I created a database cluster completely from scratch
with a Cyrillic locale:

Dear Evgeny,

If you want to go 'fast', do not hesitate to install the pgAdmin3 GUI from
http://www.pgadmin.org. You will be able to create and manage a database in
KOI8 encoding. You can choose a UTF-8 encoding as well.

pgAdmin3 displays the needed SQL. Therefore you can learn the PostgreSQL/SQL99
syntax quite fast. Also, we provide the full PostgreSQL documentation.

Cheers,
Jean-Michel

#4 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: E.Rodichev (#1)
Re: Encoding problem with 7.4

After installing 7.4 I created a database cluster completely from scratch
with a Cyrillic locale:

su postgres
export LC_CTYPE=ru_RU.KOI8-R
export LC_COLLATE=ru_RU.KOI8-R
/usr/local/pgsql/bin/initdb -D /db2/pgdata

You need to go:

/usr/local/pgsql/bin/initdb -D /db2/pgdata -E KOI8

To set the default encoding to KOI8.

Then I switched back to my normal account. At this point I have:

/e:1>psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+-----------
 template0 | postgres | SQL_ASCII
 template1 | postgres | SQL_ASCII
(2 rows)

Locale and encoding are two quite different things.

Chris

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: E.Rodichev (#1)
Re: Encoding problem with 7.4

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

You can set the default encoding at initdb time, IIRC, but you didn't.

regards, tom lane

#6 E.Rodichev
er@sai.msu.su
In reply to: Tom Lane (#5)
Re: Encoding problem with 7.4

On Fri, 28 Nov 2003, Tom Lane wrote:

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

Best wishes,
E.R.

You can set the default encoding at initdb time, IIRC, but you didn't.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


#7 Peter Eisentraut
peter_e@gmx.net
In reply to: E.Rodichev (#6)
Re: Encoding problem with 7.4

E.Rodichev writes:

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

The encoding is only a declaration of your intentions. What you actually
put into the database is your responsibility.

--
Peter Eisentraut peter_e@gmx.net

#8 Andrew Dunstan
andrew@dunslane.net
In reply to: E.Rodichev (#6)
Re: Encoding problem with 7.4

E.Rodichev wrote:

On Fri, 28 Nov 2003, Tom Lane wrote:

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

Best wishes,
E.R.

You can set the default encoding at initdb time, IIRC, but you didn't.

You can set the default at initdb time, or per database at createdb
time, but it has to be done explicitly. You seem to think it should be
picked up from the environment, but this isn't so, you must use the
-E|--encoding flag on either createdb or initdb, or if creating directly
from SQL use the ENCODING option on "create database" to use something
other than the default set by initdb.

examples:

[andrew@Thor bin]$ ./initdb /tmp/enctry
The files belonging to this database system will be owned by user "andrew".
This user must also own the server process.

The database cluster will be initialized with locales:
COLLATE: ru_RU.KOI8-R
CTYPE: ru_RU.KOI8-R
MESSAGES: en_US.iso885915
MONETARY: en_US.iso885915
NUMERIC: en_US.iso885915
TIME: en_US.iso885915

creating directory /tmp/enctry... ok
creating directory /tmp/enctry/base... ok
creating directory /tmp/enctry/global... ok
creating directory /tmp/enctry/pg_xlog... ok
creating directory /tmp/enctry/pg_clog... ok
selecting default max_connections... 100
selecting default shared_buffers... 1000
creating configuration files... ok
creating template1 database in /tmp/enctry/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ok
setting privileges on built-in objects... ok
creating information schema... ok
vacuuming database template1... ok
copying template1 to template0... ok

Success. You can now start the database server using:

./postmaster -D /tmp/enctry
or
./pg_ctl -D /tmp/enctry -l logfile start

[andrew@Thor bin]$ ./pg_ctl -D /tmp/enctry -l /tmp/enclog -o '-p 5433' start
postmaster successfully started
[andrew@Thor bin]$ ./createdb -E KOI8-R -p 5433 testme
CREATE DATABASE
[andrew@Thor bin]$ ./psql -p 5433 -l
        List of databases
   Name    | Owner  | Encoding
-----------+--------+-----------
 template0 | andrew | SQL_ASCII
 template1 | andrew | SQL_ASCII
 testme    | andrew | KOI8
(3 rows)

[andrew@Thor bin]$ ./pg_ctl -D /tmp/enctry -o '-p 5433' stop
waiting for postmaster to shut down......done
postmaster successfully shut down
[andrew@Thor bin]$ rm -rf /tmp/enctry
[andrew@Thor bin]$ ./initdb -E KOI8-R /tmp/enctry
The files belonging to this database system will be owned by user "andrew".
This user must also own the server process.

The database cluster will be initialized with locales:
COLLATE: ru_RU.KOI8-R
CTYPE: ru_RU.KOI8-R
MESSAGES: en_US.iso885915
MONETARY: en_US.iso885915
NUMERIC: en_US.iso885915
TIME: en_US.iso885915

creating directory /tmp/enctry... ok
creating directory /tmp/enctry/base... ok
creating directory /tmp/enctry/global... ok
creating directory /tmp/enctry/pg_xlog... ok
creating directory /tmp/enctry/pg_clog... ok
selecting default max_connections... 100
selecting default shared_buffers... 1000
creating configuration files... ok
creating template1 database in /tmp/enctry/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ok
setting privileges on built-in objects... ok
creating information schema... ok
vacuuming database template1... ok
copying template1 to template0... ok

Success. You can now start the database server using:

./postmaster -D /tmp/enctry
or
./pg_ctl -D /tmp/enctry -l logfile start

[andrew@Thor bin]$ ./pg_ctl -D /tmp/enctry -l /tmp/enclog -o '-p 5433' start
postmaster successfully started
[andrew@Thor bin]$ ./createdb -p 5433 testme
CREATE DATABASE
[andrew@Thor bin]$ ./psql -p 5433 -l
       List of databases
   Name    | Owner  | Encoding
-----------+--------+----------
 template0 | andrew | KOI8
 template1 | andrew | KOI8
 testme    | andrew | KOI8
(3 rows)

[andrew@Thor bin]$

cheers

andrew
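
The precedence described in this message can be sketched as a small Python model. It only illustrates the rule as stated in the thread for 7.4 (an explicit -E on createdb wins, otherwise the initdb default applies, otherwise SQL_ASCII). The function `resolve_encoding` and its parameter names are hypothetical and are not PostgreSQL code.

```python
from typing import Optional

# Default named in this thread for 7.4 when no encoding is requested anywhere.
FALLBACK_ENCODING = "SQL_ASCII"


def resolve_encoding(createdb_encoding: Optional[str] = None,
                     initdb_encoding: Optional[str] = None) -> str:
    """Hypothetical model: the encoding a new database would report.

    LC_* environment variables are deliberately not a parameter here,
    because the thread states they have no effect on the encoding.
    """
    if createdb_encoding is not None:   # createdb -E, or ENCODING in CREATE DATABASE
        return createdb_encoding
    if initdb_encoding is not None:     # initdb -E sets the cluster-wide default
        return initdb_encoding
    return FALLBACK_ENCODING


if __name__ == "__main__":
    print(resolve_encoding())                          # no -E anywhere
    print(resolve_encoding(createdb_encoding="KOI8"))  # like the first transcript
    print(resolve_encoding(initdb_encoding="KOI8"))    # like the second transcript
```

The three calls correspond to the original poster's setup and to the two transcripts above, in that order.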

#9 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: E.Rodichev (#6)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, E.Rodichev wrote:

On Fri, 28 Nov 2003, Tom Lane wrote:

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time. You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.

#10 E.Rodichev
er@sai.msu.su
In reply to: Stephan Szabo (#9)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, Stephan Szabo wrote:

On Wed, 3 Dec 2003, E.Rodichev wrote:

On Fri, 28 Nov 2003, Tom Lane wrote:

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time. You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.

Yes, it is!

If db "test" is SQL_ASCII, AND all LC_* env are set to "C", the sorting of
ASCII characters is, for example,

a
A
b
B
c
C

not

A
B
C
a
b
c

(the first order is true for ru_RU.KOI8-R, the latter one - for C).
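
The two orders above can be reproduced with a short Python sketch. Sorting by raw code point matches what the C locale does for ASCII letters. The `dictionary_key` function is an assumption made for illustration only: it roughly imitates a dictionary-style collation for these six letters and is not how ru_RU.KOI8-R collation is actually implemented.

```python
letters = ["b", "A", "c", "a", "C", "B"]

# C-locale style: compare raw code-point values, so all uppercase ASCII
# letters (65-90) sort before all lowercase ones (97-122).
c_order = sorted(letters)


def dictionary_key(s: str):
    # Hypothetical stand-in for a dictionary-style collation: compare
    # case-insensitively first, then put lowercase before uppercase on ties.
    return (s.lower(), s.isupper())


dict_order = sorted(letters, key=dictionary_key)

if __name__ == "__main__":
    print("".join(c_order))
    print("".join(dict_order))
```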

To summarize shortly:

- initdb _without_ -E flag, but with ru_RU.KOI8-R environment;
- createdb with any environment;
- psql indicates SQL_ASCII;
- sorting and upper/lowercasing are in ru_RU.KOI8-R, even with LC_*
environment is set to "C".

Where is the logic?

Best wishes,
E.R.

#11 Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: E.Rodichev (#10)
Re: Encoding problem with 7.4

On Wed, Dec 03, 2003 at 11:42:34PM +0300, E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time. You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.

Yes, it is!

What apparently you haven't picked up yet is that the _locale_ is a
different and unrelated configuration setting from the _encoding_.
Sort order is locale related; you already got that one right. Now you
need to go after the encoding.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El destino baraja y nosotros jugamos" (A. Schopenhauer)

#12 Andrew Dunstan
andrew@dunslane.net
In reply to: E.Rodichev (#10)
Re: Encoding problem with 7.4

E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

On Wed, 3 Dec 2003, E.Rodichev wrote:

On Fri, 28 Nov 2003, Tom Lane wrote:

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time. You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.

Yes, it is!

If db "test" is SQL_ASCII, AND all LC_* env are set to "C", the sorting of
ASCII characters is, for example,

a
A
b
B
c
C

not

A
B
C
a
b
c

(the first order is true for ru_RU.KOI8-R, the latter one - for C).

To summarize shortly:

- initdb _without_ -E flag, but with ru_RU.KOI8-R environment;
- createdb with any environment;
- psql indicates SQL_ASCII;
- sorting and upper/lowercasing are in ru_RU.KOI8-R, even with LC_*
environment is set to "C".

Where is the logic?

Encoding and collation order are two different things. LC_* settings
have no effect on encoding.

see http://www.postgresql.org/docs/current/static/charset.html

cheers

andrew

#13 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: E.Rodichev (#10)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

On Wed, 3 Dec 2003, E.Rodichev wrote:

On Fri, 28 Nov 2003, Tom Lane wrote:

"E.Rodichev" <er@sai.msu.su> writes:

/e:2>createdb test

test | er | SQL_ASCII <----- Incorrect!
(3 rows)

Let's note that the last line is in fact completely incorrect.

What's incorrect about it? You didn't ask for any other encoding
than SQL_ASCII.

It is incorrect, because database "test" is, really, in KOI8, NOT in SQL_ASCII
in this example, as I explained in my mail.

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time. You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.

Yes, it is!

*sigh*

(the first order is true for ru_RU.KOI8-R, the latter one - for C).

To summarize shortly:

- initdb _without_ -E flag, but with ru_RU.KOI8-R environment;
- createdb with any environment;
- psql indicates SQL_ASCII;
- sorting and upper/lowercasing are in ru_RU.KOI8-R, even with LC_*
environment is set to "C".

Only the locale settings at initdb time matter. Changing the LC_* later
is not going to change what the database does. Encoding and locale are
separate (but related) and it is your responsibility to make sure the
choices are consistent. If you do not specify an encoding, SQL_ASCII is
used for the encoding. If the characters happen to line up appropriately
for what your ru_RU.KOI8-R locale expects it'll even happen to appear to
work for sorting and case changes (and things like isprint). Which part of
this are you not understanding?
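
As a rough illustration of "only the locale settings at initdb time matter", here is a minimal Python model. The names `Cluster`, `initdb` and `server_collation` are hypothetical, and the model is simplified: it only looks at LC_COLLATE and LC_CTYPE and ignores LANG and LC_ALL.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Cluster:
    """Hypothetical stand-in for a data directory created by initdb."""
    lc_collate: str
    lc_ctype: str


def initdb(env: Mapping[str, str]) -> Cluster:
    # The locale is read from the environment once, at cluster creation.
    return Cluster(lc_collate=env.get("LC_COLLATE", "C"),
                   lc_ctype=env.get("LC_CTYPE", "C"))


def server_collation(cluster: Cluster, current_env: Mapping[str, str]) -> str:
    # current_env is ignored on purpose: per this thread, changing LC_*
    # after initdb does not change what the server does.
    return cluster.lc_collate


if __name__ == "__main__":
    cluster = initdb({"LC_COLLATE": "ru_RU.KOI8-R", "LC_CTYPE": "ru_RU.KOI8-R"})
    print(server_collation(cluster, {"LC_COLLATE": "C"}))
```

The frozen dataclass mirrors the point being made: once the cluster exists, the stored locale is what counts, whatever the shell environment says later.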

#14 E.Rodichev
er@sai.msu.su
In reply to: Alvaro Herrera (#11)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, Alvaro Herrera wrote:

On Wed, Dec 03, 2003 at 11:42:34PM +0300, E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

No, it isn't. As far as PostgreSQL is concerned the database is SQL_ASCII
since you didn't override the default encoding at initdb time or at
createdb time. You did choose LC_ values that seem to want KOI8, but
locale and encoding are separate, if you want KOI8 encoding, you have to
say so.

Yes, it is!

What apparently you haven't picked up yet is that the _locale_ is a
different and unrelated configuration setting from the _encoding_.
Sort order is locale related; you already got that one right. Now you

Sorry, I got it WRONG!

Sort order for the C locale MUST be ABCabc, not aAbBcC.

But I got aAbBcC.

Best wishes,
E.R.

need to go after the encoding.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El destino baraja y nosotros jugamos" (A. Schopenhauer)


#15 E.Rodichev
er@sai.msu.su
In reply to: Andrew Dunstan (#12)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, Andrew Dunstan wrote:

Encoding and collation order are two different things. LC_* settings
have no effect on encoding.

see http://www.postgresql.org/docs/current/static/charset.html

I am trying to point out the reverse dependency:

(1) the encoding has an effect on the LC_* settings, and (2) the reported
encoding is incorrect.

Is that right?

Regards,
E.R.

cheers

andrew


#16 E.Rodichev
er@sai.msu.su
In reply to: Stephan Szabo (#13)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, Stephan Szabo wrote:

Only the locale settings at initdb time matter. Changing the LC_* later
is not going to change what the database does. Encoding and locale are
separate (but related) and it is your responsibility to make sure the
choices are consistent. If you do not specify an encoding, SQL_ASCII is
used for the encoding. If the characters happen to line up appropriately
for what your ru_RU.KOI8-R locale expects it'll even happen to appear to
work for sorting and case changes (and things like isprint). Which part of
this are you not understanding?

Thank you, that is a much more consistent answer. But again, things do not
work exactly the way you wrote.

In your view the chain is

data -> encoding transform -> locale transform -> output

It looks clean and reasonable.

The encoding transform may be set during initdb or createdb (is that true?)

But when is the locale transform defined? In the usual Unix way it should
depend on the LC_* settings (is that true?)

As I described in my first posting, the situation is different. Namely,
the locale setting now defines the _encoding transform_ (and the data
representation in storage), but the _locale transform_ doesn't depend on LC_*.

Best wishes,
E.R.


#17 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: E.Rodichev (#16)
Re: Encoding problem with 7.4

On Thu, 4 Dec 2003, E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

Only the locale settings at initdb time matter. Changing the LC_* later
is not going to change what the database does. Encoding and locale are
separate (but related) and it is your responsibility to make sure the
choices are consistent. If you do not specify an encoding, SQL_ASCII is
used for the encoding. If the characters happen to line up appropriately
for what your ru_RU.KOI8-R locale expects it'll even happen to appear to
work for sorting and case changes (and things like isprint). Which part of
this are you not understanding?

Thank you, that is a much more consistent answer. But again, things do not
work exactly the way you wrote.

In your view the chain is

data -> encoding transform -> locale transform -> output

It looks clean and reasonable.

The encoding transform may be set during initdb or createdb (is that true?)

But when is the locale transform defined? In the usual Unix way it should
depend on the LC_* settings (is that true?)

As I described in my first posting, the situation is different. Namely,
the locale setting now defines the _encoding transform_ (and the data
representation in storage), but the _locale transform_ doesn't depend on LC_*.

The locale settings depend on LC_* at initdb time only. When the
postmaster starts it sets the locale based on the stored values from
initdb, not on the current environment.

With an SQL_ASCII database being accessed from a client with
client_encoding set to SQL_ASCII (which it should be if you aren't setting
it) the byte values of a string are passed along with no conversion for
the encoding. This means that from within one environment you should get
back what you put in, so it might *look* like it's KOI8-R if that's what
you're in, but it's not because someone accessing it from say an ISO8859-1
system may see something different.
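
The pass-through behaviour described here can be illustrated with a few lines of Python. This is only a sketch: the sample word and variable names are made up for the example, and "the server returns the bytes unchanged" is modelled by a plain assignment.

```python
# Bytes a KOI8-R client might send for a Cyrillic word.
text = "Привет"
stored = text.encode("koi8_r")

# SQL_ASCII model: the bytes are stored and handed back with no conversion.
returned = stored

# Same environment as the writer: the text round-trips.
as_seen_koi8 = returned.decode("koi8_r")

# An ISO8859-1 client interprets the very same bytes as other characters.
as_seen_latin1 = returned.decode("latin-1")

if __name__ == "__main__":
    print(as_seen_koi8 == text)
    print(as_seen_latin1 == text)
```

Nothing in the stored bytes records which interpretation was intended, which is why the data only *looks* like KOI8-R from a KOI8-R environment.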

#18 E.Rodichev
er@sai.msu.su
In reply to: Stephan Szabo (#17)
Re: Encoding problem with 7.4

On Wed, 3 Dec 2003, Stephan Szabo wrote:

The locale settings depend on LC_* at initdb time only. When the
postmaster starts it sets the locale based on the stored values from
initdb, not on the current environment.

With an SQL_ASCII database being accessed from a client with
client_encoding set to SQL_ASCII (which it should be if you aren't setting
it) the byte values of a string are passed along with no conversion for
the encoding. This means that from within one environment you should get
back what you put in, so it might *look* like it's KOI8-R if that's what
you're in, but it's not because someone accessing it from say an ISO8859-1
system may see something different.

As a result, the possibility to control encodings and locales looks as
follows:

           initdb  createdb  psql
Encoding:    Y        Y       Y
Locale:      Y        N       N

It seems that a more natural scheme would be

           initdb  createdb  psql
Encoding:    Y        Y       Y
Locale:      Y        Y       Y

Now the possibility of using different encodings for createdb and psql is
a bit strange... Also, it is impossible to have different locales
for different databases within one cluster, and it is impossible to use
different locales with one database. The latter is even more critical.
One reason is that sorting under the C locale is much more efficient than
sorting under other locales (10-50 times faster for some implementations!).
Another reason is that for some applications it is _necessary_ to use a different
sort order for different tables. For example, I may have two tables,
russian_persons and foreign_persons, and I'd like to print a sorted list
of persons. The russian_persons names must be sorted with the ru_RU.KOI8-R locale,
and the foreign_persons names with the C locale.

Best wishes,
E.R.

#19 Andrew Dunstan
andrew@dunslane.net
In reply to: E.Rodichev (#18)
Re: Encoding problem with 7.4

E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

The locale settings depend on LC_* at initdb time only. When the
postmaster starts it sets the locale based on the stored values from
initdb, not on the current environment.

With an SQL_ASCII database being accessed from a client with
client_encoding set to SQL_ASCII (which it should be if you aren't setting
it) the byte values of a string are passed along with no conversion for
the encoding. This means that from within one environment you should get
back what you put in, so it might *look* like it's KOI8-R if that's what
you're in, but it's not because someone accessing it from say an ISO8859-1
system may see something different.

As a result, the possibility to control encodings and locales looks as
follows:

           initdb  createdb  psql
Encoding:    Y        Y       Y
Locale:      Y        N       N

It seems that a more natural scheme would be

           initdb  createdb  psql
Encoding:    Y        Y       Y
Locale:      Y        Y       Y

Now the possibility of using different encodings for createdb and psql is
a bit strange... Also, it is impossible to have different locales
for different databases within one cluster, and it is impossible to use
different locales with one database. The latter is even more critical.
One reason is that sorting under the C locale is much more efficient than
sorting under other locales (10-50 times faster for some implementations!).
Another reason is that for some applications it is _necessary_ to use a different
sort order for different tables. For example, I may have two tables,
russian_persons and foreign_persons, and I'd like to print a sorted list
of persons. The russian_persons names must be sorted with the ru_RU.KOI8-R locale,
and the foreign_persons names with the C locale.

see Multi-Language Support section on TODO list at
http://developer.postgresql.org/todo.php - note that this specifies
per-column locales rather than per-table, which should be even more useful.

Most of these items have no names against them, meaning you could work
on them ...

cheers

andrew

#20 Stephan Szabo
sszabo@megazone.bigpanda.com
In reply to: E.Rodichev (#18)
Re: Encoding problem with 7.4

On Thu, 4 Dec 2003, E.Rodichev wrote:

On Wed, 3 Dec 2003, Stephan Szabo wrote:

The locale settings depend on LC_* at initdb time only. When the
postmaster starts it sets the locale based on the stored values from
initdb, not on the current environment.

With an SQL_ASCII database being accessed from a client with
client_encoding set to SQL_ASCII (which it should be if you aren't setting
it) the byte values of a string are passed along with no conversion for
the encoding. This means that from within one environment you should get
back what you put in, so it might *look* like it's KOI8-R if that's what
you're in, but it's not because someone accessing it from say an ISO8859-1
system may see something different.

As a result, the possibility to control encodings and locales looks as
follows:

           initdb  createdb  psql
Encoding:    Y        Y       Y

As a note you can change the *client* encoding from psql, not the *server*
encoding. They're also two separate notions.
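
To make the client/server distinction concrete, here is a hedged Python sketch of what a client might receive when the server encoding is declared. The function `send_to_client` and the `PY_CODECS` mapping from PostgreSQL encoding names to Python codec names are assumptions made for this illustration, not PostgreSQL's conversion code.

```python
# Assumed mapping from 7.4-era encoding names to Python codec names;
# it exists only for this illustration.
PY_CODECS = {"KOI8": "koi8_r", "UNICODE": "utf-8", "LATIN1": "latin-1"}


def send_to_client(stored: bytes, server_encoding: str,
                   client_encoding: str) -> bytes:
    """Hypothetical model of the bytes a client receives from the server."""
    if "SQL_ASCII" in (server_encoding, client_encoding):
        return stored  # nothing is known about the bytes, so no conversion
    if server_encoding == client_encoding:
        return stored
    text = stored.decode(PY_CODECS[server_encoding])
    return text.encode(PY_CODECS[client_encoding])


if __name__ == "__main__":
    stored = "Привет".encode("koi8_r")
    # A UNICODE client of a KOI8 database gets converted bytes.
    print(send_to_client(stored, "KOI8", "UNICODE") == "Привет".encode("utf-8"))
    # With an SQL_ASCII database the same client gets the raw bytes back.
    print(send_to_client(stored, "SQL_ASCII", "UNICODE") == stored)
```

In this model the server encoding is a property of the stored data, while the client encoding only selects the form in which that data is handed to one particular session.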

Andrew already commented on the TODO list. You may also wish to look
through the archives for a recent message from Peter E on the subject as
he was looking into starting towards multiple collations and such.