Question for Postgres 8.3

Started by Clemens Schwaighoferover 18 years ago10 messagesgeneral

Jump to latest

Clemens Schwaighofer

cs@tequila.co.jp

over 18 years ago

hi,

on the page: <http://www.postgresql.org/docs/8.3/static/release-8-3.html>

in point E.1.2.3:

* Disallow database encodings that are inconsistent with the server's
locale setting (Tom)

does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
to create a EUC_JP database it gets rejected? do I missunderstand that?

Normaly my servers have default locale set to UTF-8.en_US but also have the
locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
but I have some clients that created EUC_JP databases, will the upgrade
affect this?

[ Clemens Schwaighofer -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]

Tom Lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Clemens Schwaighofer (#1)

Re: Question for Postgres 8.3

Clemens Schwaighofer <cs@tequila.co.jp> writes:

* Disallow database encodings that are inconsistent with the server's
locale setting (Tom)

does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
to create a EUC_JP database it gets rejected? do I missunderstand that?

Nope, you have it correctly.

Normaly my servers have default locale set to UTF-8.en_US but also have the
locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
but I have some clients that created EUC_JP databases, will the upgrade
affect this?

I'm surprised your clients haven't been screaming about bogus sorting
and upper/lowercasing behavior.

If you want to support multiple encodings, the only safe locale choice
is (and always has been) C. If you doubt this, troll the archives for
awhile --- for example, searching for locale+encoding in pgsql-bugs
should provide plenty of amusing reading matter. 8.3 is just refusing
to do things that are known to be unsafe in previous releases.

regards, tom lane

Clemens Schwaighofer

cs@tequila.co.jp

over 18 years ago

In reply to: Tom Lane (#2)

Re: Question for Postgres 8.3

On 02/05/2008 11:38 AM, Tom Lane wrote:

Clemens Schwaighofer <cs@tequila.co.jp> writes:

* Disallow database encodings that are inconsistent with the server's
locale setting (Tom)

does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
to create a EUC_JP database it gets rejected? do I missunderstand that?

Nope, you have it correctly.

okay, good, as I finally have a reason to change this then on those view
databases.

Normaly my servers have default locale set to UTF-8.en_US but also have the
locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
but I have some clients that created EUC_JP databases, will the upgrade
affect this?

I'm surprised your clients haven't been screaming about bogus sorting
and upper/lowercasing behavior.

well, originally they were on a server that is also in EUC_JP, but they
need to move away to one of my servers that is pure UTF-8, and I cannot
change this setting because of my other databases, so probably better to
convert them over to UTF-8 and adept the scripts to convert to the
target encoding, rather than going on with the same mess.

If you want to support multiple encodings, the only safe locale choice
is (and always has been) C. If you doubt this, troll the archives for
awhile --- for example, searching for locale+encoding in pgsql-bugs
should provide plenty of amusing reading matter. 8.3 is just refusing
to do things that are known to be unsafe in previous releases.

regards, tom lane

--
[ Clemens Schwaighofer -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]

Hiroshi Saito

z-saito@guitar.ocn.ne.jp

over 18 years ago

In reply to: Clemens Schwaighofer (#1)

Re: Question for Postgres 8.3

Hi.

----- Original Message -----
From: "Clemens Schwaighofer" <cs@tequila.co.jp>

hi,

on the page: <http://www.postgresql.org/docs/8.3/static/release-8-3.html>

in point E.1.2.3:

* Disallow database encodings that are inconsistent with the server's
locale setting (Tom)

does this mean, if my server LOCALE is for example UTF-8.en_US, and I want
to create a EUC_JP database it gets rejected? do I missunderstand that?

Normaly my servers have default locale set to UTF-8.en_US but also have the
locales for UTF-8.ja_JP and EUC_JP there, 99.9% of my databases are utf-8,
but I have some clients that created EUC_JP databases, will the upgrade
affect this?

When using server encoding for EUC_JP and UTF-8 simultaneously, it is not allowed
except LOCALE=C. Moreover, it can also use SQL_ASCII satisfactory natural.
please initdb --no-locale.

Regards,
Hiroshi Saito

David Wall

d.wall@computer.org

over 18 years ago

In reply to: Tom Lane (#2)

Re: Question for Postgres 8.3

If you want to support multiple encodings, the only safe locale choice
is (and always has been) C. If you doubt this, troll the archives for
awhile --- for example, searching for locale+encoding in pgsql-bugs
should provide plenty of amusing reading matter.

This is most interesting. I think some of use UTF-8 under the
impression that it would support unicode/java (and thus US-ascii)
easily, and of course then allow for foreign language encodings when we
need to internationalize. Thank goodness we only plan to I18N for a
decade and never got around to it!

David

Hiroshi Saito

z-saito@guitar.ocn.ne.jp

over 18 years ago

In reply to: Clemens Schwaighofer (#1)

Re: Question for Postgres 8.3

I'm surprised your clients haven't been screaming about bogus sorting
and upper/lowercasing behavior.

Umm, This is a mere information. (Since you dislike, this may be a noise.?)

As for upper/lower.
It does not change, even if uses which locale in us Japanese.
(there is nothing with the alphabet)
Then, In spite of using a small letter and a capital letter properly clearly.

As for sorting.
It has a difference, if sorting of a dictionary is desired. However, each is the not
Japanese order of the JIS X 4061:1996(Collation of Japanese character string),
probably. but, I think that LOCAL=C is clear as a sorting of a dictionary.
It seems that selection is good by the situation well.

See, Although you may be unable to see (However, this is Japanes shift-jis)
http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt

Regards,
Hiroshi Saito

Hiroshi Saito

z-saito@guitar.ocn.ne.jp

over 18 years ago

In reply to: Clemens Schwaighofer (#1)

Re: Question for Postgres 8.3

Hi.

This reply is very glad for me!:-)
However, There is still a locale problem.
Anyway, thanks!!!!

Regards,
Hiroshi Saito

----- Original Message -----
From: "Clemens Schwaighofer" <cs@tequila.co.jp>

Show quoted text

I'm surprised your clients haven't been screaming about bogus sorting
and upper/lowercasing behavior.

Umm, This is a mere information. (Since you dislike, this may be a noise.?)

As for upper/lower.
It does not change, even if uses which locale in us Japanese. (there is
nothing with the alphabet) Then, In spite of using a small letter and a
capital letter properly clearly.

As for sorting.
It has a difference, if sorting of a dictionary is desired. However,
each is the not
Japanese order of the JIS X 4061:1996(Collation of Japanese character
string), probably. but, I think that LOCAL=C is clear as a sorting of a
dictionary. It seems that selection is good by the situation well.
See, Although you may be unable to see (However, this is Japanes shift-jis)
http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt

もしかしたらLinux LOCALEはUTF-8.en_USとPostgresのLOCALEはCその日本語の
ソートは正しくない？僕はいつでもLOCAL=Cある、と今テストしました。ソート
は正しいです。

ありがとうございます。

--
[ Clemens Schwaighofer -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]

Clemens Schwaighofer

cs@tequila.co.jp

over 18 years ago

In reply to: Hiroshi Saito (#7)

Re: Question for Postgres 8.3

On 02/05/2008 03:38 PM, Hiroshi Saito wrote:

Hi.

This reply is very glad for me!:-)
However, There is still a locale problem.

If I use an EUC database on an UTF8 system? Well I will try out and see
if it breaks everything.

Anyway, thanks!!!!

Regards,
Hiroshi Saito

----- Original Message ----- From: "Clemens Schwaighofer"
<cs@tequila.co.jp>

I'm surprised your clients haven't been screaming about bogus sorting
and upper/lowercasing behavior.

Umm, This is a mere information. (Since you dislike, this may be a
noise.?)

As for upper/lower.
It does not change, even if uses which locale in us Japanese. (there is
nothing with the alphabet) Then, In spite of using a small letter and a
capital letter properly clearly.

As for sorting.
It has a difference, if sorting of a dictionary is desired. However,
each is the not
Japanese order of the JIS X 4061:1996(Collation of Japanese character
string), probably. but, I think that LOCAL=C is clear as a sorting of a
dictionary. It seems that selection is good by the situation well.
See, Although you may be unable to see (However, this is Japanes
shift-jis)
http://winpg.jp/~saito/pg_bug/Japanese_LowerUpper_sort.txt

もしかしたらLinux LOCALEはUTF-8.en_USとPostgresのLOCALEはCその日本語の
ソートは正しくない？僕はいつでもLOCAL=Cある、と今テストしました。ソート
は正しいです。

ありがとうございます。

--
[ Clemens Schwaighofer -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group ]
[ 6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703 Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp ]

rihad

rihad@mail.ru

over 18 years ago

In reply to: Clemens Schwaighofer (#8)

Re: Question for Postgres 8.3

If you want to support multiple encodings, the only safe locale choice
is (and always has been) C.

I should be ashamed for asking this, but would someone care to tell me
how encoding differs from locale?

My postgresql FreeBSD rcNG script reads:

postgresql_initdb_flags=${postgresql_initdb_flags:-"--encoding=utf-8
--lc-collate=C"}

As I understand it collation is part of locale, but encoding is
"something else"?

Thanks.

Import Notes

Resolved by subject fallback

#10

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: rihad (#9)

Re: Question for Postgres 8.3

"rihad" <rihad@mail.ru> writes:

If you want to support multiple encodings, the only safe locale choice
is (and always has been) C.

I should be ashamed for asking this, but would someone care to tell me how
encoding differs from locale?

One you missed is a character set, which is just a set of possible characters
(not bytes, abstract things called characters).

An encoding is a mapping from a series of binary bytes to a series of
characters from a character set, like UTF-8 or Big5 or just plain ascii.

A locale is a set of rules for how to sort (collation), format dates, numbers,
currencies, etc like es_US or jp_JP

The problem is that a locale needs to know what the string it's looking is at
to decide how to sort it, so it has to be designed for a particular encoding.
In Unix that encoding is tacked on the end like en_US.UTF-8.

C is a bit of special case since it sorts based on the binary representation
rather than the characters. That's true for any 1-byte encoding based locale
but C is more predictable when you actually have binary data.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning