pgsql: Re-allow UTF8 encodings on win32.

Started by Noname about 18 years ago, 30 messages
#1Noname
mha@postgresql.org

Log Message:
-----------
Re-allow UTF8 encodings on win32. Since UTF8 is converted to
UTF16 before being used, all (valid) locales will work for this.

Modified Files:
--------------
pgsql/src/backend/commands:
dbcommands.c (r1.201 -> r1.202)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/dbcommands.c?r1=1.201&r2=1.202)
pgsql/src/bin/initdb:
initdb.c (r1.146 -> r1.147)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/bin/initdb/initdb.c?r1=1.146&r2=1.147)
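
The log message relies on UTF8 being convertible to UTF16 without loss. The following is a minimal sketch of that round trip in Python, not the backend's C code; the function names are hypothetical and chosen only for illustration.

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    text = data.decode("utf-8")      # raises UnicodeDecodeError on invalid UTF-8
    raw = text.encode("utf-16-le")   # little-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_units_to_utf8(units: List[int]) -> bytes:
    """Convert UTF-16 code units back to UTF-8 bytes."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le").encode("utf-8")
```

Because every valid UTF-8 string has exactly one UTF-16 representation, the conversion itself does not depend on the code page part of the locale name.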

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

mha@postgresql.org (Magnus Hagander) writes:

Re-allow UTF8 encodings on win32. Since UTF8 is converted to
UTF16 before being used, all (valid) locales will work for this.

So where do we stand on the Windows locale/encoding business --- are
we happy with the behavior now, or does it still need work?

regards, tom lane

#3Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#2)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Tom Lane wrote:

mha@postgresql.org (Magnus Hagander) writes:

Re-allow UTF8 encodings on win32. Since UTF8 is converted to
UTF16 before being used, all (valid) locales will work for this.

So where do we stand on the Windows locale/encoding business --- are
we happy with the behavior now, or does it still need work?

I think we're good. But I'd like to hear some verification from somebody
else. Specifically, I'd like to hear a signoff from someone who can
actually do "real tests" on a locale that's not US and not Swedish.
Also, I'd like to hear from the Japanese people (Hiroshi? Can you do
this?) that we didn't break it for them. I don't think we did, but I
want to be sure :)

Hiroshi, and whoever else can help to test, this is only testing the
backend, not the installer. The installer may need a few minor tweaks
still once the backend is considered fixed. And what needs to be tested
is CVS HEAD as of today.

//Magnus

#4Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

Um, it seems that it only passed the strict check in chklocale.c. It may
allow a mistaken selection... However, I will clarify the problem by testing.

Regards,
Hiroshi Saito

From: "Magnus Hagander" <magnus@hagander.net>

Tom Lane wrote:

mha@postgresql.org (Magnus Hagander) writes:

Re-allow UTF8 encodings on win32. Since UTF8 is converted to
UTF16 before being used, all (valid) locales will work for this.

So where do we stand on the Windows locale/encoding business --- are
we happy with the behavior now, or does it still need work?

I think we're good. But I'd like to hear some verification from somebody
else. Specifically, I'd like to hear a signoff from someone who can
actually do "real tests" on a locale that's not US and not Swedish.
Also, I'd like to hear from the Japanese people (Hiroshi? Can you do
this?) that we didn't break it for them. I don't think we did, but I
want to be sure :)

Hiroshi, and whoever else can help to test, this is only testing the
backend, not the installer. The installer may need a few minor tweaks
still once the backend is considered fixed. And what needs to be tested
is CVS HEAD as of today.

//Magnus

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Magnus Hagander (#3)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

2007/10/16, Magnus Hagander <magnus@hagander.net>:

Tom Lane wrote:

mha@postgresql.org (Magnus Hagander) writes:

Re-allow UTF8 encodings on win32. Since UTF8 is converted to
UTF16 before being used, all (valid) locales will work for this.

So where do we stand on the Windows locale/encoding business --- are
we happy with the behavior now, or does it still need work?

I think we're good. But I'd like to hear some verification from somebody
else. Specifically, I'd like to hear a signoff from someone who can
actually do "real tests" on a locale that's not US and not Swedish.
Also, I'd like to hear from the Japanese people (Hiroshi? Can you do
this?) that we didn't break it for them. I don't think we did, but I
want to be sure :)

Hiroshi, and whoever else can help to test, this is only testing the
backend, not the installer. The installer may need a few minor tweaks
still once the backend is considered fixed. And what needs to be tested
is CVS HEAD as of today.

//Magnus

I can test it with the Czech locale. Can I download binaries anywhere?

Pavel

#6Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

I can test it with the Czech locale. Can I download binaries anywhere?

http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs.tgz
This build has been through the regression tests (MinGW + gcc).

Regards,
Hiroshi Saito

#7Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

Um, it seems that it only passed the strict check in chklocale.c. It may
allow a mistaken selection... However, I will clarify the problem by testing.

First, here is one problem:
http://winpg.jp/~saito/pg83/pg83b1-err.txt

Testing continues...

#8Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the ShiftJIS locale).

Testing continues...

Regards,
Hiroshi Saito

#9Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Saito (#7)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

Hi.

Um, it seems that it only passed the strict check in chklocale.c.
It may allow a mistaken selection... However, I will clarify the
problem by testing.

First, here is one problem:
http://winpg.jp/~saito/pg83/pg83b1-err.txt

Testing continues...

But SJIS isn't supposed to work, no?

//Magnus

#10Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Saito (#8)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

What text search config would you expect?

//Magnus

#11Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

What text search config would you expect?

The problem here is that initdb accepts the locale Japanese_Japan.932.

Regards,
Hiroshi Saito

#12Dave Page
dpage@postgresql.org
In reply to: Hiroshi Saito (#8)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

The changes that were made were only to re-enable UTF-8.

SJIS wasn't ever supported as a server encoding
(http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
fact that initdb continues if you use Japanese_Japan.932 is an
inconsistency I reported previously but has yet to be fixed.

/D

#13Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

From: "Dave Page" <dpage@postgresql.org>

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

The changes that were made were only to re-enable UTF-8.

Yes. Please see:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Isn't it a problem that initdb succeeds in this case?

SJIS wasn't ever supported as a server encoding
(http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
fact that initdb continues if you use Japanese_Japan.932 is an
inconsistency I reported previously but has yet to be fixed.

Yes. However, encoding and locale are not equivalent.

Regards,
Hiroshi Saito

#14Dave Page
dpage@postgresql.org
In reply to: Hiroshi Saito (#13)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

From: "Dave Page" <dpage@postgresql.org>

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

The changes that were made were only to re-enable UTF-8.

Yes. Please see:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Isn't it a problem that initdb succeeds in this case?

Oh, sorry - misread that. I chatted with Magnus about that. It is
correct, but misleading. pg_control will say Japanese_Japan.932 as well
iirc, even though it is really Japanese_Japan.65001.

Regards, Dave

#15Magnus Hagander
magnus@hagander.net
In reply to: Dave Page (#12)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Dave Page wrote:

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

The changes that were made were only to re-enable UTF-8.

SJIS wasn't ever supported as a server encoding
(http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
fact that initdb continues if you use Japanese_Japan.932 is an
inconsistency I reported previously but has yet to be fixed.

That is a good point, if unrelated to this very discussion. Do we want
to change that thing to an exit instead of complain-and-continue? I
think yes?

//Magnus

#16Magnus Hagander
magnus@hagander.net
In reply to: Dave Page (#14)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Dave Page wrote:

Hiroshi Saito wrote:

From: "Dave Page" <dpage@postgresql.org>

Hiroshi Saito wrote:

Hi.

Second, here is a big problem:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
It is a text search config error.
However, initdb succeeds (locale=Japanese_Japan.932 ... this is the
ShiftJIS locale).

Testing continues...

The changes that were made were only to re-enable UTF-8.

Yes. Please see:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Isn't it a problem that initdb succeeds in this case?

Oh, sorry - misread that. I chatted with Magnus about that. It is
correct, but misleading. pg_control will say Japanese_Japan.932 as well
iirc, even though it is really Japanese_Japan.65001.

Not so. The locale is Japanese_Japan, really. That's the only part
that's relevant for UTF16 encodings, which is what we use to do UTF8. We
specifically *don't* try to use Japanese_Japan.65001.

//Magnus

#17Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

From: "Dave Page" <dpage@postgresql.org>

Yes. Please see:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Isn't it a problem that initdb succeeds in this case?

Oh, sorry - misread that. I chatted with Magnus about that. It is
correct, but misleading. pg_control will say Japanese_Japan.932 as well
iirc, even though it is really Japanese_Japan.65001.

But please see:
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 gives an error...
Japanese_Japan works.

Regards,
Hiroshi Saito

#18Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Saito (#17)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

Hi.

From: "Dave Page" <dpage@postgresql.org>

Yes. Please see:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Isn't it a problem that initdb succeeds in this case?

Oh, sorry - misread that. I chatted with Magnus about that. It is
correct, but misleading. pg_control will say Japanese_Japan.932 as well
iirc, even though it is really Japanese_Japan.65001.

But please see:
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 gives an error...
Japanese_Japan works.

Yes, that is expected. If you explicitly ask for the .65001 locale it
will try the one that doesn't have the proper NLS files, and that
shouldn't work. If you just put in Japanese_Japan, it will use the UTF16
locale.

//Magnus
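
The distinction drawn above, between the language/territory part of a Windows locale name and its trailing code page, can be sketched as follows. This is an illustrative Python sketch only: the helper names are hypothetical, and it does not reproduce how initdb or the backend actually parse locale names.

```python
from typing import Optional, Tuple


def split_windows_locale(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Language_Country.codepage' into (base, codepage).

    The code page is None when the name carries no '.codepage' suffix.
    """
    base, sep, codepage = name.rpartition(".")
    if not sep:
        return name, None
    return base, codepage


def requests_utf8_codepage(name: str) -> bool:
    """True if the name explicitly asks for code page 65001 (UTF-8)."""
    _, codepage = split_windows_locale(name)
    return codepage == "65001"
```

With this split, `Japanese_Japan.932` and `Japanese_Japan` share the base `Japanese_Japan`, which is the only part described in the thread as relevant once the text is handled as UTF16; an explicit `.65001` suffix is the case reported to fail.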

#19Dave Page
dpage@postgresql.org
In reply to: Magnus Hagander (#16)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Magnus Hagander wrote:

Not so. The locale is Japanese_Japan, really. That's the only part
that's relevant for UTF16 encodings, which is what we use to do UTF8. We
specifically *don't* try to use Japanese_Japan.65001.

That's not what I mean. From a *usability* perspective, Hiroshi should
see Japanese_Japan.65001 because he's selected UTF-8 in Japanese_Japan.
He shouldn't see Japanese_Japan.932 because that definitely isn't what
he selected.

/D

#20Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

But please see:
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 gives an error...
Japanese_Japan works.

However, I will continue testing in this state.
But sorry, I am off to bed now...

Regards,
Hiroshi Saito

#21Magnus Hagander
magnus@hagander.net
In reply to: Dave Page (#19)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Dave Page wrote:

Magnus Hagander wrote:

Not so. The locale is Japanese_Japan, really. That's the only part
that's relevant for UTF16 encodings, which is what we use to do UTF8. We
specifically *don't* try to use Japanese_Japan.65001.

That's not what I mean. From a *usability* perspective, Hiroshi should
see Japanese_Japan.65001 because he's selected UTF-8 in Japanese_Japan.
He shouldn't see Japanese_Japan.932 because that definitely isn't what
he selected.

I'll grant you that from a usability perspective, he should see
Japanese_Japan. Not the .65001 part, though.

//Magnus

#22Dave Page
dpage@postgresql.org
In reply to: Hiroshi Saito (#17)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

Hi.

From: "Dave Page" <dpage@postgresql.org>

Yes. Please see:
http://winpg.jp/~saito/pg83/pg83b1-err2.txt
Isn't it a problem that initdb succeeds in this case?

Oh, sorry - misread that. I chatted with Magnus about that. It is
correct, but misleading. pg_control will say Japanese_Japan.932 as well
iirc, even though it is really Japanese_Japan.65001.

But please see:
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 gives an error...
Japanese_Japan works.

Yes, we're faking utf-8 support using utf-16. Specifying it as you have
there bypasses the workaround and tries to use the 65001 codepage which
then fails because LC_CTYPE cannot be set to .65001 in any locale.

/D

#23Dave Page
dpage@postgresql.org
In reply to: Magnus Hagander (#21)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Magnus Hagander wrote:

Dave Page wrote:

Magnus Hagander wrote:

Not so. The locale is Japanese_Japan, really. That's the only part
that's relevant for UTF16 encodings, which is what we use to do UTF8. We
specifically *don't* try to use Japanese_Japan.65001.

That's not what I mean. From a *usability* perspective, Hiroshi should
see Japanese_Japan.65001 because he's selected UTF-8 in Japanese_Japan.
He shouldn't see Japanese_Japan.932 because that definitely isn't what
he selected.

I'll grant you that from a usability perspective, he should see
Japanese_Japan. Not the .65001 part, though.

Well, that depends on whether we care that we're actually faking the
utf-8 support and/or we want to keep the message consistent with what
you'd see in other locales.

/D

#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Hiroshi Saito (#6)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

2007/10/16, Hiroshi Saito <z-saito@guitar.ocn.ne.jp>:

Hi.

I can test it with the Czech locale. Can I download binaries anywhere?

http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs.tgz
This build has been through the regression tests (MinGW + gcc).

I have a problem: libintl-2.dll is missing.

Pavel

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#15)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Magnus Hagander <magnus@hagander.net> writes:

Dave Page wrote:

SJIS wasn't ever supported as a server encoding
(http://www.postgresql.org/docs/8.2/interactive/multibyte.html). The
fact that initdb continues if you use Japanese_Japan.932 is an
inconsistency I reported previously but has yet to be fixed.

That is a good point, if unrelated to this very discussion. Do we want
to change that thing to an exit instead of complain-and-continue? I
think yes?

Yeah, I thought we'd agreed to that a few days ago.

regards, tom lane
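
The difference between "complain-and-continue" and a hard exit for an encoding that cannot be used on the server can be sketched like this. It is a hedged Python illustration, not initdb's actual logic: the function and exception names are hypothetical, and the set lists the encodings that the PostgreSQL documentation describes as client-only.

```python
import warnings

# Encodings documented as usable only on the client side, never as a server encoding.
CLIENT_ONLY_ENCODINGS = {"SJIS", "BIG5", "GBK", "UHC", "GB18030", "JOHAB"}


class UnsupportedServerEncoding(Exception):
    """Raised when a client-only encoding is requested for the server."""


def check_server_encoding(encoding: str, strict: bool) -> bool:
    """Return True if the encoding is acceptable as a server encoding.

    strict=True models the proposed "exit" behaviour by raising;
    strict=False models "complain-and-continue" by warning and returning False.
    """
    if encoding.upper() not in CLIENT_ONLY_ENCODINGS:
        return True
    message = f'encoding "{encoding}" is not allowed as a server-side encoding'
    if strict:
        raise UnsupportedServerEncoding(message)
    warnings.warn(message)
    return False
```

In the strict variant a request such as SJIS stops immediately, so nothing later in the run can report a locale the cluster will not actually use.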

#26Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

From: "Pavel Stehule" <pavel.stehule@gmail.com>

I can test it with the Czech locale. Can I download binaries anywhere?

http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs.tgz
This build has been through the regression tests (MinGW + gcc).

I have a problem: libintl-2.dll is missing.

Oops, sorry, that one is the full build.
Please try this minimal package instead:
http://winpg.jp/~saito/pg83/postgresql-8.3beta-cvs-minbin.tgz
Thanks.

Regards,
Hiroshi Saito

#27Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hi.

From: "Magnus Hagander" <magnus@hagander.net>

But please see:
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 gives an error...
Japanese_Japan works.

Yes, that is expected. If you explicitly ask for the .65001 locale it
will try the one that doesn't have the proper NLS files, and that
shouldn't work. If you just put in Japanese_Japan, it will use the UTF16
locale.

Umm, as for the result of
initdb -E UTF8 --locale=Japanese_Japan -D../data
see http://winpg.jp/~saito/pg83/pg83b1-err4.txt
It seems that the code page is simply filled in automatically.

Regards,
Hiroshi Saito

#28Dave Page
dpage@postgresql.org
In reply to: Hiroshi Saito (#27)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

Hiroshi Saito wrote:

Hi.

From: "Magnus Hagander" <magnus@hagander.net>

But please see:
http://winpg.jp/~saito/pg83/pg83b1-err3.txt
Japanese_Japan.65001 gives an error...
Japanese_Japan works.

Yes, that is expected. If you explicitly ask for the .65001 locale it
will try the one that doesn't have the proper NLS files, and that
shouldn't work. If you just put in Japanese_Japan, it will use the UTF16
locale.

Umm, as for the result of initdb -E UTF8 --locale=Japanese_Japan -D../data
see http://winpg.jp/~saito/pg83/pg83b1-err4.txt
It seems that the code page is simply filled in automatically.

Yes, that is expected, though not entirely to my tastes. The cluster
should still actually be in utf-8 however.

/D

#29Pavel Stehule
pavel.stehule@gmail.com
In reply to: Dave Page (#28)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

I ran some tests, but without success.

Pavel

I have Windows Server 2003 with Czech locale support.

I:\PGSQL\BIN>initdb -D ../data -L i:\pgsql\share
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale Czech_Czech Republic.1250.
could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"
INITDB: could not find suitable encoding for locale Czech_Czech Republic.1250
Rerun INITDB with the -E option.
Try "INITDB --help" for more information.

I:\PGSQL\BIN>

I:\PGSQL\BIN>initdb -E UTF-8 -D ../data -L i:\pgsql\share
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale Czech_Czech Republic.1250.
could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"
INITDB: could not find suitable text search configuration for locale Czech_Czech Republic.1250
The default text search configuration will be set to "simple".
fixing permissions on existing directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers/max_fsm_pages ... 400kB/20000
creating configuration files ... ok
creating template1 database in ../data/base/1 ... FATAL: could not select a suitable default timezone
DETAIL: It appears that your GMT time zone uses leap seconds. PostgreSQL does not support leap seconds.
child process exited with exit code 1
INITDB: removing contents of data directory "../data"

I:\PGSQL\BIN>initdb -E win1250 --locale="Czech_Czech Republic.1250" -D ../data -L i:\pgsql\share
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale Czech_Czech Republic.1250.
could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"
INITDB: could not find suitable text search configuration for locale Czech_Czech Republic.1250
The default text search configuration will be set to "simple".
fixing permissions on existing directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers/max_fsm_pages ... 400kB/20000
creating configuration files ... ok
creating template1 database in ../data/base/1 ... FATAL: could not select a suitable default timezone
DETAIL: It appears that your GMT time zone uses leap seconds. PostgreSQL does not support leap seconds.
child process exited with exit code 1
INITDB: removing contents of data directory "../data"

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#29)
Re: [COMMITTERS] pgsql: Re-allow UTF8 encodings on win32.

"Pavel Stehule" <pavel.stehule@gmail.com> writes:

could not determine encoding for locale "Czech_Czech Republic.1250": codeset is "CP1250"

Hm, we seem to have missed an entry for PG_WIN1250. Fixed.

regards, tom lane
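
The failure above is the kind that a codeset-to-encoding lookup table produces when one entry is absent. A minimal Python sketch of such a lookup follows; the table is an illustrative subset with hypothetical names, and it is not the actual table in chklocale.c.

```python
from typing import Dict, Optional

# Illustrative subset only; the real mapping lives in the C sources.
CODESET_TO_ENCODING: Dict[str, str] = {
    "CP1250": "WIN1250",  # the kind of entry reported missing in this thread
    "CP1251": "WIN1251",
    "CP1252": "WIN1252",
}


def encoding_for_locale(locale_name: str) -> Optional[str]:
    """Map the code page suffix of a Windows locale name to an encoding name.

    Returns None when there is no suffix or no table entry, which corresponds
    to the "could not determine encoding for locale" message.
    """
    _, sep, codepage = locale_name.rpartition(".")
    if not sep:
        return None
    return CODESET_TO_ENCODING.get("CP" + codepage)
```

Removing the "CP1250" row from the table makes `encoding_for_locale("Czech_Czech Republic.1250")` return None, mirroring the report in the previous message.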