Re: UNICODE/UTF-8 on win32

Started by Magnus Hagander about 21 years ago, 32 messages
#1 Magnus Hagander
mha@sollentuna.net

UNICODE/UTF-8 does not work on the win32 server. The reason is that
strcoll() and friends don't work with it. To support it on win32, the
data needs to be converted to UTF-16 and passed to the wide-character
versions of those functions, which we do not do.
(See
http://archives.postgresql.org/pgsql-hackers-win32/2004-11/msg00036.php
and
http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00106.php)

I don't *think* we need to disable it on the client. AFAIK, the client
interfaces don't use any of these functions, and I've seen reports of
people using it long before we had a native win32 server.

//Magnus
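
As a rough illustration of the conversion step mentioned above, the sketch below decodes UTF-8 into UTF-16 code units in portable C. The helper name utf8_to_utf16 is hypothetical and is not taken from the PostgreSQL source; it assumes NUL-terminated input and, to stay short, does not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code: convert NUL-terminated UTF-8
 * to UTF-16 code units. Returns the number of code units written (not
 * counting the terminating 0), or -1 on malformed input or a full buffer.
 * Overlong encodings are not rejected in this sketch.
 */
static long utf8_to_utf16(const char *in, uint16_t *out, size_t out_cap)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t n = 0;

    while (*p != 0)
    {
        uint32_t cp;
        int extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;
        p++;

        while (extra-- > 0)
        {
            if ((*p & 0xC0) != 0x80)
                return -1;          /* truncated or invalid sequence */
            cp = (cp << 6) | (uint32_t) (*p & 0x3F);
            p++;
        }

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;              /* outside Unicode, or a lone surrogate */

        if (cp >= 0x10000)
        {
            /* Needs a surrogate pair plus room for the terminator. */
            if (n + 3 > out_cap)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t) (0xD800 + (cp >> 10));
            out[n++] = (uint16_t) (0xDC00 + (cp & 0x3FF));
        }
        else
        {
            if (n + 2 > out_cap)
                return -1;
            out[n++] = (uint16_t) cp;
        }
    }

    if (n + 1 > out_cap)
        return -1;                  /* no room for the terminator */
    out[n] = 0;
    return (long) n;
}
```

On Windows, where wchar_t is 16 bits wide, a buffer filled this way could presumably be passed to wcscoll() for a locale-aware comparison. That last step is an assumption and is not shown here.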

-----Original Message-----
From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
Sent: 1 January 2005 01:10
To: tgl@sss.pgh.pa.us
Cc: Magnus Hagander; pgsql-hackers-win32@postgresql.org
Subject: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

Sorry, but I don't subscribe to pgsql-hackers-win32 list. What's the
problem here?
--
Tatsuo Ishii

"Magnus Hagander" <mha@sollentuna.net> writes:

We know it's broken and won't be fixed for 8.0.

If we just #ifndef WIN32 the definitions in utils/mb/encnames.c it won't
be possible to select that encoding, right? Will that have any other
unwanted effects (such as breaking client encodings)? If not, I suggest
this is done.

I believe the subscripts in those arrays have to match the encoding
enum type, so you can't just ifdef out individual entries.

(Or perhaps something can be done in pg_valid_server_encoding?)

Making the valid_server_encoding function reject it might work.
Tatsuo-san would know for sure.

Should we also reject it as a client encoding, or does that work OK?

regards, tom lane
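
To make the point about array subscripts concrete, here is a small sketch with made-up names (demo_encoding, demo_encoding_names, demo_valid_server_encoding); it is not the contents of encnames.c. It shows a name table indexed by an enum, and a validity check that rejects one encoding without touching the table. The on_win32 flag is passed as a parameter purely so the sketch can be exercised on any platform.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding IDs; each value doubles as an array subscript. */
typedef enum
{
    DEMO_SQL_ASCII = 0,
    DEMO_EUC_JP,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_ENC_COUNT
} demo_encoding;

/*
 * Name table indexed by demo_encoding. If the UTF8 entry were removed
 * with #ifndef WIN32, "LATIN1" would slide into slot DEMO_UTF8 and every
 * later lookup would return the wrong name.
 */
static const char *const demo_encoding_names[DEMO_ENC_COUNT] = {
    [DEMO_SQL_ASCII] = "SQL_ASCII",
    [DEMO_EUC_JP]    = "EUC_JP",
    [DEMO_UTF8]      = "UTF8",
    [DEMO_LATIN1]    = "LATIN1",
};

/*
 * Reject an encoding as a server encoding while leaving the table intact.
 * Client-side use of the encoding is unaffected by this check.
 */
static bool demo_valid_server_encoding(demo_encoding enc, bool on_win32)
{
    if ((int) enc < 0 || (int) enc >= DEMO_ENC_COUNT)
        return false;
    if (on_win32 && enc == DEMO_UTF8)
        return false;
    return true;
}
```

Keeping the table whole and filtering in the check is the design being suggested in the message above: the enum, the table, and client encodings stay in sync, and only server-side selection is blocked.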

#2 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Magnus Hagander (#1)

TODO updated:

o Disallow encodings like UTF8 which PostgreSQL supports
but the operating system does not (already disallowed by
pginstaller)

To fix UTF8, the data needs to be converted to UTF16 and then
the Win32 strcoll() can be used.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#3 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Magnus Hagander (#1)

I do understand the problem, but I don't understand the decision you
guys made. The fact that UPPER/LOWER and some other functions do not
work on win32 is surely a problem for some languages, but not a
problem for others. For example, Japanese (and probably Chinese and
Korean) does not have a concept of upper/lower case. So the fact that
UPPER/LOWER does not work with UTF-8/win32 is not a problem for Japanese
(and for some other languages). Just using the C locale with UTF-8 is
enough in this case.

In summary, I think killing the multibyte support functionality on
UTF-8/win32 just because some languages do not work is overkill.

The same can be said of EUC-JP, EUC-CN, EUC-KR and so on as well.

I strongly object to a policy of unconditionally disabling UTF-8
support on win32.
--
Tatsuo Ishii

From: "Magnus Hagander" <mha@sollentuna.net>
Subject: RE: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
Date: Sat, 1 Jan 2005 14:48:04 +0100
Message-ID: <6BCB9D8A16AC4241919521715F4D8BCE4764A4@algol.sollentuna.se>

#4 Magnus Hagander
mha@sollentuna.net
In reply to: Tatsuo Ishii (#3)

I do understand the problem, but I don't understand the decision you
guys made. The fact that UPPER/LOWER and some other functions do not
work on win32 is surely a problem for some languages, but not a
problem for others. For example, Japanese (and probably Chinese and
Korean) does not have a concept of upper/lower case. So the fact that
UPPER/LOWER does not work with UTF-8/win32 is not a problem for Japanese
(and for some other languages). Just using the C locale with UTF-8 is
enough in this case.

The main issue is not with upper/lower, it's with ORDER BY (and doesn't
that affect indexes as well?). This affects Japanese as well, no?

I didn't consider the C locale. Do you know for a fact that it works
there on win32 as well, or is that an assumption? (I don't know either
way)

In summary, I think killing the multibyte support functionality on
UTF-8/win32 just because some languages do not work is overkill.

I was under the impression that *no* languages worked. If some do work,
then we definitely should not kill it.

It would be good to have some way of detecting if it worked or not at
the time of creation of the database. But I have no idea on how to do
that in a reasonable way.

//Magnus

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#4)

"Magnus Hagander" <mha@sollentuna.net> writes:

I didn't consider the C locale. Do you know for a fact that it works
there on win32 as well, or is that an assumption?

It should work. The only use of strcoll() in the backend is in
varstr_cmp which uses strncmp() instead for C locale. Lack of
working upper/lower is hardly a fatal objection, considering that
we never had that for UTF8 before 8.0 anyway. But you do have to
have working varstr_cmp.

It would be good to have some way of detecting if it worked or not at
the time of creation of the database. But I have no idea on how to do
that in a reasonable way.

At this point I'd say that any combination of UTF8 encoding with a non
C/POSIX locale probably isn't going to work on Windows. Tatsuo, do you
know of other cases that will work?

regards, tom lane
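
The comparison logic described above can be sketched roughly as follows. This is a simplified stand-in with hypothetical names (demo_varstr_cmp and its helpers), not the actual varstr_cmp source: in the C locale it compares bytes with strncmp() and breaks ties on length, otherwise it hands NUL-terminated copies to strcoll().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Byte-wise comparison of counted strings, with a length tie-break. */
static int demo_cmp_bytes(const char *a, size_t alen,
                          const char *b, size_t blen)
{
    size_t minlen = (alen < blen) ? alen : blen;
    int result = strncmp(a, b, minlen);

    if (result == 0 && alen != blen)
        result = (alen < blen) ? -1 : 1;
    return result;
}

/* Copy a counted string into a freshly allocated NUL-terminated buffer. */
static char *demo_copy_terminated(const char *s, size_t len)
{
    char *copy = malloc(len + 1);

    if (copy != NULL)
    {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}

/*
 * Hypothetical comparison routine: the C locale never reaches strcoll(),
 * so a broken strcoll() for UTF-8 does not matter on that path.
 */
static int demo_varstr_cmp(const char *a, size_t alen,
                           const char *b, size_t blen,
                           bool locale_is_c)
{
    if (locale_is_c)
        return demo_cmp_bytes(a, alen, b, blen);

    char *a0 = demo_copy_terminated(a, alen);
    char *b0 = demo_copy_terminated(b, blen);
    int result;

    if (a0 != NULL && b0 != NULL)
        result = strcoll(a0, b0);
    else
        result = demo_cmp_bytes(a, alen, b, blen);  /* out of memory: fall back */

    free(a0);
    free(b0);
    return result;
}
```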

#6 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#5)

At this point I'd say that any combination of UTF8 encoding with a non
C/POSIX locale probably isn't going to work on Windows. Tatsuo, do you
know of other cases that will work?

No. I think C is the only working locale.
--
Tatsuo Ishii
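
The rule the thread has converged on (on Windows, UTF8 only with the C or POSIX locale) could be checked at database creation time along these lines. The function names and the on_win32 parameter are hypothetical, chosen so the sketch runs on any platform; this is not an existing PostgreSQL check.

```c
#include <stdbool.h>
#include <string.h>

/* True for the two locale names that select plain byte-wise behaviour. */
static bool demo_locale_is_c(const char *locale_name)
{
    return strcmp(locale_name, "C") == 0 ||
           strcmp(locale_name, "POSIX") == 0;
}

/*
 * Hypothetical creation-time check: on Windows, UTF8 is accepted only
 * together with the C/POSIX locale. Other platforms and other encodings
 * are not restricted by this rule.
 */
static bool demo_utf8_combination_ok(bool encoding_is_utf8,
                                     const char *locale_name,
                                     bool on_win32)
{
    if (!on_win32 || !encoding_is_utf8)
        return true;
    return demo_locale_is_c(locale_name);
}
```

A caller could turn a false result into either a hard error or a warning, which is the policy question the rest of the thread debates.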

#7 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Magnus Hagander (#4)

The main issue is not with upper/lower, it's with ORDER BY (and doesn't
that affect indexes as well?). This affects Japanese as well, no?

As long as it is used with the C locale, indexes should be OK. ORDER BY
is not perfect, but we can live with it. Since Japanese is written with
ideograms, we cannot rely on ordering by character codes to sort
Japanese text anyway. I believe the same can be said of Chinese.

I didn't consider the C locale. Do you know for a fact that it works
there on win32 as well, or is that an assumption? (I don't know either
way)

I have not tested 8.0 on win32, but I think it should work with C
locale since I know PowerGres, which is based on 7.4, works.

--
Tatsuo Ishii

#8 Jonathan Barnhart
jdbarnhart@yahoo.com
In reply to: Tatsuo Ishii (#7)
Any chance of a merge module?

What would it take to make the PG installer into a merge module? I
don't have the stuff to build PG so I can't build the PG install,
though I do have Wix. It would make my life (and anyone else using PG
for a specific app) a lot easier if you guys would allow us to embed
the PG install in our own install. This would let us just pass in the
setup info for the app and let PG install mostly silently. For my app,
the only thing the user needs to see from PG is the license which is
different from the commercial license on the rest of the product. The
rest I can configure from the main install. Right now the end user has
to configure things right and follow directions, and that leads to tech
support issues when they screw up. I tried using the silent install
option on the main MSI and got all sorts of problems. (Besides, many
Win2k setups with their old MSIexec don't even support a silent
install.)

=====
"We'll do the undoable, work the unworkable, scrute the inscrutable and have a long, hard look at the ineffable to see whether it might not be effed after all"

#9 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tatsuo Ishii (#3)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

Magnus, where are we on this? Seems we should allow unicode encoding
and just not unicode locale in pginstaller.

Also, Unicode is changing to UTF-8 in 8.1.

#10 Magnus Hagander
mha@sollentuna.net
In reply to: Bruce Momjian (#9)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

The installer does not permit it, but initdb lets you do anything you
want - I think that's where we are. If you know what you're doing, you
can use it by running initdb manually.

There is no such thing as a "unicode locale". Unicode (UTF8) is an
encoding that has to be paired with a locale. I assume you mean the C
locale.

While UPPER/LOWER does not matter, sort order does - for indexes if
nothing else. I'm unsure if this works - I think I read reports about
it not working, but I haven't tried it out myself.

I was hoping for a final solution for 8.1 which actually fixes it so it
works all the way. Not sure if I can make that happen myself, but I can
always try unless someone else does it.

//mha

#11 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Magnus Hagander (#10)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

Magnus Hagander wrote:

The installer does not permit it, but initdb lets you do anything you
want - I think that's where we are. If you know what you're doing, you
can use it by running initdb manually.

There is no such thing as a "unicode locale". Unicode (UTF8) is an
encoding that has to be paired with a locale. I assume you mean the C
locale.

Oh, sorry. So there is no ordering in Unicode? No wonder some
languages can't use Unicode effectively. I can see why ordering is
meaningless for creating a document that is just displayed but important
for a database.

I have added the last sentence to the TODO list:

o Disallow encodings like UTF8 which PostgreSQL supports
but the operating system does not (already disallowed by
pginstaller)

To fix UTF8, the data needs to be converted to UTF16 and then
the Win32 wcscoll() can be used, and perhaps other functions
like towupper(). However, UTF8 already works with normal
locales but provides no ordering.

While UPPER/LOWER does not matter, sort order does - for indexes if
nothing else. I'm unsure if this works - I think I read reports about
it not working, but I haven't tried it out myself.

I assume C just compares the bytes, meaning equality comparisons are
fine, but greater/less than is consistent but meaningless.
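
A short sketch of what "just compares the bytes" means for UTF-8 text. The function names are hypothetical. strcmp() compares bytes as unsigned values, so the resulting order is stable and usable for an index, but it follows byte values rather than any language's alphabet.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain byte-wise comparison of two C strings. */
static int demo_cmp_strings(const void *pa, const void *pb)
{
    const char *a = *(const char *const *) pa;
    const char *b = *(const char *const *) pb;

    return strcmp(a, b);
}

/* Sort an array of strings the way a C-locale comparison would. */
static void demo_sort_bytewise(const char **items, size_t count)
{
    qsort(items, count, sizeof items[0], demo_cmp_strings);
}
```

Given the UTF-8 strings "école", "zebra", "Zulu" and "apple", this ordering yields Zulu, apple, zebra, école: upper-case ASCII first, then lower-case ASCII, then anything with bytes above 0x7F. Equality tests are unaffected.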

#12 John Hansen
john@geeknet.com.au
In reply to: Bruce Momjian (#11)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

To fix UTF8, the data needs to be converted to UTF16 and then
the Win32 wcscoll() can be used, and perhaps other functions
like towupper(). However, UTF8 already works with normal
locales but provides no ordering.

Right. So if that's fixed, then UTF8 will work only on Windows?
(currently, upper/lower does not work with 2+ byte unicode characters, on any OS)

... John

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Hansen (#12)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

"John Hansen" <john@geeknet.com.au> writes:

Right. So if that's fixed, then UTF8 will work only on Windows?

No.

(currently, upper/lower does not work with 2+ byte unicode characters, on any OS)

This information is obsolete.

regards, tom lane

#14 John Hansen
john@geeknet.com.au
In reply to: Tom Lane (#13)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

K, let me rephrase:

currently, upper/lower does not work with 2+ byte unicode characters, on any OS under the C locale.

... John

#15 John Hansen
john@geeknet.com.au
In reply to: John Hansen (#14)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

currently, upper/lower does not work with 2+ byte unicode
characters, on any OS under the C locale.

Btw,...

There are only 15 cases in the utf8 repertoire that depend on locale; these are the only cases where pg should report:

ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.

when doing a select upper/lower (col).
All others should work just fine.

The error should probably also be changed to a warning, and just return the offending character unmodified.

... John

#16 Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#11)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

Bruce Momjian wrote:

Oh, sorry. So there is no ordering in Unicode?

That statement is meaningless. Unicode is a character set, not a
collation order.

No wonder some languages can't use Unicode effectively.

That has nothing to do with it.

o Disallow encodings like UTF8 which PostgreSQL supports
but the operating system does not (already disallowed by
pginstaller)

I think the warning that initdb shouts out is already enough for this.
I don't think we want to disallow this for people who know what they
are doing.

I assume C just compares the bytes, meaning equality comparisons are
fine, but greater/less than is consistent but meaningless.

That statement is independent of whether you use Unicode or something
else.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#17 Peter Eisentraut
peter_e@gmx.net
In reply to: John Hansen (#14)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

John Hansen wrote:

currently, upper/lower does not work with 2+ byte unicode characters,
on any OS under the C locale.

Sure it does. It's just that the defined behavior of the C locale is
often useless in practice.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#18 John Hansen
john@geeknet.com.au
In reply to: Peter Eisentraut (#17)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

John Hansen wrote:

currently, upper/lower does not work with 2+ byte unicode characters,
on any OS under the C locale.

Sure it does. It's just that the defined behavior of the C
locale is often useless in practice.

select upper('æøå');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.

Consequently it seems that it does not work.

... John

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Hansen (#18)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

"John Hansen" <john@geeknet.com.au> writes:

Sure it does. It's just that the defined behavior of the C
locale is often useless in practice.

select upper('æøå');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.

Consequently it seems that it does not work.

"It fails on my machine" should not be read as "it doesn't work for anyone".
It all depends on how your local mbstowcs() works.

regards, tom lane
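
One way to see how a given platform behaves is to probe mbstowcs() directly. The sketch below uses a hypothetical helper name and reports only whether the conversion succeeds under the current LC_CTYPE setting; what it returns for bytes above 0x7F in the C locale differs between C libraries, which is exactly the platform dependence described above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical probe: returns true if mbstowcs() accepts `s` under the
 * current LC_CTYPE locale, false if it reports an invalid multibyte
 * sequence (or if memory cannot be allocated).
 */
static bool demo_mbstowcs_accepts(const char *s)
{
    size_t cap = strlen(s) + 1;     /* never more wide chars than bytes */
    wchar_t *wide = malloc(cap * sizeof *wide);
    bool ok;

    if (wide == NULL)
        return false;
    ok = (mbstowcs(wide, s, cap) != (size_t) -1);
    free(wide);
    return ok;
}
```

Pure ASCII input should be accepted everywhere. For a UTF-8 string such as 'æøå' the answer depends on the C library and locale, so no expected result is given here.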

#20 John Hansen
john@geeknet.com.au
In reply to: Tom Lane (#19)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

select upper('æøå');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding.

Consequently it seems that it does not work.

"It fails on my machine" should not be read as "it doesn't
work for anyone".
It all depends on how your local mbstowcs() works.

Ok,... Do you have an example of a system on which it works?

... John

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Hansen (#20)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

"John Hansen" <john@geeknet.com.au> writes:

"It fails on my machine" should not be read as "it doesn't
work for anyone".
It all depends on how your local mbstowcs() works.

Ok,... Do you have an example of a system on which it works?

On HPUX 10.20, mbstowcs seems to treat all byte values as single-byte
characters in C locale, so my sample-of-one says that it works
everywhere ;-).

Nonetheless, it's clear that in C locale mbstowcs cannot be buying us
anything compared to using the old <ctype.h> macros, so I'm fine with
adding a check on the locale as per previous discussion.

regards, tom lane

#22 John Hansen
john@geeknet.com.au
In reply to: Tom Lane (#21)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

On HPUX 10.20, mbstowcs seems to treat all byte values as
single-byte characters in C locale, so my sample-of-one says
that it works everywhere ;-).

Right, so for the sample SQL I sent earlier, the result would be the same as the input?
That's hardly a working upper/lower....

If a character doesn't have case then fine, but one that does, should at least produce a warning if it cannot be converted.

... John

#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Hansen (#22)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

"John Hansen" <john@geeknet.com.au> writes:

Right, so for the sample SQL I sent earlier, the result would be the same as the input?
That's hardly a working upper/lower....

[ shrug... ] It works per the locale definition, which is that only
7-bit-ASCII a-z/A-Z get converted.

The bottom line here is that we rely on the locale setting for this
behavior, and that's not likely to change real soon. If you dislike
the locale definition then you should be using a different locale.
In particular I think the issue here is really that your platform's
definition of "C locale" says that bytes above 0x7F are illegal
characters. My platform's doesn't. The thing to be changing is the
locale definition.

regards, tom lane

#24 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#16)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

Peter Eisentraut wrote:

o Disallow encodings like UTF8 which PostgreSQL supports
but the operating system does not (already disallowed by
pginstaller)

I think the warning that initdb shouts out is already enough for this.
I don't think we want to disallow this for people who know what they
are doing.

I have updated the Win32 TODO item:

o Add support for Unicode

To fix this, the data needs to be converted to/from UTF16/UTF8
so the Win32 wcscoll() can be used, and perhaps other functions
like towupper(). However, UTF8 already works with normal
locales but provides no ordering or character set classes.

#25 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tatsuo Ishii (#3)
Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32

I have just applied a patch to CVS HEAD and 8.0.X that disables
locale-aware handling of upper/lower/initcap when the locale is C or
POSIX.

With these changes, it seems safe to allow pginstaller to use UTF8
encoding if the locale is C/POSIX. If we don't do that, I am concerned
that Asian users will either make a hacked installer or be required to
run initdb manually by following complex instructions.

We could throw a warning if the combination is selected as a compromise.
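
The behaviour described for the patch (skip locale-aware case conversion when the locale is C or POSIX) might look roughly like the sketch below. It is an illustration with hypothetical names, not the applied patch: the C/POSIX path touches only ASCII a-z and passes every other byte through, while other locales go through mbstowcs() and towupper().

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* True if the current LC_CTYPE locale is C or POSIX. */
static bool demo_ctype_is_c(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);

    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/*
 * Hypothetical upper(): returns a malloc'd copy of `s` in upper case,
 * or NULL on allocation failure or an invalid multibyte sequence.
 */
static char *demo_upper(const char *s)
{
    size_t len = strlen(s);

    if (demo_ctype_is_c())
    {
        /* ASCII-only path: multibyte UTF-8 bytes are copied unchanged. */
        char *out = malloc(len + 1);

        if (out == NULL)
            return NULL;
        for (size_t i = 0; i <= len; i++)   /* copies the terminator too */
        {
            char c = s[i];
            out[i] = (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : c;
        }
        return out;
    }

    /* Locale-aware path: convert to wide characters and use towupper(). */
    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    size_t n = mbstowcs(wide, s, len + 1);
    if (n == (size_t) -1)
    {
        free(wide);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        wide[i] = (wchar_t) towupper((wint_t) wide[i]);

    size_t cap = n * MB_CUR_MAX + 1;
    char *out = malloc(cap);
    if (out != NULL && wcstombs(out, wide, cap) == (size_t) -1)
    {
        free(out);
        out = NULL;
    }
    free(wide);
    return out;
}
```

A program that never calls setlocale() starts in the C locale, so demo_upper("abc æ") would take the ASCII path and leave the two bytes of 'æ' untouched. The caller owns the returned buffer and must free() it.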

#26Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tatsuo Ishii (#3)
Re: [HACKERS] UNICODE/UTF-8 on win32

Where are we on this? As far as I can tell, we never disabled UTF8 on
Win32 in our code. The only thing we did do was to disable UTF8 in
pginstaller. See this FAQ item:

http://pginstaller.projects.postgresql.org/faq/FAQ_windows.html#2.6

Is the current setup OK? Should we allow UTF8 on Win32 for languages
that can use C locale, like Asian languages?

#27John Hansen
john@geeknet.com.au
In reply to: Bruce Momjian (#26)
Re: [HACKERS] UNICODE/UTF-8 on win32

Look at the upper/lower functions I sent to the list; they should be able
to replace upper/lower for the UTF-8 encoding (and they work independently
of locale).

... John

#28Bruce Momjian
pgman@candle.pha.pa.us
In reply to: John Hansen (#27)
Re: [HACKERS] UNICODE/UTF-8 on win32

John Hansen wrote:

Look at the upper/lower functions I sent to the list; they should be able
to replace upper/lower for the UTF-8 encoding (and they work independently
of locale).

You mean ICU? Yes, it seems like a good approach for 8.1.

#29John Hansen
john@geeknet.com.au
In reply to: Bruce Momjian (#28)
Re: [HACKERS] UNICODE/UTF-8 on win32

Ehmm... no, I meant the upper/lower replacements I sent to -hackers.

ICU was not me. Though for win32 you're probably better off writing
wrapper classes for the native win32 functions.

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Hansen (#27)
Re: [HACKERS] UNICODE/UTF-8 on win32

"John Hansen" <john@geeknet.com.au> writes:

Look at the upper/lower functions I sent to the list; they should be able
to replace upper/lower for the UTF-8 encoding (and they work independently
of locale).

I was under the impression we couldn't use these, precisely because they
weren't locale-aware. ("It works for most people" isn't good enough.)

In any case, don't we need a solution that covers sorting (strcoll) as
well as upper/lower?

regards, tom lane

#31John Hansen
john@geeknet.com.au
In reply to: Tom Lane (#30)
Re: [HACKERS] UNICODE/UTF-8 on win32

Right, they were meant as a starting point, but if you can point me to
how I can obtain the current locale, then I can fix them to cover the
remaining 15 special cases.
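
As a rough sketch of what a locale-dependent special case can look like (the thread does not list the 15 cases, so the choice of example is an assumption): the current locale name can be queried with setlocale(LC_CTYPE, NULL), and one well-known locale-sensitive mapping is Turkish, where the upper case of "i" (U+0069) is U+0130 rather than U+0049. The helper names and the locale-name matching rule below are hypothetical, and only ASCII plus this one case is handled.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: guess whether a locale name denotes Turkish.
 * Accepts POSIX-style names such as "tr_TR.UTF-8" and Windows-style
 * names beginning with "Turkish". This matching rule is an assumption.
 */
static bool
locale_is_turkish(const char *name)
{
    if (name == NULL)
        return false;
    if (strncmp(name, "Turkish", 7) == 0)
        return true;
    if (strncmp(name, "tr", 2) != 0)
        return false;
    /* Require a separator after "tr" so that other "tr..." names do not match. */
    return name[2] == '\0' || name[2] == '_' || name[2] == '-' ||
        name[2] == '.' || name[2] == '@';
}

/*
 * Hypothetical helper: upper-case one Unicode code point.
 * Only ASCII letters and the Turkish dotted-i case are handled here.
 */
static unsigned long
upper_codepoint(unsigned long cp, const char *locale_name)
{
    if (cp == 0x0069 && locale_is_turkish(locale_name))
        return 0x0130;          /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
    if (cp >= 'a' && cp <= 'z')
        return cp - 'a' + 'A';
    return cp;
}
```

Calling upper_codepoint(cp, setlocale(LC_CTYPE, NULL)) applies the mapping for whatever locale is currently active.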

... John

#32Magnus Hagander
mha@sollentuna.net
In reply to: John Hansen (#31)
Re: [HACKERS] UNICODE/UTF-8 on win32

That is pretty much where we are ;-)
I think we're fine for 8.0.x with this, because if you actually need
UTF-8 (and can live with sorting broken, no upper/lower etc), you can do
it using a manual initdb.

For 8.1, I think the ICU approach looks a lot more promising than trying
to do "on the fly conversion to UTF-16 and back". Especially if there is
a benefit in having ICU on other platforms as well, since we could then do
without win32-specific code for that (I seem to recall there being
discussions about other platforms needing it as well - and the guy who
did it didn't do it for win32, so there is at least some..)
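
For reference, the "on the fly conversion" alternative could look roughly like the sketch below: decode both UTF-8 strings to wchar_t and compare them with wcscoll(). This is only an illustration under stated assumptions, not code from any patch. The names utf8_to_wide and utf8_coll are hypothetical, the buffers are fixed-size, and validation is minimal (overlong forms are not rejected). On win32 the decoding step would more likely be done with MultiByteToWideChar(CP_UTF8, ...).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode NUL-terminated UTF-8 into wchar_t.
 * Returns the number of wchar_t written (excluding the terminator),
 * or (size_t) -1 on malformed input or insufficient space.
 * Where wchar_t is 16 bits wide (as on win32), code points above
 * U+FFFF are written as UTF-16 surrogate pairs.
 */
static size_t
utf8_to_wide(const char *src, wchar_t *dst, size_t dstlen)
{
    const unsigned char *p = (const unsigned char *) src;
    size_t      n = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return (size_t) -1;

    while (*p != '\0')
    {
        unsigned long cp;
        int         extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else
            return (size_t) -1;     /* not a valid lead byte */
        p++;

        while (extra-- > 0)
        {
            /* A NUL here also fails this check, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return (size_t) -1;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp > 0xFFFF && sizeof(wchar_t) == 2)
        {
            if (n + 2 >= dstlen)
                return (size_t) -1;
            cp -= 0x10000;
            dst[n++] = (wchar_t) (0xD800 + (cp >> 10));
            dst[n++] = (wchar_t) (0xDC00 + (cp & 0x3FF));
        }
        else
        {
            if (n + 1 >= dstlen)
                return (size_t) -1;
            dst[n++] = (wchar_t) cp;
        }
    }
    dst[n] = L'\0';
    return n;
}

/*
 * Hypothetical helper: locale-aware comparison of two UTF-8 strings.
 * Stores the wcscoll() result in *result and returns 0, or returns -1
 * if either string cannot be converted.
 */
static int
utf8_coll(const char *a, const char *b, int *result)
{
    wchar_t     wa[256];
    wchar_t     wb[256];

    if (utf8_to_wide(a, wa, 256) == (size_t) -1 ||
        utf8_to_wide(b, wb, 256) == (size_t) -1)
        return -1;

    *result = wcscoll(wa, wb);
    return 0;
}
```

The cost is visible even in this small form: every comparison pays for two conversions, which is part of why a library that works on UTF-8 directly looks more attractive.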

I was planning to test the ICU patch on win32 to see that it works at
all, but I haven't had the time to do that just yet.

//Magnus
