libpq Unicode support?

Started by Ale Raza almost 21 years ago | 18 messages | general
#1 Ale Raza
araza@esri.com

Wondering if the libpq library supports Unicode?

Ale.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ale Raza (#1)
Re: libpq Unicode support?

Ale Raza <araza@esri.com> writes:

> Wondering if the libpq library supports Unicode?

What sort of "support" have you got in mind? It passes UTF-8 data
through just fine.

regards, tom lane

#3 Ale Raza
araza@esri.com
In reply to: Tom Lane (#2)
Re: libpq Unicode support?

Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
build of libpq for UTF-16? I did not build libpq locally.

Ale

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ale Raza (#3)
Re: libpq Unicode support?

Ale Raza <araza@esri.com> writes:

> Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
> build of libpq for UTF-16? I did not build libpq locally.

Nope, you're out of luck on UTF-16.

regards, tom lane

#5 Bruce Momjian
bruce@momjian.us
In reply to: Ale Raza (#3)
Re: libpq Unicode support?

Ale Raza wrote:

> Tom, thanks for the reply. I want to pass UTF-16 data. Is there any special
> build of libpq for UTF-16? I did not build libpq locally.

We do not support UTF-16 at this time. Hopefully we will in 8.1.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#5)
Re: libpq Unicode support?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> We do not support UTF-16 at this time. Hopefully we will in 8.1.

Oh? Who's working on it, or even interested? Was there discussion
of adding it to TODO?

I think it would be an extremely nontrivial change, which is why
I am not pleased with making casual promises that it will appear
soon (or indeed at all).

regards, tom lane

#7 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: libpq Unicode support?

Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > We do not support UTF-16 at this time. Hopefully we will in 8.1.
>
> Oh? Who's working on it, or even interested? Was there discussion
> of adding it to TODO?
>
> I think it would be an extremely nontrivial change, which is why
> I am not pleased with making casual promises that it will appear
> soon (or indeed at all).

TODO has:

o Add support for Unicode

To fix this, the data needs to be converted to/from UTF16/UTF8
so the Win32 wcscoll() can be used, and perhaps other functions
like towupper(). However, UTF8 already works with normal
locales but provides no ordering or character set classes.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: libpq Unicode support?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Tom Lane wrote:
>
> > Oh? Who's working on it, or even interested? Was there discussion
> > of adding it to TODO?
>
> TODO has:
>
> o Add support for Unicode
>
> To fix this, the data needs to be converted to/from UTF16/UTF8
> so the Win32 wcscoll() can be used, and perhaps other functions
> like towupper(). However, UTF8 already works with normal
> locales but provides no ordering or character set classes.

That's completely unrelated --- it's talking about making correct use of
Windows' locale support in one small bit inside the server.

To make libpq UTF-16 capable, we'd have to change its API for all
strings; either make the strings counted rather than null-terminated,
or make the string elements wchar instead of char. After that we'd
have to hack the FE/BE protocol too (or more likely, require libpq
to translate UTF-16 to UTF-8 before sending to the server). I don't
foresee anyone doing any of this, at least not in the near term.

Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot
more practical.

regards, tom lane
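
A minimal sketch of the translation layer suggested above, assuming the application holds its text as UTF-16 code units in native byte order. `utf16_to_utf8` is a hypothetical helper, not a libpq function; it produces the null-terminated UTF-8 `char` string that libpq's existing API expects. Raw UTF-16 cannot pass through that API unchanged because ASCII characters encode with a zero byte (`A` is the 16-bit unit 0x0041), which a null-terminated interface would read as end of string.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libpq): convert n UTF-16 code units in
 * native byte order to a null-terminated UTF-8 string in out[cap].
 * Returns the number of bytes written, excluding the terminator, or -1 on
 * an unpaired surrogate or if out is too small.
 */
long utf16_to_utf8(const uint16_t *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with the low surrogate that must follow. */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;              /* low surrogate with no preceding high one */
        }

        /* Number of UTF-8 bytes this code point needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > cap)     /* keep room for the terminator */
            return -1;

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    if (o + 1 > cap)                /* empty input still needs the terminator */
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

The resulting buffer can be handed to ordinary libpq calls such as PQexec() on a connection whose client_encoding is UTF8. Input containing U+0000 would still cut the C string short, so a real wrapper would probably want to reject it.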

#9 Ale Raza
araza@esri.com
In reply to: Tom Lane (#8)
Re: libpq Unicode support?

Are we not going to lose some characters if we are putting a UTF-16 to UTF-8
translation in front of libpq?

Ale.

#10 Ben
bench@silentmedia.com
In reply to: Ale Raza (#9)
Re: libpq Unicode support?

Why would you? UTF-16 and UTF-8 are just different representations for the
same domain of characters.

#11 Peter Eisentraut
peter_e@gmx.net
In reply to: Ale Raza (#9)
Re: libpq Unicode support?

Ale Raza wrote:

> Are we not going to lose some characters if we are putting a UTF-16
> to UTF-8 translation in front of libpq?

No, they are just different encodings of the same character set.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
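
To illustrate the point that nothing is lost, the sketch below encodes every Unicode scalar value (U+0000 to U+10FFFF, excluding the surrogate range) as both UTF-16 and UTF-8 and decodes each form back again. All function names are hypothetical, and the decoders assume well-formed input, so this is a demonstration rather than a validating converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16; returns the number of 16-bit units. */
static size_t cp_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Decode one code point from 1 or 2 well-formed UTF-16 units. */
static uint32_t utf16_to_cp(const uint16_t *in, size_t n)
{
    if (n == 1)
        return in[0];
    return 0x10000 + (((uint32_t)in[0] - 0xD800) << 10)
                   + ((uint32_t)in[1] - 0xDC00);
}

/* Encode one code point as UTF-8; returns the number of bytes. */
static size_t cp_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode one code point from n bytes of well-formed UTF-8. */
static uint32_t utf8_to_cp(const unsigned char *in, size_t n)
{
    if (n == 1)
        return in[0];
    if (n == 2)
        return ((uint32_t)(in[0] & 0x1F) << 6) | (uint32_t)(in[1] & 0x3F);
    if (n == 3)
        return ((uint32_t)(in[0] & 0x0F) << 12)
             | ((uint32_t)(in[1] & 0x3F) << 6)
             | (uint32_t)(in[2] & 0x3F);
    return ((uint32_t)(in[0] & 0x07) << 18)
         | ((uint32_t)(in[1] & 0x3F) << 12)
         | ((uint32_t)(in[2] & 0x3F) << 6)
         | (uint32_t)(in[3] & 0x3F);
}

/* Returns 1 if every scalar value survives a round trip through both forms. */
int encodings_are_lossless(void)
{
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        uint16_t u16[2];
        unsigned char u8[4];

        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue;               /* surrogates are not characters */
        if (utf16_to_cp(u16, cp_to_utf16(cp, u16)) != cp)
            return 0;
        if (utf8_to_cp(u8, cp_to_utf8(cp, u8)) != cp)
            return 0;
    }
    return 1;
}
```

Because both forms map back to the same code point, converting between them goes through that code point and cannot drop characters; only the byte layout differs.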

#12 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Ale Raza (#9)
Re: libpq Unicode support?

Tom Lane wrote:

> To make libpq UTF-16 capable, we'd have to change its API for all
> strings; either make the strings counted rather than null-terminated,
> or make the string elements wchar instead of char. After that we'd
> have to hack the FE/BE protocol too (or more likely, require libpq
> to translate UTF-16 to UTF-8 before sending to the server). I don't
> foresee anyone doing any of this, at least not in the near term.

Is there any *real* loss of functionality in not supporting
UTF-16? If so *should* it be supported in, say, 9.0? If not,
should there be a FAQ item saying why not?

Thanks for a great database,
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

#13 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: libpq Unicode support?

Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > Tom Lane wrote:
> >
> > > Oh? Who's working on it, or even interested? Was there discussion
> > > of adding it to TODO?
> >
> > TODO has:
> >
> > o Add support for Unicode
> >
> > To fix this, the data needs to be converted to/from UTF16/UTF8
> > so the Win32 wcscoll() can be used, and perhaps other functions
> > like towupper(). However, UTF8 already works with normal
> > locales but provides no ordering or character set classes.
>
> That's completely unrelated --- it's talking about making correct use of
> Windows' locale support in one small bit inside the server.
>
> To make libpq UTF-16 capable, we'd have to change its API for all
> strings; either make the strings counted rather than null-terminated,
> or make the string elements wchar instead of char. After that we'd
> have to hack the FE/BE protocol too (or more likely, require libpq
> to translate UTF-16 to UTF-8 before sending to the server). I don't
> foresee anyone doing any of this, at least not in the near term.
>
> Putting a UTF-16 to UTF-8 translation in front of libpq seems a lot
> more practical.

So the Win32 fix and the libpq translation are two different issues.
Hmm.

Agreed we don't want to support both UTF8 and UTF16 in the backend.

#14 Bruce Momjian
bruce@momjian.us
In reply to: Karsten Hilbert (#12)
Re: libpq Unicode support?

Karsten Hilbert wrote:

> Tom Lane wrote:
>
> > To make libpq UTF-16 capable, we'd have to change its API for all
> > strings; either make the strings counted rather than null-terminated,
> > or make the string elements wchar instead of char. After that we'd
> > have to hack the FE/BE protocol too (or more likely, require libpq
> > to translate UTF-16 to UTF-8 before sending to the server). I don't
> > foresee anyone doing any of this, at least not in the near term.
>
> Is there any *real* loss of functionality in not supporting
> UTF-16? If so *should* it be supported in, say, 9.0? If not,
> should there be a FAQ item saying why not?

Is there a reason you have to use UTF16? Can't you convert to UTF8 on
input? (I have no idea myself.) Do other databases support both UTF8
and UTF16?

#15 Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Bruce Momjian (#14)
Re: libpq Unicode support?

On Fri, Apr 22, 2005 at 05:28:28PM -0400, Bruce Momjian wrote:

> > UTF-16? If so *should* it be supported in, say, 9.0? If not,
> > should there be a FAQ item saying why not?
>
> Is there a reason you have to use UTF16?

No. I don't currently use either one (that is, I am using a
"unicode" database with appropriate "set client_encoding"s,
which works as expected). I am just wondering whether we should
add a FAQ item on why UTF16 doesn't need to be supported.

> Can't you convert to UTF8 on input?

I likely could if I had to.

Karsten

#16 Bruce Momjian
bruce@momjian.us
In reply to: Karsten Hilbert (#15)
Re: libpq Unicode support?

Karsten Hilbert wrote:

> On Fri, Apr 22, 2005 at 05:28:28PM -0400, Bruce Momjian wrote:
>
> > > UTF-16? If so *should* it be supported in, say, 9.0? If not,
> > > should there be a FAQ item saying why not?
> >
> > Is there a reason you have to use UTF16?
>
> No. I don't currently use either one (that is, I am using a
> "unicode" database with appropriate "set client_encoding"s,
> which works as expected). I am just wondering whether we should
> add a FAQ item on why UTF16 doesn't need to be supported.

Well, we need to support UTF16 on Win32 only because the Win32 libc
libraries don't support UTF8, but other than that UTF16 isn't much of
an issue for our users.

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Karsten Hilbert (#12)
Re: libpq Unicode support?

Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:

> Tom Lane wrote:
>
> > To make libpq UTF-16 capable, we'd have to change its API for all
> > strings; either make the strings counted rather than null-terminated,
> > or make the string elements wchar instead of char. After that we'd
> > have to hack the FE/BE protocol too (or more likely, require libpq
> > to translate UTF-16 to UTF-8 before sending to the server). I don't
> > foresee anyone doing any of this, at least not in the near term.
>
> Is there any *real* loss of functionality in not supporting
> UTF-16?

Functionality, no: UTF-16 and UTF-8 are functionally equivalent by definition.

I think the reason that it's started to come up lately is that Windows
supports UTF-16 better than UTF-8 (whereas the reverse is true on most
Unixish platforms).

If libpq were the only available API then I'd be more concerned about
making it handle this somehow. But if you're working in, say, Java
then this issue is all taken care of for you anyway. There are enough
other Unix-centricities in libpq that this hardly seems the worst.

Possibly someone will be motivated to start a project to design a
Windows client library from scratch ...

regards, tom lane

#18 David Roussel
pgsql-performance@diroussel.xsmail.com
In reply to: Bruce Momjian (#14)
Re: libpq Unicode support?

> Do other databases support both UTF8 and UTF16?

Oracle supports UTF-8, UTF-16 and some other special UTF encodings. I
think some of them predate UTF-8's ratification, hence they are only
partially compatible.

It's an install-time option for an Oracle database. ASCII databases
can be upgraded to UTF-8, but not vice versa, and it affects all
schemas in the database.

I had an Oracle system that was non-Unicode, and somebody wanted to
support the euro currency symbol. We tried it: it inserted fine, but
came back in a select as another character. The only options were custom
escaping all over the place or migrating Oracle. Given the amount of
regression testing that would be needed for all the apps on the Oracle
system (200 users, 16 processor box, billions of dollars worth of
transactions) it was not worth the effort. People had to type 'EUR'
instead of €.