client encodings

Started by Dennis Björklund over 22 years ago, 3 messages
#1 Dennis Björklund
db@zigo.dhs.org

I've fixed the problems that were previously in psql:

* psql alters the strings in a PQresult
* psql sends non-validating strings to the server

This is, however, not a solution to the general problem with client
encodings. When you normally run psql in a terminal, the encoding used by
that terminal is the only reasonable encoding one can use. However, if you
redirect the output, you might very well want to produce a utf-8 file even
if the terminal does not support it. So it could be useful to be able to
change the client encoding in psql.

However, if you want to produce a utf-8 file, how should that work with
respect to gettext()? If the message catalog is in latin1 then we need to
know that and convert that into utf-8.

The easiest way, as I see it, is to demand that all po files are stored in
utf-8; then you can convert them into whatever client encoding you have
set in psql. Of course you can't make that conversion lossless in
general, but if you have a language that demands some characters that
don't exist in the target charset, you have lost anyway. The best you can
do is to convert them to something similar (or even just throw them away).
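
A minimal sketch of that fallback, written in Python rather than in psql's
own C and not taken from any actual patch. The function name
to_client_encoding is hypothetical, and the sketch assumes a stateless
target encoding such as latin1 or ascii, since it encodes one character at
a time.

```python
import unicodedata


def to_client_encoding(message: str, client_encoding: str) -> bytes:
    """Encode a catalog string for the client, degrading where necessary.

    Assumes a stateless target encoding (latin1, ascii, ...), because
    each character is encoded on its own.
    """
    out = bytearray()
    for ch in message:
        try:
            # Common case: the character exists in the target charset.
            out += ch.encode(client_encoding)
            continue
        except UnicodeEncodeError:
            pass
        # "Something similar": decompose the character and keep only the
        # base letters, dropping the combining accents.
        similar = "".join(
            c
            for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        # Last resort: whatever still cannot be encoded is thrown away.
        out += similar.encode(client_encoding, errors="ignore")
    return bytes(out)
```

With an ascii client encoding, "Björklund" degrades to "Bjorklund", while
a character with no decomposition, such as the euro sign, is dropped.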

Storing all po files as utf-8 is not a big problem. The translator can
very well still work in some other charset and then use iconv to convert
the file before checking it in. As long as nobody changes that file to use
characters outside that charset, the translator can later use iconv again
to get it back to his charset. The good thing about this is that psql
knows what charset all strings are in and can convert when needed.
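
The round trip can be sketched in Python as a stand-in for the two iconv
runs. The helper name convert and the sample message are made up for
illustration; they are not taken from the real po files.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one charset to another, like iconv -f/-t."""
    return data.decode(source).encode(target)


# Hypothetical catalog line as the translator edits it, in latin1.
original = "Användning: psql [FLAGGOR]\n".encode("latin-1")

# Converted to utf-8 before being checked in.
checked_in = convert(original, "latin-1", "utf-8")

# Converted back later so the translator can keep working in latin1.
restored = convert(checked_in, "utf-8", "latin-1")
```

Here restored is byte-for-byte identical to original. The conversion back
only fails if someone has added a character to the checked-in file that
latin1 cannot represent.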

Would it be acceptable to have all po files as utf-8?

--
/Dennis

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Dennis Björklund (#1)
Re: client encodings

Dennis Björklund writes:

> However, if you want to produce a utf-8 file, how should that work with
> respect to gettext()? If the message catalog is in latin1 then we need to
> know that and convert that into utf-8.

I don't think all gettext implementations support automatic character set
conversion. We might have to roll our own sometime, but for now it's not
an option.

--
Peter Eisentraut peter_e@gmx.net

#3 Dennis Björklund
db@zigo.dhs.org
In reply to: Peter Eisentraut (#2)
Re: client encodings

On Mon, 16 Jun 2003, Peter Eisentraut wrote:

>> However, if you want to produce a utf-8 file, how should that work with
>> respect to gettext()? If the message catalog is in latin1 then we need to
>> know that and convert that into utf-8.
>
> I don't think all gettext implementations support automatic character set
> conversion.

I agree. They don't.

> We might have to roll our own sometime

That was why I asked if we could simply have all message catalogs as
utf-8; then we know what charset the strings are in and can easily convert
them to whatever we have set our client encoding to.

> but for now it's not an option.

What has to be decided is whether we are going to generate output that is
only in the client encoding or not. If we just output the strings from the
message catalog as they are, we will not produce validating output. Then
the best thing we can do is simply to take the message catalog string and
discard everything that does not work in the current client encoding.
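
To make the problem concrete, here is a small Python sketch (not psql
code). The helper names is_valid and discard_unrepresentable, and the
sample message, are hypothetical.

```python
def is_valid(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def discard_unrepresentable(message: str, client_encoding: str) -> bytes:
    """Encode a message, silently dropping characters the client
    encoding cannot represent."""
    return message.encode(client_encoding, errors="ignore")


# Hypothetical catalog string, stored as latin1 in the message catalog.
catalog_bytes = "Fel: okänt kommando".encode("latin-1")

# Written unchanged into utf-8 output, the raw bytes do not validate.
raw_is_valid_utf8 = is_valid(catalog_bytes, "utf-8")

# Discarding what the client encoding cannot hold always validates,
# at the cost of losing characters.
ascii_output = discard_unrepresentable("Fel: okänt kommando", "ascii")
```

In this sketch raw_is_valid_utf8 is False, and ascii_output is the message
with the "ä" removed. Note that discarding only works once we know the
charset of the catalog string, which is the point of storing everything as
utf-8.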

--
/Dennis