Determining client_encoding from client locale
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
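For illustration, here is a minimal POSIX sketch of the general technique of reading the character encoding out of LC_CTYPE. It is not the PostgreSQL implementation of pg_get_encoding_from_locale(); the helper name codeset_from_ctype is hypothetical, and the sketch assumes a platform that provides nl_langinfo(CODESET), which excludes Windows. The returned name (for example "UTF-8") would still have to be mapped to a PostgreSQL encoding name.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpq: copy the codeset name of the
 * locale `ctype` (for example "UTF-8") into buf and return buf, or return
 * NULL if the locale cannot be selected.  Pass "" to follow the
 * environment (LC_ALL, LC_CTYPE, LANG).
 */
static const char *
codeset_from_ctype(const char *ctype, char *buf, size_t buflen)
{
    char        saved[256];
    const char *current;

    assert(ctype != NULL && buf != NULL && buflen > 0);

    /* Save the active LC_CTYPE; setlocale() may reuse its return buffer. */
    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL || strlen(current) >= sizeof(saved))
        return NULL;
    strcpy(saved, current);

    /* Switch to the requested locale; NULL means it is not available. */
    if (setlocale(LC_CTYPE, ctype) == NULL)
        return NULL;

    /* nl_langinfo(CODESET) reports the character encoding of LC_CTYPE. */
    strncpy(buf, nl_langinfo(CODESET), buflen - 1);
    buf[buflen - 1] = '\0';

    /* Restore the caller's locale before returning. */
    setlocale(LC_CTYPE, saved);
    return buf;
}
```

A caller would pass "" as ctype to pick up the user's environment, and treat a NULL result as "unknown, do not set client_encoding".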
client_encoding defaults to server_encoding, which is correct in the
typical environment where the client and the server have identical
locale settings, which I believe is why we don't see more confused users
on mailing lists. However, a partner of ours was recently bitten by
this. That was on Windows; I'm not 100% sure if LC_CTYPE is set
correctly there by default, but this seems like a good idea nevertheless.
We could expand that to datestyle and the user-settable lc_* settings,
but I don't want to go that far. In case the server lc_ctype/collate
settings don't match the client's locale, you would end up with mixed
lc_* settings which might be more confusing than helpful.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
client_encoding defaults to server_encoding, which is correct in the
typical environment where the client and the server have identical
locale settings, which I believe is why we don't see more confused
users on mailing lists. However, a partner of ours was recently bitten
by this. That was on Windows; I'm not 100% sure if LC_CTYPE is set
correctly there by default, but this seems like a good idea nevertheless.
IIRC Windows locales are not set via the environment. We've had to do
some special hackery in a few places to deal with that.
cheers
andrew
On Wednesday 17 June 2009 14:29:26 Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
I have been requesting that for years, but the Japanese users/developers
typically objected to that. I think it's time to relaunch the campaign,
though.
Peter Eisentraut <peter_e@gmx.net> writes:
On Wednesday 17 June 2009 14:29:26 Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
I have been requesting that for years, but the Japanese users/developers
typically objected to that. I think it's time to relaunch the campaign,
though.
I think at least part of the issue is lack of confidence in our code for
extracting an encoding setting from the locale environment. Do we
really think it's solid now, on all platforms? The current uses of
pg_get_encoding_from_locale are all designed to put little faith in it,
and what's more it's had exactly zero non-beta field experience.
regards, tom lane
Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
+1
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Jun 17, 2009 at 4:54 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:
Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
+1
I wonder if isatty() is true and we have terminfo information if
there's a terminfo capability to query the terminal for the correct
encoding.
But yeah, +1 to automatically using the user's current encoding from LC_CTYPE.
--
Gregory Stark
Peter Eisentraut <peter_e@gmx.net> wrote:
On Wednesday 17 June 2009 14:29:26 Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
+1 for psql, but -1 for libpq.
I think automatic determination is good for psql because it is
an end-user application, but it is not always acceptable for middleware.
Please imagine:
Web Server <- Application Server <- Database Server
----------    ------------------    ---------------
UTF-8         Non-UTF8 env.         UTF-8
The Application Server might run in a non-UTF8 environment,
but it should still send its output in UTF8 encoding. Automatic
encoding determination might break existing services.
I have been requesting that for years, but the Japanese users/developers
typically objected to that. I think it's time to relaunch the campaign,
though.
I assume this is not a Japanese-specific problem; it comes up simply
because they use multiple encodings. OS encodings in Japan are often SJIS
or EUC_JP, but UTF8 is widely used in web services and databases.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
Peter Eisentraut <peter_e@gmx.net> wrote:
On Wednesday 17 June 2009 14:29:26 Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
+1 for psql, but -1 for libpq.
What would make sense to me is for libpq to provide the *code* for this,
but then leave it up to the client application whether to actually call
it; if not the behavior stays the same as before. Aside from
Itagaki-san's objections, that eliminates backwards-compatibility issues
for other applications.
regards, tom lane
Itagaki Takahiro wrote:
Peter Eisentraut <peter_e@gmx.net> wrote:
On Wednesday 17 June 2009 14:29:26 Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
+1 for psql, but -1 for libpq.
I think automatic determination is good for psql because it is
an end-user application, but is not always acceptable for middlewares.
Please imagine:
Web Server <- Application Server <- Database Server
----------    ------------------    ---------------
UTF-8         Non-UTF8 env.         UTF-8
The Application Server might run on non-UTF8 environment
but it should send outputs in UTF8 encoding. Automatic
encoding determination might break existing services.
As soon as someone creates a database in non-UTF-8 encoding in the
cluster, it would stop working anyway. Setting client_encoding=utf8
manually would be a lot safer in a situation like that.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
Peter Eisentraut <peter_e@gmx.net> wrote:
On Wednesday 17 June 2009 14:29:26 Heikki Linnakangas wrote:
We currently require that you set client_encoding correctly, or you get
garbage in psql and any other tool using libpq. How about setting
client_encoding automatically to match the client's locale? We have
pg_get_encoding_from_locale() function that we can use to extract the
encoding from LC_CTYPE. We could call that in libpq.
+1 for psql, but -1 for libpq.
What would make sense to me is for libpq to provide the *code* for this,
but then leave it up to the client application whether to actually call
it; if not the behavior stays the same as before. Aside from
Itagaki-san's objections, that eliminates backwards-compatibility issues
for other applications.
Added to TODO:
Add code to detect client encoding and locale from the operating system
environment
* http://archives.postgresql.org/pgsql-hackers/2009-06/msg01040.php
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
What would make sense to me is for libpq to provide the *code* for this,
but then leave it up to the client application whether to actually call
it; if not the behavior stays the same as before. Aside from
Itagaki-san's objections, that eliminates backwards-compatibility issues
for other applications.
Added to TODO:
BTW, something that occurred to me later is that the details of this
could easily be got wrong. If libpq is indeed told to get
client_encoding from the client environment, it should arrange to do so
*before* opening the connection, and send the encoding request as part
of the startup packet. The alternative of providing a function to
adjust the encoding for an already-opened connection is inferior for
a couple of reasons:
* extra network round trip required
* we lose any chance at ensuring that connection failure messages come
back in the client's desired encoding.
(The latter business was already discussed a bit IIRC, but I'm too lazy
to check the archives right now.)
So that means that the API for this should probably involve some
addition to the PQconnectdb parameter string, not a separate function.
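To illustrate why the encoding request can travel with the connection request: in protocol version 3.0 the startup packet is a length word, a protocol version word, and a list of NUL-terminated name/value strings closed by a single zero byte, so client_encoding is simply one more pair. The following is a minimal sketch of that layout only; build_startup_packet and put_be32 are hypothetical names for illustration and are not libpq functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Protocol version 3.0 as sent in a startup packet: major 3, minor 0. */
#define PROTOCOL_3_0 ((uint32_t) 3 << 16)

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
}

/*
 * Hypothetical helper: serialize a startup packet from a NULL-terminated
 * array holding an even number of strings (name, value, name, value, ...).
 * Returns the packet length, or 0 if it does not fit in `out`.
 */
static size_t
build_startup_packet(unsigned char *out, size_t outlen, const char *const *kv)
{
    size_t      pos = 8;        /* length word + protocol version word */

    assert(out != NULL && kv != NULL);
    if (outlen < pos + 1)
        return 0;

    for (size_t i = 0; kv[i] != NULL; i++)
    {
        size_t      n = strlen(kv[i]) + 1;  /* include the terminating NUL */

        if (n > outlen - pos - 1)           /* keep room for the final NUL */
            return 0;
        memcpy(out + pos, kv[i], n);
        pos += n;
    }
    out[pos++] = '\0';                      /* end of the parameter list */

    put_be32(out, (uint32_t) pos);          /* the length counts itself */
    put_be32(out + 4, PROTOCOL_3_0);
    return pos;
}
```

With the pairs "user"/"alice" and "client_encoding"/"UTF8", the server learns the desired encoding before it sends anything back, which is what avoids both the extra round trip and wrongly encoded connection failure messages.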
regards, tom lane
Here's my first attempt at setting client_encoding automatically from
locale.
It adds a new conninfo parameter to libpq, "client_encoding". If set to
"auto", libpq uses the encoding as returned by
pg_get_encoding_from_locale(). Any other value is passed through to the
server as is.
psql is modified to set "client_encoding=auto", unless overridden by
PGCLIENTENCODING.
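The rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the patch: the function names are hypothetical, the locale-derived encoding is passed in as a plain string rather than obtained from pg_get_encoding_from_locale(), and returning NULL (send nothing, keep the server default) when detection fails is an assumed fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of the "auto" rule; names are not from the patch.
 *
 * requested: the client_encoding connection parameter, or NULL if unset
 * detected:  the encoding derived from the locale, or NULL if unknown
 *
 * Returns the encoding name to send to the server, or NULL to send
 * nothing and keep the server's default (assumed fallback).
 */
static const char *
resolve_client_encoding(const char *requested, const char *detected)
{
    /* Detection either fails (NULL) or yields a non-empty name. */
    assert(detected == NULL || detected[0] != '\0');

    if (requested == NULL)
        return NULL;            /* not set: behave exactly as before */
    if (strcmp(requested, "auto") == 0)
        return detected;        /* NULL if the locale was not recognized */
    return requested;           /* any other value is passed through as is */
}

/* psql-style default: ask for "auto" unless PGCLIENTENCODING overrides it. */
static const char *
psql_requested_encoding(const char *pgclientencoding_env)
{
    if (pgclientencoding_env != NULL && pgclientencoding_env[0] != '\0')
        return pgclientencoding_env;
    return "auto";
}
```

Middleware that must always talk UTF8 regardless of its own locale would simply pass an explicit value and never see the "auto" branch.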
BTW, I had to modify psql to use PQconnectdb() instead of
PQsetdbLogin(), so that it can pass the extra parameter. I found it a
bit laborious to construct the conninfo string with proper escaping,
just to have libpq parse and split it into components again. Could we
have a version of PQconnectdb() with an API more suited for setting the
params programmatically? The PQsetdbLogin() approach doesn't scale as
parameters are added/removed in future versions, but we could have
something like this:
PGconn *PQconnectParams(const char **params)
Where "params" is an array with an even number of parameters, forming
key/value pairs. Usage example:
const char *connparams[] = {
    "dbname", "mydb",
    "user", username,
    NULL /* terminate with NULL */
};
conn = PQconnectParams(connparams);
This is similar to what I did internally in psql in the attached patch.
Another idea is to use an array of PQconninfoOption structs:
PGconn *PQconnectParams(PQconninfoOption *params);
This would be quite natural since that's the format returned by
PQconnDefaults() and PQconninfoParse(), but a bit more cumbersome to use
in applications that don't use those functions, as in the previous example.
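For illustration, the escaping work that a key/value array would hide from callers looks roughly like this. The sketch turns a NULL-terminated key/value array into a conninfo string, quoting every value with single quotes and escaping embedded single quotes and backslashes with a backslash, per the conninfo quoting rules. The names build_conninfo and append_char are hypothetical; this is not the code from the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character, always keeping room for the terminating NUL. */
static int
append_char(char *out, size_t outlen, size_t *pos, char c)
{
    if (*pos + 1 >= outlen)
        return -1;
    out[(*pos)++] = c;
    return 0;
}

/*
 * Hypothetical helper: build "key='value' key='value'" from a
 * NULL-terminated array of key/value pairs.  Returns 0 on success, or -1
 * if `out` is too small or a key has no value.
 */
static int
build_conninfo(const char *const *params, char *out, size_t outlen)
{
    size_t      pos = 0;
    int         err = 0;

    assert(params != NULL && out != NULL && outlen > 0);

    for (size_t i = 0; params[i] != NULL; i += 2)
    {
        const char *key = params[i];
        const char *val = params[i + 1];

        if (val == NULL)
            return -1;          /* odd number of entries */
        if (i > 0)
            err |= append_char(out, outlen, &pos, ' ');
        for (; *key != '\0'; key++)
            err |= append_char(out, outlen, &pos, *key);
        err |= append_char(out, outlen, &pos, '=');
        err |= append_char(out, outlen, &pos, '\'');
        for (; *val != '\0'; val++)
        {
            /* Quotes and backslashes inside a value need a backslash. */
            if (*val == '\'' || *val == '\\')
                err |= append_char(out, outlen, &pos, '\\');
            err |= append_char(out, outlen, &pos, *val);
        }
        err |= append_char(out, outlen, &pos, '\'');
    }
    out[pos] = '\0';
    return err ? -1 : 0;
}
```

The resulting buffer could be handed to PQconnectdb(), which then parses it back into components; that round trip is the overhead an array-based entry point would avoid.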
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
client_encoding-from-locale-1.patch (text/x-diff, +196 -18)
On Mon, Jul 6, 2009 at 10:00 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
Here's my first attempt at setting client_encoding automatically from
locale.
It adds a new conninfo parameter to libpq, "client_encoding". If set to
"auto", libpq uses the encoding as returned by
pg_get_encoding_from_locale(). Any other value is passed through to the
server as is.
I was trying to test this, so I made a simple program based on the first
libpq example that only shows client_encoding.
This little test compiled fine until I applied your patch :(
postgres@casanova1:~/pg_releases/pgtests$ gcc -o test-libpq
test-libpq.o -L/usr/local/pgsql/head/lib -lpq
/usr/local/pgsql/head/lib/libpq.so: undefined reference to
`pg_get_encoding_from_locale'
collect2: ld returned 1 exit status
just in case i attached the test program.
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
Attachments:
test-libpq.c (text/x-csrc; charset=US-ASCII)
Jaime Casanova wrote:
this little test compiles fine until i applied your patch :(
postgres@casanova1:~/pg_releases/pgtests$ gcc -o test-libpq
test-libpq.o -L/usr/local/pgsql/head/lib -lpq
/usr/local/pgsql/head/lib/libpq.so: undefined reference to
`pg_get_encoding_from_locale'
Do you have an older version of libpq.so around?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Jul 22, 2009 at 7:30 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:
Jaime Casanova wrote:
this little test compiles fine until i applied your patch :(
postgres@casanova1:~/pg_releases/pgtests$ gcc -o test-libpq
test-libpq.o -L/usr/local/pgsql/head/lib -lpq
/usr/local/pgsql/head/lib/libpq.so: undefined reference to
`pg_get_encoding_from_locale'
Do you have an older version of libpq.so around?
The one that was installed with 8.4.0, but I thought that when you specify
-L to gcc you're telling it where to pick libraries from, no?
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Wed, Jul 22, 2009 at 9:58 PM, Jaime
Casanova<jcasanov@systemguards.com.ec> wrote:
On Wed, Jul 22, 2009 at 7:30 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:
Jaime Casanova wrote:
this little test compiles fine until i applied your patch :(
postgres@casanova1:~/pg_releases/pgtests$ gcc -o test-libpq
test-libpq.o -L/usr/local/pgsql/head/lib -lpq
/usr/local/pgsql/head/lib/libpq.so: undefined reference to
`pg_get_encoding_from_locale'
Do you have an older version of libpq.so around?
the one that installed with 8.4.0 but i thought that when you specify
-L to gcc you're telling it where to pick libraries from, no?
More to the point, when I used an unpatched 8.5 tree it worked just fine.
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Thursday 23 July 2009 02:29:23 Jaime Casanova wrote:
this little test compiles fine until i applied your patch :(
postgres@casanova1:~/pg_releases/pgtests$ gcc -o test-libpq
test-libpq.o -L/usr/local/pgsql/head/lib -lpq
/usr/local/pgsql/head/lib/libpq.so: undefined reference to
`pg_get_encoding_from_locale'
collect2: ld returned 1 exit status
libpq fails to link in chklocale.c.
Jaime Casanova <jcasanov@systemguards.com.ec> writes:
On Wed, Jul 22, 2009 at 7:30 PM, Alvaro
Herrera<alvherre@commandprompt.com> wrote:
Do you have an older version of libpq.so around?
the one that installed with 8.4.0 but i thought that when you specify
-L to gcc you're telling it where to pick libraries from, no?
On most Linux systems, -L doesn't have any effect on what happens at
runtime --- the dynamic linker's search path will determine that.
Try "ldd" on the executable to see which shlibs really get picked up.
regards, tom lane
On Thu, Jul 23, 2009 at 11:02 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
On most Linux systems, -L doesn't have any effect on what happens at
runtime --- the dynamic linker's search path will determine that.
Try "ldd" on the executable to see which shlibs really get picked up.
yeah! it's using the one that ships with 8.4.0
postgres@casanova1:~/pg_releases/pgtests$ ldd test-libpq
[...other no related libraries...]
libpq.so.5 => /opt/PostgreSQL/8.4/lib/libpq.so.5 (0x00007f7ef6db2000)
The only way i can compile with the patched version of libpq is with this
gcc -o test-libpq test-libpq.o -L../pgsql/src/port -lpgport
-L../pgsql/src/interfaces/libpq -lpq -L../pgsql/src/port
-Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/head/lib' -lpgport
BTW, i can compile with the unpatched version if i add -lpgport (seems
like this patch is adding a dependency)
gcc -o test-libpq test-libpq.o -L/usr/local/pgsql/head/lib -lpq -lpgport
--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
On Thursday 23 July 2009 20:16:39 Jaime Casanova wrote:
On Thu, Jul 23, 2009 at 11:02 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
On most Linux systems, -L doesn't have any effect on what happens at
runtime --- the dynamic linker's search path will determine that.
Try "ldd" on the executable to see which shlibs really get picked up.
yeah! it's using the one that ships with 8.4.0
postgres@casanova1:~/pg_releases/pgtests$ ldd test-libpq
[...other no related libraries...]
libpq.so.5 => /opt/PostgreSQL/8.4/lib/libpq.so.5 (0x00007f7ef6db2000)
The only way i can compile with the patched version of libpq is with this
gcc -o test-libpq test-libpq.o -L../pgsql/src/port -lpgport
-L../pgsql/src/interfaces/libpq -lpq -L../pgsql/src/port
-Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/head/lib' -lpgport
BTW, i can compile with the unpatched version if i add -lpgport (seems
like this patch is adding a dependency)
gcc -o test-libpq test-libpq.o -L/usr/local/pgsql/head/lib -lpq -lpgport
Which proves my point, because libpgport includes chklocale.c. But this is
just a workaround.