garbage in psql -l

Started by Oleg Bartunov, over 16 years ago, 32 messages, hackers
#1Oleg Bartunov
oleg@sai.msu.su

Hi there,

I have a problem with CVS HEAD (noticed a week or so ago):
psql -l shows garbage instead of -|+. It looks like UTF-8 symbols are used
instead of ASCII characters.

List of databases
Name Б■┌ Owner Б■┌ Encoding Б■┌ Collation Б■┌ Ctype Б■┌ Access privileges
Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■
╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─
contrib_regression Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌
nomao Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌
postgres Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌
template0 Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌ =c/postgres
Б∙╥ Б∙╥ Б∙╥ Б∙╥ Б∙▌ postgres=CTc/postgres

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#1)
Re: garbage in psql -l

Oleg Bartunov <oleg@sai.msu.su> writes:

I have a problem with CVS HEAD (noticed a week or so ago):
psql -l shows garbage instead of -|+. It looks like UTF-8 symbols are used
instead of ASCII characters.

Hm, you only see it for -l and not for all tabular output? That's
a bit strange.

regards, tom lane

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#2)
Re: garbage in psql -l

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

I have a problem with CVS HEAD (noticed a week or so ago):
psql -l shows garbage instead of -|+. It looks like UTF-8 symbols are used
instead of ASCII characters.

Hm, you only see it for -l and not for all tabular output? That's
a bit strange.

Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
while I use Slackware Linux. Here is the ldd output.

pg-head@zen:~/cvs/HEAD/pgsql$ ldd /usr/local/pgsql-head/bin/psql
linux-gate.so.1 => (0xffffe000)
libpq.so.5 => /usr/local/pgsql-head/lib/libpq.so.5 (0xb7f33000)
libz.so.1 => /usr/lib/libz.so.1 (0xb7ef8000)
libreadline.so.5 => /usr/lib/libreadline.so.5 (0xb7ec8000)
libtermcap.so.2 => /lib/libtermcap.so.2 (0xb7ec4000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7e92000)
libdl.so.2 => /lib/libdl.so.2 (0xb7e8d000)
libm.so.6 => /lib/libm.so.6 (0xb7e67000)
libc.so.6 => /lib/libc.so.6 (0xb7d07000)
/lib/ld-linux.so.2 (0xb7f4f000)

Regards,
Oleg

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#3)
Re: garbage in psql -l

Oleg Bartunov <oleg@sai.msu.su> writes:

On Tue, 24 Nov 2009, Tom Lane wrote:

Hm, you only see it for -l and not for all tabular output? That's
a bit strange.

Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
while I use Slackware Linux. Here is the ldd output.

What's your locale environment? ("env | grep ^L" would help.)

regards, tom lane

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#4)
Re: garbage in psql -l

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

On Tue, 24 Nov 2009, Tom Lane wrote:

Hm, you only see it for -l and not for all tabular output? That's
a bit strange.

Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
while I use Slackware Linux. Here is the ldd output.

What's your locale environment? ("env | grep ^L" would help.)

LC_COLLATE=ru_RU.KOI8-R
LANG=C
LC_CTYPE=ru_RU.KOI8-R

I had no problem with this.

Regards,
Oleg

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#5)
Re: garbage in psql -l

Oleg Bartunov <oleg@sai.msu.su> writes:

On Tue, 24 Nov 2009, Tom Lane wrote:

What's your locale environment? ("env | grep ^L" would help.)

LC_COLLATE=ru_RU.KOI8-R
LANG=C
LC_CTYPE=ru_RU.KOI8-R

Hmm, I can duplicate the fact that psql -l uses utf8 characters
(because it connects to the postgres DB which has utf8 encoding)
but for me, ordinary selects within psql use the utf8 characters
too. Do you perhaps have something in ~/.psqlrc to force a different
client encoding?

regards, tom lane

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Oleg Bartunov (#5)
Re: garbage in psql -l

On tis, 2009-11-24 at 21:32 +0300, Oleg Bartunov wrote:

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

On Tue, 24 Nov 2009, Tom Lane wrote:

Hm, you only see it for -l and not for all tabular output? That's
a bit strange.

Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
while I use Slackware Linux. Here is the ldd output.

What's your locale environment? ("env | grep ^L" would help.)

LC_COLLATE=ru_RU.KOI8-R
LANG=C
LC_CTYPE=ru_RU.KOI8-R

I had no problem with this.

Seems like a mismatch between client encoding and actual locale
environment.

#8Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#6)
Re: garbage in psql -l

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

On Tue, 24 Nov 2009, Tom Lane wrote:

What's your locale environment? ("env | grep ^L" would help.)

LC_COLLATE=ru_RU.KOI8-R
LANG=C
LC_CTYPE=ru_RU.KOI8-R

Hmm, I can duplicate the fact that psql -l uses utf8 characters
(because it connects to the postgres DB which has utf8 encoding)
but for me, ordinary selects within psql use the utf8 characters
too. Do you perhaps have something in ~/.psqlrc to force a different
client encoding?

yes,
set client_encoding to KOI8;

but it has never hurt me! I tried commenting it out, but that didn't help.
Note that psql from 8.4 works fine.

Regards,
Oleg

#9Oleg Bartunov
oleg@sai.msu.su
In reply to: Peter Eisentraut (#7)
Re: garbage in psql -l

On Tue, 24 Nov 2009, Peter Eisentraut wrote:

On tis, 2009-11-24 at 21:32 +0300, Oleg Bartunov wrote:

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

On Tue, 24 Nov 2009, Tom Lane wrote:

Hm, you only see it for -l and not for all tabular output? That's
a bit strange.

Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
while I use Slackware Linux. Here is the ldd output.

What's your locale environment? ("env | grep ^L" would help.)

LC_COLLATE=ru_RU.KOI8-R
LANG=C
LC_CTYPE=ru_RU.KOI8-R

I had no problem with this.

Seems like a mismatch between client encoding and actual locale
environment.

Why does 8.4 have no such problem?

Regards,
Oleg

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Oleg Bartunov (#9)
Re: garbage in psql -l

On tis, 2009-11-24 at 21:55 +0300, Oleg Bartunov wrote:

Seems like a mismatch between client encoding and actual locale
environment.

Why does 8.4 have no such problem?

Because table formatting with Unicode characters is a new feature.

Anyway, that patch to set the client encoding automatically from the
locale sounds even more useful now.

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#9)
Re: garbage in psql -l

Oleg Bartunov <oleg@sai.msu.su> writes:

Why does 8.4 have no such problem?

Because we never tried to use utf8 table decoration before. This
is collateral damage from Roger Leigh's recent patches.

The problem is evidently that Oleg is depending on ~/.psqlrc to
set client_encoding the way he wants it, but that file does not
get read for a "psql -l" invocation. (Probably not for -c either.)

The locale environment really isn't at issue because we do not look
at it to establish client encoding. Perhaps Oleg should be setting
PGCLIENTENCODING instead of depending on ~/.psqlrc, but I suspect
he's not the only one doing it that way.

There has been some talk of altering the rules for setting psql's
default client_encoding. We could think about that, or we could
back off trying to use linestyle=unicode without an explicit setting.
If we do neither, I suspect we'll be hearing more complaints. I'll
bet there are lots of people who are using database encoding = UTF8
but don't actually have unicode-capable terminal setups. It's never
hurt them before, especially not if they aren't really storing any
non-ASCII data.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#10)
Re: garbage in psql -l

Peter Eisentraut <peter_e@gmx.net> writes:

Anyway, that patch to set the client encoding automatically from the
locale sounds even more useful now.

I think you're being overoptimistic to assume that that's going to
eliminate the issue. It might patch things for Oleg's particular
configuration; but the real problem IMO is that people are depending
on ~/.psqlrc to set encoding/locale related behavior, and that file
isn't read before executing -l/-c (not to mention -X).

I wonder whether the most prudent solution wouldn't be to prevent
default use of linestyle=unicode if ~/.psqlrc hasn't been read.

regards, tom lane

#13Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#11)
Re: garbage in psql -l

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

Why does 8.4 have no such problem?

Because we never tried to use utf8 table decoration before. This
is collateral damage from Roger Leigh's recent patches.

The problem is evidently that Oleg is depending on ~/.psqlrc to
set client_encoding the way he wants it, but that file does not
get read for a "psql -l" invocation. (Probably not for -c either.)

The locale environment really isn't at issue because we do not look
at it to establish client encoding. Perhaps Oleg should be setting
PGCLIENTENCODING instead of depending on ~/.psqlrc, but I suspect
he's not the only one doing it that way.

Yes, PGCLIENTENCODING=KOI8 psql -l works as it should.

There has been some talk of altering the rules for setting psql's
default client_encoding. We could think about that, or we could
back off trying to use linestyle=unicode without an explicit setting.
If we do neither, I suspect we'll be hearing more complaints. I'll
bet there are lots of people who are using database encoding = UTF8
but don't actually have unicode-capable terminal setups. It's never
hurt them before, especially not if they aren't really storing any
non-ASCII data.

What's the benefit of using linestyle=unicode? I like the old ASCII style
for the console.

Regards,
Oleg

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#13)
Re: garbage in psql -l

Oleg Bartunov <oleg@sai.msu.su> writes:

What's the benefit of using linestyle=unicode? I like the old ASCII style
for the console.

Well, I have to grant that it looks pretty spiffy on a unicode-enabled
display. Whether that's enough reason to risk breaking things for
people with non-unicode-enabled displays is certainly worth debating.

Maybe we should just make the default be linestyle=ascii all the time,
and tell people to turn it on in their ~/.psqlrc if they want it.

regards, tom lane

#15Oleg Bartunov
oleg@sai.msu.su
In reply to: Tom Lane (#14)
Re: garbage in psql -l

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

What's the benefit of using linestyle=unicode? I like the old ASCII style
for the console.

Well, I have to grant that it looks pretty spiffy on a unicode-enabled
display. Whether that's enough reason to risk breaking things for
people with non-unicode-enabled displays is certainly worth debating.

Maybe we should just make the default be linestyle=ascii all the time,
and tell people to turn it on in their ~/.psqlrc if they want it.

+1

Regards,
Oleg

#16Robert Haas
robertmhaas@gmail.com
In reply to: Oleg Bartunov (#15)
Re: garbage in psql -l

On Tue, Nov 24, 2009 at 4:49 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

On Tue, 24 Nov 2009, Tom Lane wrote:

Oleg Bartunov <oleg@sai.msu.su> writes:

What's the benefit of using linestyle=unicode? I like the old ASCII style
for the console.

Well, I have to grant that it looks pretty spiffy on a unicode-enabled
display.  Whether that's enough reason to risk breaking things for
people with non-unicode-enabled displays is certainly worth debating.

Maybe we should just make the default be linestyle=ascii all the time,
and tell people to turn it on in their ~/.psqlrc if they want it.

+1

+1.

...Robert

#17Roger Leigh
rleigh@codelibre.net
In reply to: Tom Lane (#12)
Re: garbage in psql -l

On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Anyway, that patch to set the client encoding automatically from the
locale sounds even more useful now.

I think you're being overoptimistic to assume that that's going to
eliminate the issue. It might patch things for Oleg's particular
configuration; but the real problem IMO is that people are depending
on ~/.psqlrc to set encoding/locale related behavior, and that file
isn't read before executing -l/-c (not to mention -X).

I wonder whether the most prudent solution wouldn't be to prevent
default use of linestyle=unicode if ~/.psqlrc hasn't been read.

This problem is caused when there's a mismatch between the
client encoding and the user's locale. We can detect this at
runtime and fall back to ASCII if we know they are incompatible.

Why don't we combine the two approaches we looked at so far:
1) The PG client encoding is UTF-8
2) The user's locale codeset (from nl_langinfo(CODESET)) is UTF-8

If *both* the conditions are satisfied simultaneously then we
are guaranteed that things will display correctly given what
the user has told us they wanted. If only one is satisfied then
we remain using ASCII and problems such as the non-UTF-8-locale
mis-display seen here are avoided, while still allowing Unicode
display for users who have a UTF-8 locale as well as a UTF-8
client encoding (such as myself ;-)

This should be a one-liner patch to update the existing check.

Regards,
Roger

--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
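
Editor's illustration: the combined check proposed in the message above could look roughly like the following. This is a minimal sketch, not the actual psql patch. The function names (is_utf8_name, locale_codeset, unicode_linestyle_is_safe, self_check) are hypothetical and do not come from the PostgreSQL source; only setlocale() and nl_langinfo(CODESET) are real library calls. The client encoding is passed in as a plain string on the assumption that the caller has already obtained it.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if "name" spells UTF-8, ignoring case and any '-' or '_'
 * separators, so that "UTF8" and "UTF-8" both match.
 */
bool
is_utf8_name(const char *name)
{
    const char *want = "UTF8";

    if (name == NULL)
        return false;
    for (; *name != '\0'; name++)
    {
        if (*name == '-' || *name == '_')
            continue;           /* skip separators */
        if (*want == '\0' || toupper((unsigned char) *name) != *want)
            return false;       /* too long, or a mismatching character */
        want++;
    }
    return *want == '\0';       /* true only if all of "UTF8" was consumed */
}

/* Codeset of the user's LC_CTYPE locale, for example "UTF-8" or "KOI8-R". */
const char *
locale_codeset(void)
{
    setlocale(LC_CTYPE, "");    /* adopt the locale from the environment */
    return nl_langinfo(CODESET);
}

/*
 * Hypothetical decision function: allow Unicode borders by default only
 * when BOTH the client encoding and the locale codeset are UTF-8.
 */
bool
unicode_linestyle_is_safe(const char *client_encoding, const char *codeset)
{
    return is_utf8_name(client_encoding) && is_utf8_name(codeset);
}

/* Small self-check, including the configuration reported in this thread. */
void
self_check(void)
{
    assert(unicode_linestyle_is_safe("UTF8", "UTF-8"));
    assert(!unicode_linestyle_is_safe("UTF8", "KOI8-R"));  /* the -l case */
    assert(!unicode_linestyle_is_safe("KOI8", "UTF-8"));
}
```

A caller would pass locale_codeset() as the second argument. Keeping the decision function free of environment access makes it easy to exercise with fixed strings, as self_check() does. It does nothing for the ssh/xterm mismatch raised later in the thread, which a program cannot see from its own locale settings.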

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roger Leigh (#17)
Re: garbage in psql -l

Roger Leigh <rleigh@codelibre.net> writes:

On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:

I wonder whether the most prudent solution wouldn't be to prevent
default use of linestyle=unicode if ~/.psqlrc hasn't been read.

This problem is caused when there's a mismatch between the
client encoding and the user's locale. We can detect this at
runtime and fall back to ASCII if we know they are incompatible.

Well, no, that is *one* of the possible failure modes. I've hit others
already in the short time that the patch has been installed. The one
that's bit me most is that the locale environment seen by psql doesn't
necessarily match what my xterm at the other end of an ssh connection
is prepared to do --- which is something that psql simply doesn't have
a way to detect. Again, this is something that's never mattered before
unless one was really pushing non-ASCII data around, and even then it
was often possible to be sloppy.

I'd be more excited about finding a way to use linestyle=unicode by
default if it had anything beyond cosmetic benefits. But it doesn't,
and it's hard to justify ratcheting up the requirements for users to get
their configurations exactly straight when that's all they'll get for it.

regards, tom lane

#19Roger Leigh
rleigh@codelibre.net
In reply to: Tom Lane (#18)
Re: garbage in psql -l

On Tue, Nov 24, 2009 at 05:43:00PM -0500, Tom Lane wrote:

Roger Leigh <rleigh@codelibre.net> writes:

On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:

I wonder whether the most prudent solution wouldn't be to prevent
default use of linestyle=unicode if ~/.psqlrc hasn't been read.

This problem is caused when there's a mismatch between the
client encoding and the user's locale. We can detect this at
runtime and fall back to ASCII if we know they are incompatible.

Well, no, that is *one* of the possible failure modes. I've hit others
already in the short time that the patch has been installed. The one
that's bit me most is that the locale environment seen by psql doesn't
necessarily match what my xterm at the other end of an ssh connection
is prepared to do --- which is something that psql simply doesn't have
a way to detect. Again, this is something that's never mattered before
unless one was really pushing non-ASCII data around, and even then it
was often possible to be sloppy.

Sure, but this type of misconfiguration is entirely outside the
purview of psql. Everything else on the system, from man(1) to gcc,
emacs and vi, will be sending UTF-8 codes to your terminal for any
non-ASCII character they display. While psql using UTF-8 for its
tables is certainly exposing the problem, in reality it was already
broken, and it's not psql's "fault" for using functionality the
system said was available. It would equally break if you stored
non-ASCII characters in your UTF-8-encoded database and then ran
a SELECT query, since UTF-8 codes would again be sent to the
terminal.

For the specific case here, where the locale is KOI8-R, we can
determine at runtime that this isn't a UTF-8 locale and stay
using ASCII. I'll be happy to send a patch in to correct this
specific case.

At least on GNU/Linux, checking nl_langinfo(CODESET) is considered
definitive for testing which character set is available, and it's
the standard SUS/POSIX interface for querying the locale.

I'd be more excited about finding a way to use linestyle=unicode by
default if it had anything beyond cosmetic benefits. But it doesn't,
and it's hard to justify ratcheting up the requirements for users to get
their configurations exactly straight when that's all they'll get for it.

Bar the lack of nl_langinfo checking, once this is added we will go
out of our way to make sure that the system is capable of handling
UTF-8. This is, IMHO, the limit of how far /any/ tool should go to
handle things. Worrying about misconfigured terminals, something
which is entirely the user's responsibility, is I think a step too
far--going down this road means you'll be artificially limited to
ASCII, and the whole point of using nl_langinfo is to allow sensible
autoconfiguration, which almost all programs do nowadays. I don't
think it makes sense to "penalise" the majority of users with
correctly-configured systems because a small minority have a
misconfigured terminal input encoding. It is 2009, and all
contemporary systems support Unicode, and for the majority it is the
default.

Every one of the GNU utilities, plus most other free software,
localises itself using gettext, which in a UTF-8 locale, even
English locales, will transparently recode its output into the
locale codeset. This hasn't resulted in major problems for
people using these tools; it's been this way for years now.

Regards,
Roger


#20Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#12)
Re: garbage in psql -l

On tis, 2009-11-24 at 14:19 -0500, Tom Lane wrote:

I wonder whether the most prudent solution wouldn't be to prevent
default use of linestyle=unicode if ~/.psqlrc hasn't been read.

More generally, it would probably be safer if we used linestyle=unicode
only if the client encoding has been set on the client by some explicit
action, that is, either via PGCLIENTENCODING or an \encoding statement,
but *not* when it is just defaulted from the server encoding. Can we
easily detect this difference?
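
Editor's illustration: one way to picture the rule suggested above is to record where the client encoding came from and to consult that when choosing the default border style. This is a sketch under stated assumptions, not psql code. The EncodingSource type and both function names are hypothetical, and whether psql can actually distinguish these cases is exactly the open question in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical record of how the client encoding was established. */
typedef enum
{
    ENC_FROM_SERVER,        /* defaulted from the database encoding */
    ENC_FROM_ENVIRONMENT,   /* PGCLIENTENCODING was set */
    ENC_FROM_COMMAND        /* an \encoding command was issued */
} EncodingSource;

/*
 * Classify the source. "pgclientencoding" is the value of the environment
 * variable (NULL if unset); "encoding_command_used" is an assumed flag that
 * the caller sets after processing an \encoding command.
 */
EncodingSource
classify_encoding_source(const char *pgclientencoding,
                         bool encoding_command_used)
{
    if (encoding_command_used)
        return ENC_FROM_COMMAND;
    if (pgclientencoding != NULL && pgclientencoding[0] != '\0')
        return ENC_FROM_ENVIRONMENT;
    return ENC_FROM_SERVER;
}

/*
 * Default to Unicode borders only if the user chose the encoding explicitly
 * and that encoding is UTF-8; never when it was merely inherited.
 */
bool
default_to_unicode_linestyle(EncodingSource source,
                             const char *client_encoding)
{
    if (source == ENC_FROM_SERVER)
        return false;
    return client_encoding != NULL &&
        (strcasecmp(client_encoding, "UTF8") == 0 ||
         strcasecmp(client_encoding, "UTF-8") == 0);
}

/* Self-check with the two configurations discussed in the thread. */
void
check_explicit_encoding_rule(void)
{
    /* psql -l with nothing set: encoding inherited, so stay with ASCII */
    assert(!default_to_unicode_linestyle(
               classify_encoding_source(NULL, false), "UTF8"));
    /* PGCLIENTENCODING=KOI8: explicit, but not UTF-8, so stay with ASCII */
    assert(!default_to_unicode_linestyle(
               classify_encoding_source("KOI8", false), "KOI8"));
}
```

A caller would pass getenv("PGCLIENTENCODING") as the first argument to classify_encoding_source().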

#21Peter Eisentraut
peter_e@gmx.net
In reply to: Roger Leigh (#19)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#12)
#23Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#14)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#22)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#23)
#26Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#26)
#28Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#27)
#29Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Andrew Dunstan (#28)
#30Roger Leigh
rleigh@codelibre.net
In reply to: Tom Lane (#24)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roger Leigh (#30)
#32Roger Leigh
rleigh@codelibre.net
In reply to: Tom Lane (#31)