garbage in psql -l
Hi there,
I have a problem with CVS HEAD (noticed a week or so ago):
psql -l shows garbage instead of -|+. It looks like UTF-8 symbols are used
instead of ASCII characters.
List of databases
Name Б■┌ Owner Б■┌ Encoding Б■┌ Collation Б■┌ Ctype Б■┌ Access privileges
Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■
╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■╪Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─Б■─
contrib_regression Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌
nomao Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌
postgres Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌
template0 Б■┌ postgres Б■┌ UTF8 Б■┌ ru_RU.UTF-8 Б■┌ ru_RU.UTF-8 Б■┌ =c/postgres
Б∙╥ Б∙╥ Б∙╥ Б∙╥ Б∙▌ postgres=CTc/postgres
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
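The garbage in the listing above is what UTF-8 box-drawing characters look like when they are decoded as KOI8-R: each of these characters is three bytes in UTF-8, and each byte turns into a separate KOI8-R glyph, so the vertical bar U+2502 (bytes E2 94 82) shows up as "Б■┌". A minimal sketch to reproduce this, assuming an iconv(1) that supports the KOI8-R charset; the helper name as_koi8r is illustrative only.

```shell
#!/bin/sh
# Decode a byte string (given as printf octal escapes) as KOI8-R and
# print the result in UTF-8, i.e. roughly what a KOI8-R terminal shows.
# Assumes iconv(1) knows the KOI8-R charset.
as_koi8r() {
    printf "$1" | iconv -f KOI8-R -t UTF-8
}

# UTF-8 encodings of the box-drawing characters seen in the listing:
#   U+2502 (vertical line)   = E2 94 82
#   U+2500 (horizontal line) = E2 94 80
#   U+253C (cross)           = E2 94 BC
echo "vertical:   $(as_koi8r '\342\224\202')"
echo "horizontal: $(as_koi8r '\342\224\200')"
echo "cross:      $(as_koi8r '\342\224\274')"
```

The three outputs are the same three-character groups that fill the listing, which points at an encoding mismatch rather than corrupted data.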
Oleg Bartunov <oleg@sai.msu.su> writes:
> I have a problem with CVS HEAD (noticed a week or so ago):
> psql -l shows garbage instead of -|+. It looks like UTF-8 symbols are used
> instead of ASCII characters.
Hm, you only see it for -l and not for all tabular output? That's
a bit strange.
regards, tom lane
On Tue, 24 Nov 2009, Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> I have a problem with CVS HEAD (noticed a week or so ago):
>> psql -l shows garbage instead of -|+. It looks like UTF-8 symbols are used
>> instead of ASCII characters.
> Hm, you only see it for -l and not for all tabular output? That's
> a bit strange.
Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
while I use Slackware Linux. Here is the ldd output:
pg-head@zen:~/cvs/HEAD/pgsql$ ldd /usr/local/pgsql-head/bin/psql
linux-gate.so.1 => (0xffffe000)
libpq.so.5 => /usr/local/pgsql-head/lib/libpq.so.5 (0xb7f33000)
libz.so.1 => /usr/lib/libz.so.1 (0xb7ef8000)
libreadline.so.5 => /usr/lib/libreadline.so.5 (0xb7ec8000)
libtermcap.so.2 => /lib/libtermcap.so.2 (0xb7ec4000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7e92000)
libdl.so.2 => /lib/libdl.so.2 (0xb7e8d000)
libm.so.6 => /lib/libm.so.6 (0xb7e67000)
libc.so.6 => /lib/libc.so.6 (0xb7d07000)
/lib/ld-linux.so.2 (0xb7f4f000)
Regards,
Oleg
Oleg Bartunov <oleg@sai.msu.su> writes:
> On Tue, 24 Nov 2009, Tom Lane wrote:
>> Hm, you only see it for -l and not for all tabular output? That's
>> a bit strange.
> Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
> while I use Slackware Linux. Here is the ldd output.
What's your locale environment? ("env | grep ^L" would help.)
regards, tom lane
On Tue, 24 Nov 2009, Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> On Tue, 24 Nov 2009, Tom Lane wrote:
>>> Hm, you only see it for -l and not for all tabular output? That's
>>> a bit strange.
>> Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
>> while I use Slackware Linux. Here is the ldd output.
> What's your locale environment? ("env | grep ^L" would help.)
LC_COLLATE=ru_RU.KOI8-R
LANG=C
LC_CTYPE=ru_RU.KOI8-R
I had no problem with this.
Regards,
Oleg
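Oleg's environment combines LANG=C with LC_CTYPE=ru_RU.KOI8-R. Under the POSIX precedence rules (LC_ALL first, then the category-specific variable, then LANG), the character-type category, which is what determines the codeset, resolves to ru_RU.KOI8-R, so programs running in that environment treat text as KOI8-R. A minimal sketch of that lookup; effective_ctype and codeset_from_name are hypothetical helpers, and reading the codeset out of the locale name is only a heuristic (a C program gets the authoritative value from nl_langinfo(CODESET) after calling setlocale()).

```shell
#!/bin/sh
# Resolve which locale governs LC_CTYPE using POSIX precedence:
# LC_ALL overrides LC_CTYPE, which overrides LANG. Unset or empty
# variables are skipped; the final fallback is the C locale.
effective_ctype() {
    echo "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Heuristic only: take the codeset part of a name like ru_RU.KOI8-R,
# dropping any @modifier. Names without a codeset (e.g. C) yield "".
codeset_from_name() {
    case "$1" in
        *.*) cs=${1#*.}; echo "${cs%%@*}" ;;
        *)   echo "" ;;
    esac
}

ctype=$(effective_ctype)
echo "LC_CTYPE locale:     $ctype"
echo "codeset (from name): $(codeset_from_name "$ctype")"
```

With Oleg's three variables exported, the sketch reports ru_RU.KOI8-R and KOI8-R, even though LANG alone would suggest the C locale.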
Oleg Bartunov <oleg@sai.msu.su> writes:
> On Tue, 24 Nov 2009, Tom Lane wrote:
>> What's your locale environment? ("env | grep ^L" would help.)
> LC_COLLATE=ru_RU.KOI8-R
> LANG=C
> LC_CTYPE=ru_RU.KOI8-R
Hmm, I can duplicate the fact that psql -l uses utf8 characters
(because it connects to the postgres DB which has utf8 encoding)
but for me, ordinary selects within psql use the utf8 characters
too. Do you perhaps have something in ~/.psqlrc to force a different
client encoding?
regards, tom lane
On Tue, 2009-11-24 at 21:32 +0300, Oleg Bartunov wrote:
> On Tue, 24 Nov 2009, Tom Lane wrote:
>> Oleg Bartunov <oleg@sai.msu.su> writes:
>>> On Tue, 24 Nov 2009, Tom Lane wrote:
>>>> Hm, you only see it for -l and not for all tabular output? That's
>>>> a bit strange.
>>> Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
>>> while I use Slackware Linux. Here is the ldd output.
>> What's your locale environment? ("env | grep ^L" would help.)
> LC_COLLATE=ru_RU.KOI8-R
> LANG=C
> LC_CTYPE=ru_RU.KOI8-R
> I had no problem with this.
Seems like a mismatch between client encoding and actual locale
environment.
On Tue, 24 Nov 2009, Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> On Tue, 24 Nov 2009, Tom Lane wrote:
>>> What's your locale environment? ("env | grep ^L" would help.)
>> LC_COLLATE=ru_RU.KOI8-R
>> LANG=C
>> LC_CTYPE=ru_RU.KOI8-R
> Hmm, I can duplicate the fact that psql -l uses utf8 characters
> (because it connects to the postgres DB which has utf8 encoding)
> but for me, ordinary selects within psql use the utf8 characters
> too. Do you perhaps have something in ~/.psqlrc to force a different
> client encoding?
Yes:
set client_encoding to KOI8;
but it has never hurt me! I tried commenting it out, but it didn't help.
Note that psql from 8.4 works fine.
Regards,
Oleg
On Tue, 24 Nov 2009, Peter Eisentraut wrote:
> On Tue, 2009-11-24 at 21:32 +0300, Oleg Bartunov wrote:
>> On Tue, 24 Nov 2009, Tom Lane wrote:
>>> Oleg Bartunov <oleg@sai.msu.su> writes:
>>>> On Tue, 24 Nov 2009, Tom Lane wrote:
>>>>> Hm, you only see it for -l and not for all tabular output? That's
>>>>> a bit strange.
>>>> Yes, I'm surprised myself. Teodor has no problem, but he is on FreeBSD,
>>>> while I use Slackware Linux. Here is the ldd output.
>>> What's your locale environment? ("env | grep ^L" would help.)
>> LC_COLLATE=ru_RU.KOI8-R
>> LANG=C
>> LC_CTYPE=ru_RU.KOI8-R
>> I had no problem with this.
> Seems like a mismatch between client encoding and actual locale
> environment.
Why does 8.4 have no real problem?
Regards,
Oleg
On Tue, 2009-11-24 at 21:55 +0300, Oleg Bartunov wrote:
>> Seems like a mismatch between client encoding and actual locale
>> environment.
> Why does 8.4 have no real problem?
Because table formatting with Unicode characters is a new feature.
Anyway, that patch to set the client encoding automatically from the
locale sounds even more useful now.
Oleg Bartunov <oleg@sai.msu.su> writes:
> Why does 8.4 have no real problem?
Because we never tried to use utf8 table decoration before. This
is collateral damage from Roger Leigh's recent patches.
The problem is evidently that Oleg is depending on ~/.psqlrc to
set client_encoding the way he wants it, but that file does not
get read for a "psql -l" invocation. (Probably not for -c either.)
The locale environment really isn't at issue because we do not look
at it to establish client encoding. Perhaps Oleg should be setting
PGCLIENTENCODING instead of depending on ~/.psqlrc, but I suspect
he's not the only one doing it that way.
There has been some talk of altering the rules for setting psql's
default client_encoding. We could think about that, or we could
back off trying to use linestyle=unicode without an explicit setting.
If we do neither, I suspect we'll be hearing more complaints. I'll
bet there are lots of people who are using database encoding = UTF8
but don't actually have unicode-capable terminal setups. It's never
hurt them before, especially not if they aren't really storing any
non-ASCII data.
regards, tom lane
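Tom's suggested workaround, setting the encoding through libpq's PGCLIENTENCODING environment variable instead of ~/.psqlrc, can be sketched as a shell-profile fragment. It assumes a Bourne-style shell and a KOI8-R terminal as in Oleg's setup; because libpq consults the variable when it connects, the setting applies whether or not psql reads ~/.psqlrc (for example with -X, or with -l as described above).

```shell
# Fragment for ~/.profile (or the equivalent startup file of your shell).
# libpq uses PGCLIENTENCODING as the default client encoding for every
# connection, so it no longer matters whether psql reads ~/.psqlrc.
PGCLIENTENCODING=KOI8
export PGCLIENTENCODING

# One-off alternative for a single command:
#   PGCLIENTENCODING=KOI8 psql -l
```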
Peter Eisentraut <peter_e@gmx.net> writes:
> Anyway, that patch to set the client encoding automatically from the
> locale sounds even more useful now.
I think you're being overoptimistic to assume that that's going to
eliminate the issue. It might patch things for Oleg's particular
configuration; but the real problem IMO is that people are depending
on ~/.psqlrc to set encoding/locale related behavior, and that file
isn't read before executing -l/-c (not to mention -X).
I wonder whether the most prudent solution wouldn't be to prevent
default use of linestyle=unicode if ~/.psqlrc hasn't been read.
regards, tom lane
On Tue, 24 Nov 2009, Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> Why does 8.4 have no real problem?
> Because we never tried to use utf8 table decoration before. This
> is collateral damage from Roger Leigh's recent patches.
> The problem is evidently that Oleg is depending on ~/.psqlrc to
> set client_encoding the way he wants it, but that file does not
> get read for a "psql -l" invocation. (Probably not for -c either.)
> The locale environment really isn't at issue because we do not look
> at it to establish client encoding. Perhaps Oleg should be setting
> PGCLIENTENCODING instead of depending on ~/.psqlrc, but I suspect
> he's not the only one doing it that way.
Yes, PGCLIENTENCODING=KOI8 psql -l works as it should.
> There has been some talk of altering the rules for setting psql's
> default client_encoding. We could think about that, or we could
> back off trying to use linestyle=unicode without an explicit setting.
> If we do neither, I suspect we'll be hearing more complaints. I'll
> bet there are lots of people who are using database encoding = UTF8
> but don't actually have unicode-capable terminal setups. It's never
> hurt them before, especially not if they aren't really storing any
> non-ASCII data.
What's the benefit of using linestyle=unicode? I like the old ASCII style
for the console.
Regards,
Oleg
Oleg Bartunov <oleg@sai.msu.su> writes:
> What's the benefit of using linestyle=unicode? I like the old ASCII style
> for the console.
Well, I have to grant that it looks pretty spiffy on a unicode-enabled
display. Whether that's enough reason to risk breaking things for
people with non-unicode-enabled displays is certainly worth debating.
Maybe we should just make the default be linestyle=ascii all the time,
and tell people to turn it on in their ~/.psqlrc if they want it.
regards, tom lane
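Under Tom's proposal (ASCII by default, Unicode as an opt-in), a user who wants the new look would add one line to ~/.psqlrc. A minimal sketch, assuming the `\pset linestyle` option introduced by the patches under discussion (8.4's psql does not have it); enable_unicode_linestyle is a hypothetical helper that takes the startup file as an argument so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Append "\pset linestyle unicode" to a psql startup file (normally
# ~/.psqlrc) unless that exact line is already present.
# Assumes the \pset linestyle option from the patches under discussion.
enable_unicode_linestyle() {
    rcfile=$1
    line='\pset linestyle unicode'
    # -x: match the whole line; -F: fixed string, so the backslash is literal.
    # A missing file makes grep fail, so the line is appended (creating it).
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Usage: enable_unicode_linestyle "$HOME/.psqlrc"
```

Keeping the opt-in in ~/.psqlrc means only interactive sessions that read the file get Unicode decoration, which is exactly the split Tom describes.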
On Tue, 24 Nov 2009, Tom Lane wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> What's the benefit of using linestyle=unicode? I like the old ASCII style
>> for the console.
> Well, I have to grant that it looks pretty spiffy on a unicode-enabled
> display. Whether that's enough reason to risk breaking things for
> people with non-unicode-enabled displays is certainly worth debating.
> Maybe we should just make the default be linestyle=ascii all the time,
> and tell people to turn it on in their ~/.psqlrc if they want it.
+1
Regards,
Oleg
On Tue, Nov 24, 2009 at 4:49 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Tue, 24 Nov 2009, Tom Lane wrote:
>> Oleg Bartunov <oleg@sai.msu.su> writes:
>>> What's the benefit of using linestyle=unicode? I like the old ASCII style
>>> for the console.
>> Well, I have to grant that it looks pretty spiffy on a unicode-enabled
>> display. Whether that's enough reason to risk breaking things for
>> people with non-unicode-enabled displays is certainly worth debating.
>> Maybe we should just make the default be linestyle=ascii all the time,
>> and tell people to turn it on in their ~/.psqlrc if they want it.
> +1
+1.
...Robert
On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Anyway, that patch to set the client encoding automatically from the
>> locale sounds even more useful now.
> I think you're being overoptimistic to assume that that's going to
> eliminate the issue. It might patch things for Oleg's particular
> configuration; but the real problem IMO is that people are depending
> on ~/.psqlrc to set encoding/locale related behavior, and that file
> isn't read before executing -l/-c (not to mention -X).
> I wonder whether the most prudent solution wouldn't be to prevent
> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
This problem is caused when there's a mismatch between the
client encoding and the user's locale. We can detect this at
runtime and fall back to ASCII if we know they are incompatible.
Why don't we combine the two approaches we looked at so far:
1) The PG client encoding is UTF-8
2) The user's locale codeset (from nl_langinfo(CODESET)) is UTF-8
If *both* the conditions are satisfied simultaneously then we
are guaranteed that things will display correctly given what
the user has told us they wanted. If only one is satisfied then
we stay with ASCII, and problems such as the non-UTF-8-locale
mis-display seen here are avoided, while still allowing Unicode
display for users who have a UTF-8 locale as well as a UTF-8
client encoding (such as myself ;-)
This should be a one-liner patch to update the existing check.
Regards,
Roger
--
.''`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
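Roger's proposed rule can be written as a small decision function. This is a sketch of the logic only, not the actual psql check: choose_linestyle is a hypothetical name, the first argument is the client encoding as PostgreSQL reports it (UTF8, KOI8, ...), and the second is the locale codeset as nl_langinfo(CODESET) would return it. On glibc systems, `locale charmap` prints that same codeset from the shell.

```shell
#!/bin/sh
# Choose a table line style from two facts, per the proposal above:
#   $1  client encoding as PostgreSQL names it (e.g. UTF8, KOI8)
#   $2  locale codeset as nl_langinfo(CODESET) reports it (e.g. UTF-8, KOI8-R)
# Unicode decoration only when BOTH are UTF-8; otherwise stay with ASCII.
choose_linestyle() {
    case "$1" in
        UTF8|utf8|UTF-8|utf-8) client_utf8=yes ;;
        *)                     client_utf8=no ;;
    esac
    case "$2" in
        UTF-8|utf-8|UTF8|utf8) locale_utf8=yes ;;
        *)                     locale_utf8=no ;;
    esac
    if [ "$client_utf8" = yes ] && [ "$locale_utf8" = yes ]; then
        echo unicode
    else
        echo ascii
    fi
}

# Oleg's case: psql -l talks to a UTF8 database, but LC_CTYPE is KOI8-R.
choose_linestyle UTF8 KOI8-R
# On glibc the second argument could come from the live locale:
#   choose_linestyle UTF8 "$(locale charmap 2>/dev/null)"
```

For Oleg's configuration this prints `ascii`. Note that neither input says anything about the terminal at the far end of an ssh session, which is the failure mode Tom raises in the next message.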
Roger Leigh <rleigh@codelibre.net> writes:
> On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:
>> I wonder whether the most prudent solution wouldn't be to prevent
>> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
> This problem is caused when there's a mismatch between the
> client encoding and the user's locale. We can detect this at
> runtime and fall back to ASCII if we know they are incompatible.
Well, no, that is *one* of the possible failure modes. I've hit others
already in the short time that the patch has been installed. The one
that's bit me most is that the locale environment seen by psql doesn't
necessarily match what my xterm at the other end of an ssh connection
is prepared to do --- which is something that psql simply doesn't have
a way to detect. Again, this is something that's never mattered before
unless one was really pushing non-ASCII data around, and even then it
was often possible to be sloppy.
I'd be more excited about finding a way to use linestyle=unicode by
default if it had anything beyond cosmetic benefits. But it doesn't,
and it's hard to justify ratcheting up the requirements for users to get
their configurations exactly straight when that's all they'll get for it.
regards, tom lane
On Tue, Nov 24, 2009 at 05:43:00PM -0500, Tom Lane wrote:
> Roger Leigh <rleigh@codelibre.net> writes:
>> On Tue, Nov 24, 2009 at 02:19:27PM -0500, Tom Lane wrote:
>>> I wonder whether the most prudent solution wouldn't be to prevent
>>> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
>> This problem is caused when there's a mismatch between the
>> client encoding and the user's locale. We can detect this at
>> runtime and fall back to ASCII if we know they are incompatible.
> Well, no, that is *one* of the possible failure modes. I've hit others
> already in the short time that the patch has been installed. The one
> that's bit me most is that the locale environment seen by psql doesn't
> necessarily match what my xterm at the other end of an ssh connection
> is prepared to do --- which is something that psql simply doesn't have
> a way to detect. Again, this is something that's never mattered before
> unless one was really pushing non-ASCII data around, and even then it
> was often possible to be sloppy.
Sure, but this type of misconfiguration is entirely outside the
purview of psql. Everything else on the system, from man(1) to gcc,
emacs and vi, will be sending UTF-8 codes to your terminal for any
non-ASCII character they display. While psql using UTF-8 for its
tables is certainly exposing the problem, in reality it was already
broken, and it's not psql's "fault" for using functionality the
system said was available. It would equally break if you stored
non-ASCII characters in your UTF-8-encoded database and then ran
a SELECT query, since UTF-8 codes would again be sent to the
terminal.
For the specific case here, where the locale is KOI8-R, we can
determine at runtime that this isn't a UTF-8 locale and keep
using ASCII. I'll be happy to send a patch in to correct this
specific case.
At least on GNU/Linux, checking nl_langinfo(CODESET) is considered
definitive for testing which character set is available, and it's
the standard SUS/POSIX interface for querying the locale.
> I'd be more excited about finding a way to use linestyle=unicode by
> default if it had anything beyond cosmetic benefits. But it doesn't,
> and it's hard to justify ratcheting up the requirements for users to get
> their configurations exactly straight when that's all they'll get for it.
Bar the lack of nl_langinfo checking, once this is added we will go
out of our way to make sure that the system is capable of handling
UTF-8. This is, IMHO, the limit of how far /any/ tool should go to
handle things. Worrying about misconfigured terminals, something
which is entirely the user's responsibility, is I think a step too
far--going down this road means you'll be artificially limited to
ASCII, and the whole point of using nl_langinfo is to allow sensible
autoconfiguration, which almost all programs do nowadays. I don't
think it makes sense to "penalise" the majority of users with
correctly-configured systems because a small minority have a
misconfigured terminal input encoding. It is 2009, and all
contemporary systems support Unicode, and for the majority it is the
default.
Every one of the GNU utilities, plus most other free software,
localises itself using gettext, which in a UTF-8 locale, even
English locales, will transparently recode its output into the
locale codeset. This hasn't resulted in major problems for
people using these tools; it's been this way for years now.
Regards,
Roger
On Tue, 2009-11-24 at 14:19 -0500, Tom Lane wrote:
> I wonder whether the most prudent solution wouldn't be to prevent
> default use of linestyle=unicode if ~/.psqlrc hasn't been read.
More generally, it would probably be safer if we used linestyle=unicode
only if the client encoding has been set on the client by some explicit
action, that is, either via PGCLIENTENCODING or an \encoding statement,
but *not* when it is just defaulted from the server encoding. Can we
easily detect this difference?
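Peter's variant keys the decision on whether the user chose the client encoding explicitly. The sketch below covers only the environment-variable half (a `\encoding` command issued inside psql is not visible from a shell). linestyle_if_explicit is a hypothetical name, and also requiring the explicit encoding to be UTF8 is an assumption on top of Peter's wording, since Unicode decoration needs a UTF-8 client encoding anyway.

```shell
#!/bin/sh
# Use Unicode table decoration only when the client encoding was set
# explicitly (here: via PGCLIENTENCODING) AND is UTF8. An encoding that
# was merely defaulted from the server encoding never enables it.
#   $1  client encoding currently in effect, as the server reports it
linestyle_if_explicit() {
    if [ -z "${PGCLIENTENCODING:-}" ]; then
        echo ascii          # defaulted from the server encoding
    elif [ "$1" = UTF8 ]; then
        echo unicode        # explicitly chosen and UTF8
    else
        echo ascii          # explicitly chosen, but not UTF8
    fi
}

linestyle_if_explicit UTF8
```

In Oleg's `psql -l` case, PGCLIENTENCODING is unset and the encoding comes from the UTF8 server, so this rule keeps ASCII without consulting the locale at all.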