Character encodings...
I am trying to fill up a database using the psql program. A file I have prepared
contains Russian in KOI8-R encoding. When I try to process this file using
`psql -f file db', it fails: no diagnostics, nothing; it just shows that EOF is
reached. When I replace Russian letters with something in ASCII, it works just
fine. The main problem is that my second file gets processed just fine.
Where should I look? What additional information is needed? :)
Thanks,
--
Mike
On Thu, 13 Apr 2000, Michael Sobolev wrote:
> I am trying to fill up a database using the psql program. A file I have prepared
> contains Russian in KOI8-R encoding. When I try to process this file using
> `psql -f file db', it fails: no diagnostics, nothing; it just shows that EOF is
> reached. When I replace Russian letters with something in ASCII, it works just
> fine. The main problem is that my second file gets processed just fine.
> Where should I look? What additional information is needed? :)
OS, locale, Postgres version, whether Postgres was compiled with locale,
multibyte...
Oleg.
----
Oleg Broytmann http://members.xoom.com/phd2.1/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.
On Thu, Apr 13, 2000 at 10:20:39AM +0000, Oleg Broytmann wrote:
> OS, locale, Postgres version, whether Postgres was compiled with locale,
> multibyte...
Debian GNU/Linux (frozen), 6.5.3-17 (-17 -- debian revision), yes, =UNICODE.
--
Mike
Michael Sobolev wrote:
> On Thu, Apr 13, 2000 at 10:20:39AM +0000, Oleg Broytmann wrote:
>> OS, locale, Postgres version, whether Postgres was compiled with locale,
>> multibyte...
> Debian GNU/Linux (frozen), 6.5.3-17 (-17 -- debian revision), yes, =UNICODE.
Turn on logging in the backend (edit /etc/postgresql/postmaster.init) and
restart the postmaster (/etc/init.d/postgresql restart). See what you get
in the log.
--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"I sought the LORD, and he heard me, and delivered me
from all my fears." Psalms 34:4
On Thu, 13 Apr 2000, Michael Sobolev wrote:
>> OS, locale, Postgres version, whether Postgres was compiled with locale,
>> multibyte...
> Debian GNU/Linux (frozen), 6.5.3-17 (-17 -- debian revision), yes, =UNICODE.
Not sure how well Postgres works with UNICODE. It works pretty well with
KOI8-R and Windows-1251 encodings...
Oleg.
On Thu, Apr 13, 2000 at 11:54:17AM +0100, Oliver Elphick wrote:
> Turn on logging in the backend (edit /etc/postgresql/postmaster.init) and
> restart the postmaster (/etc/init.d/postgresql restart). See what you get
> in the log.
What level of debug should be sufficient?
I have the impression that it is psql that does not process the input
correctly.
I have a very simple statement:
insert into news values ('2000-04-13', NULL, '');
This works just fine. Now I replace '' with 'A' (A -- 65). It still works
just fine. Now I replace this Latin A with Russian A. And psql shows:
$ psql -f test.sql stuff
insert into news values ('2000-04-12', NULL, 'А');
EOF
--
Mike
Michael Sobolev wrote:
> On Thu, Apr 13, 2000 at 11:54:17AM +0100, Oliver Elphick wrote:
>> Turn on logging in the backend (edit /etc/postgresql/postmaster.init) and
>> restart the postmaster (/etc/init.d/postgresql restart). See what you get
>> in the log.
> What level of debug should be sufficient?
2
Also set PGECHO in postmaster.init, so that queries are echoed in the log.
> I have the impression that it is psql that does not process the input
> correctly.
> I have a very simple statement:
> insert into news values ('2000-04-13', NULL, '');
> This works just fine. Now I replace '' with 'A' (A -- 65). It still works
> just fine. Now I replace this Latin A with Russian A. And psql shows:
> $ psql -f test.sql stuff
> insert into news values ('2000-04-12', NULL, 'А');
> EOF
The trouble is, I don't know how to test this. How do I produce Russian
characters on an English keyboard?
--
Oliver Elphick Oliver.Elphick@lfix.co.uk
On Thu, Apr 13, 2000 at 02:52:13PM +0100, Oliver Elphick wrote:
>> What level of debug should be sufficient?
> 2
> Also set PGECHO in postmaster.init, so that queries are echoed in the log.
OK. I'll try.
> The trouble is, I don't know how to test this. How do I produce Russian
> characters on an English keyboard?
I am almost sure that this may fail with any character from the upper
half of the 8-bit range (128-255). In vim: ^V240 :)
--
Mike
On Thu, Apr 13, 2000 at 02:52:13PM +0100, Oliver Elphick wrote:
> 2
> Also set PGECHO in postmaster.init, so that queries are echoed in the log.
Here it goes. I would not say it's very useful... Russian a has code 225
(decimal).
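For reference, here is a small Python sketch of what byte 225 means in each encoding. In KOI8-R, 225 (0xE1) is the Cyrillic capital A; in UTF-8 the same byte announces a three-byte sequence. The helper `utf8_sequence_length` is hypothetical and is not psql code. Whether psql's multibyte scanner really skips bytes this way is an assumption that the log does not confirm.

```python
def utf8_sequence_length(lead: int) -> int:
    """Length of a UTF-8 sequence as announced by its lead byte (hypothetical helper)."""
    if lead < 0x80:
        return 1              # plain ASCII
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 1                  # stray continuation or invalid byte, treat as one byte


# The failing statement, encoded the way the KOI8-R file stores it.
# \u0410 is the Cyrillic capital A.
statement = "insert into news values ('2000-04-13', NULL, '\u0410');".encode("koi8_r")

pos = statement.index(0xE1)                     # byte 225 in the KOI8-R file
span = utf8_sequence_length(statement[pos])     # bytes a UTF-8 reader would expect
swallowed = statement[pos:pos + span]

print(statement[pos])   # 225
print(span)             # 3
print(swallowed)        # b"\xe1')"
```

If a scanner trusted the lead byte like this, the closing quote would be consumed as part of the character, the string literal would never terminate, and the statement would run into EOF. That would match the symptom, but it remains a guess.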
--
Mike
binding ShmemCreate(key=52e2c1, size=2006016)
/usr/lib/postgresql/bin/postmaster: ServerLoop: handling reading 4
/usr/lib/postgresql/bin/postmaster: ServerLoop: handling reading 4
/usr/lib/postgresql/bin/postmaster: ServerLoop: handling writing 4
/usr/lib/postgresql/bin/postmaster: BackendStartup: pid 30613 user mss db stuff socket 4
/usr/lib/postgresql/bin/postmaster child[30613]: starting with (/usr/lib/postgresql/bin/postgres -d2 -B 128 -E -v131072 -p stuff )
FindExec: found "/usr/lib/postgresql/bin/postgres" using argv[0]
debug info:
User = mss
RemoteHost = localhost
RemotePort = 0
DatabaseName = stuff
Verbose = 2
Noversion = f
timings = f
dates = European
bufsize = 128
sortmem = 512
query echo = t
InitPostgres
reset_client_encoding()..
reset_client_encoding() done.
StartTransactionCommand
query: select getdatabaseencoding()
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
query: SET client_encoding = 'UNICODE'
ProcessUtility: SET client_encoding = 'UNICODE'
CommitTransactionCommand
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/lib/postgresql/bin/postmaster: reaping dead processes...
/usr/lib/postgresql/bin/postmaster: CleanupProc: pid 30613 exited with status 0
On Thu, 13 Apr 2000, Michael Sobolev wrote:
> Here it goes. I would not say it's very useful... Russian a has code 225
> (decimal).
> StartTransactionCommand
> query: SET client_encoding = 'UNICODE'
> ProcessUtility: SET client_encoding = 'UNICODE'
> CommitTransactionCommand
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
> /usr/lib/postgresql/bin/postmaster: reaping dead processes...
> /usr/lib/postgresql/bin/postmaster: CleanupProc: pid 30613 exited with status 0
That looks like the query never got to the backend. This is either a bug
in psql or the multibyte suite. I seem to recall that Unicode isn't fully
supported, so I'd go for the latter. Can Tatsuo comment?
--
Peter Eisentraut Sernanders väg 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden
> On Thu, 13 Apr 2000, Michael Sobolev wrote:
>> Here it goes. I would not say it's very useful... Russian a has code 225
>> (decimal).
>> StartTransactionCommand
>> query: SET client_encoding = 'UNICODE'
>> ProcessUtility: SET client_encoding = 'UNICODE'
>> CommitTransactionCommand
>> proc_exit(0) [#0]
>> shmem_exit(0) [#0]
>> exit(0)
>> /usr/lib/postgresql/bin/postmaster: reaping dead processes...
>> /usr/lib/postgresql/bin/postmaster: CleanupProc: pid 30613 exited with status 0
> That looks like the query never got to the backend. This is either a bug
> in psql or the multibyte suite. I seem to recall that Unicode isn't fully
> supported, so I'd go for the latter. Can Tatsuo comment?
Oh, he is using the multibyte support and expects an automatic code
conversion between KOI8-R and UNICODE, which is not supported yet.
What he needs to do is create a database with encoding KOI8-R or
ISO-8859-5.
# make a KOI8-R database
$ createdb -E KOI8
or
# make an ISO-8859-5 database
$ createdb -E LATIN5
In the latter case, he might want to set the PGCLIENTENCODING environment
variable so that a conversion between KOI8-R and ISO-8859-5 is
performed automatically.
# if you want to use KOI8-R on your client.
$ export PGCLIENTENCODING=KOI8
or
% setenv PGCLIENTENCODING KOI8
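To illustrate what such a client/server conversion amounts to, here is a minimal Python sketch that re-encodes the same text between KOI8-R and ISO-8859-5. It uses Python's standard codecs (`koi8_r`, `iso8859_5`), not the Postgres conversion code, and the sample word is an assumption chosen for illustration.

```python
# Sample text as a KOI8-R client would send it (the word is illustrative only).
text = "\u041d\u043e\u0432\u043e\u0441\u0442\u0438"   # Russian for "News"
client_bytes = text.encode("koi8_r")

# A KOI8-R -> ISO-8859-5 conversion: decode from one charset, re-encode in the other.
server_bytes = client_bytes.decode("koi8_r").encode("iso8859_5")

# Same characters, different byte values.
print(client_bytes.hex())
print(server_bytes.hex())

# The reverse direction restores the client's bytes exactly.
round_trip = server_bytes.decode("iso8859_5").encode("koi8_r")
print(round_trip == client_bytes)
```

Both character sets are single-byte and both cover the Russian alphabet, which is why this pair converts cleanly in either direction.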
--
Tatsuo Ishii
On Fri, Apr 14, 2000 at 03:44:09PM +0900, Tatsuo Ishii wrote:
> Oh, he is using the multibyte support and expects an automatic code
> conversion between KOI8-R and UNICODE, which is not supported yet.
Not exactly. If you had a look at my first message, you would see that the
problem is that the behaviour is not consistent. Sometimes this data gets
through, and sometimes it does not. I'd say that arbitrary text in KOI8-R
can hardly be anything reasonable in UTF-8, so I would expect all (yes, ALL) my
requests to fail (and preferably with correct diagnostics).
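This inconsistency can be probed with a short Python check of whether KOI8-R bytes happen to form valid UTF-8. It is only a sketch: `is_valid_utf8` is a hypothetical helper, the samples are made up, and the Postgres 6.5 multibyte code may well be more lenient than Python's strict decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up samples, each encoded as it would appear in a KOI8-R file.
samples = {
    "ascii only": "insert into news values ('x');".encode("koi8_r"),
    "capital A": "\u0410".encode("koi8_r"),                  # byte 0xE1
    "word": "\u043d\u043e\u0432\u043e\u0441\u0442\u0438".encode("koi8_r"),
    # A lowercase letter followed by a KOI8-R box-drawing character
    # happens to look like a two-byte UTF-8 sequence.
    "letter + box char": "\u0431\u2500".encode("koi8_r"),    # bytes 0xC2 0x80
}

results = {name: is_valid_utf8(data) for name, data in samples.items()}
for name, ok in results.items():
    print(name, ok)
```

Under a strict validator, ordinary Russian words are rejected while an occasional byte pair passes, so whether a given file gets through depends on the byte patterns it happens to contain.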
> # make a KOI8-R database
> $ createdb -E KOI8
Thanks. I was looking for something like this in the man page, but unfortunately
it does not have this information.
> In the latter case, he might want to set the PGCLIENTENCODING environment
> variable so that a conversion between KOI8-R and ISO-8859-5 is
> performed automatically.
What are the requirements for this to work?
Thanks,
--
Mike
> On Fri, Apr 14, 2000 at 03:44:09PM +0900, Tatsuo Ishii wrote:
>> Oh, he is using the multibyte support and expects an automatic code
>> conversion between KOI8-R and UNICODE, which is not supported yet.
> Not exactly. If you had a look at my first message, you would see that the
> problem is that the behaviour is not consistent. Sometimes this data gets
> through, and sometimes it does not. I'd say that arbitrary text in KOI8-R
> can hardly be anything reasonable in UTF-8, so I would expect all (yes, ALL) my
> requests to fail (and preferably with correct diagnostics).
Sorry, I don't understand your point. What I wanted to say was that KOI8-R
and UTF-8 are totally different encodings (except for the ASCII part).
>> # make a KOI8-R database
>> $ createdb -E KOI8
> Thanks. I was looking for something like this in the man page, but unfortunately
> it does not have this information.
Please look at doc/README.mb.
>> In the latter case, he might want to set the PGCLIENTENCODING environment
>> variable so that a conversion between KOI8-R and ISO-8859-5 is
>> performed automatically.
> What are the requirements for this to work?
Please explain your background. If you need KOI8-R only, you can
forget about ISO-8859-5.
--
Tatsuo Ishii