Re: another seemingly simple encoding question
this is a forward of my problem from April.
I have this time gone all the way and re-inited a DB from scratch,
created a new database, documented the import procedure, set the locale
to match but I am still having problems.
For example, look at this match count~
mod=# select count(*) from korean_english;
count
--------
205323
(1 row)
mod=#
mod=# select count(*) from korean_english where word='占싫놂옙';
count
-------
40332
(1 row)
mod=# \set
VERSION = 'PostgreSQL 8.0.0beta3 on i686-pc-linux-gnu, compiled by GCC
gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)'
AUTOCOMMIT = 'on'
VERBOSITY = 'default'
DBNAME = 'mod'
USER = 'postgres'
PORT = '5432'
ENCODING = 'UNICODE'
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = '>> '
HISTSIZE = '500'
mod=#
I documented the import procedure and put it at
http://www.myowndictionary.com/design.htm
if there is anybody out there who has any idea, i would be very grateful
for help. I have to move my database to postgres from mysql, and it
has this big problem with the encoding.
thank you.
joseph.
-------- Forwarded Message --------
From: joseph <kmh496@kornet.net>
To: pgsql-general@postgresql.org
Subject: another seemingly simple encoding question
Date: Fri, 24 Mar 2006 22:27:06 +0900
maybe a routine question here ... i hope i can understand the
answer.
[postgres@www ~]$ pg_ctl --version
pg_ctl (PostgreSQL) 8.0.0beta3
[postgres@www ~]$
i have a problem matching a utf8 string with a field in a database
encoded in utf8.
i read the documentation, checked the following, and don't know where i
went astray, trying to match ...
1) i am almost 100% sure the data is correctly utf8. i just dumped and
loaded into postgres.
2)
utf8db -> \l
List of databases
Name | Owner | Encoding
--------------+----------+-----------
utf8db | postgres | UNICODE
3) postgresql.conf
# These settings are initialized by initdb -- they might be changed
lc_messages = 'en_US.utf8' # locale for system error message strings
lc_monetary = 'en_US.utf8' # locale for monetary formatting
lc_numeric = 'en_US.utf8' # locale for number formatting
lc_time = 'en_US.utf8' # locale for time formatting
# - Other Defaults -
4) set client encoding in the client (psql or php, both show the same
mismatch)
LOG: statement: select wordid,word from korean_english where word='占썩르
占쏙옙' limit 10;
LOG: statement: show client_encoding;
LOG: statement: set client_encoding to 'utf8';
LOG: statement: select wordid,word from korean_english where word='占썩르
占쏙옙' limit 10;
LOG: statement: show client_encoding;
5) locale -a | grep en
<snip>
en_US.utf8
</snip>
ohhh, where is my mistake, please!
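One way to test point 1 (that the dumped data really is UTF-8) is to scan the dump byte by byte before loading it. The following is only a minimal sketch in Python: the function name `find_non_utf8_lines` and the three-line sample dump are hypothetical and are not part of the import procedure described above.

```python
from typing import Iterable, List, Tuple


def find_non_utf8_lines(lines: Iterable[bytes]) -> List[Tuple[int, bytes]]:
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, raw))
    return bad


# Hypothetical three-line dump: two UTF-8 lines and one EUC-KR line.
sample_dump = [
    "1\tdictionary\n".encode("utf-8"),
    "2\t사전\n".encode("utf-8"),
    "3\t사전\n".encode("euc-kr"),
]

bad_lines = find_non_utf8_lines(sample_dump)
for number, raw in bad_lines:
    # Print the raw bytes so the offending encoding can be inspected.
    print(number, raw)
```

For a real dump, a file opened in binary mode (for example `open(path, "rb")`, with `path` pointing at the dump) can be passed in place of `sample_dump`. An empty result only shows that the bytes are well-formed UTF-8; it does not show that they are the intended characters.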
--
my site myowndictionary (http://www.myowndictionary.com) was
made to help students of many languages learn them faster.
kmh496 wrote:
this is a forward of my problem from April.
I have this time gone all the way and re-inited a DB from scratch,
created a new database, documented the import procedure, set the locale
to match but I am still having problems.
For example, look at this match count~
mod=# select count(*) from korean_english;
count
--------
205323
(1 row)
mod=#
mod=# select count(*) from korean_english where word='占싫놂옙';
count
-------
40332
(1 row)
You seem to be implying there is something wrong with the above results,
but you haven't given us enough information to have any idea why that's
a problem. AFAICT, it's perfectly plausible that 40332 out of the 205323
rows in that table have that particular value of the word column. If
that's not correct, you need to tell us how, otherwise no-one can help you.
One clue is that you appear to have your mail client set to use EUC-KR
encoding, not UTF-8. Perhaps whatever client you're using to put data
into your database is using that encoding too?
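To illustrate the kind of mismatch suggested here, the Python sketch below encodes one word both ways and shows what happens when EUC-KR bytes are treated as UTF-8. The sample word is hypothetical and not taken from the table, and the sketch only demonstrates the general mechanism; it does not establish where in this particular setup a conversion went wrong.

```python
word = "사전"  # hypothetical sample word, not taken from the real table

utf8_bytes = word.encode("utf-8")
euckr_bytes = word.encode("euc-kr")


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Step 1: EUC-KR bytes mislabelled as UTF-8. A lossy decoder replaces
# every invalid sequence with U+FFFD (the replacement character).
lossy = euckr_bytes.decode("utf-8", errors="replace")

# Step 2: U+FFFD stored as UTF-8 and then displayed by an EUC-KR client.
garbled = "\ufffd\ufffd".encode("utf-8").decode("euc-kr")

print(is_valid_utf8(utf8_bytes))    # the UTF-8 form is well-formed
print(is_valid_utf8(euckr_bytes))   # the EUC-KR form is not valid UTF-8
print(set(lossy) == {"\ufffd"})     # only replacement characters survive
print(garbled == "占쏙옙")           # same pattern as the garbled text in this thread
```

If the last comparison holds, the garbled strings quoted in this thread are consistent with text that was replaced by U+FFFD at some point and later viewed as EUC-KR, which would mean the original characters are no longer recoverable from those strings.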
Tim
--
-----------------------------------------------
Tim Allen tim@proximity.com.au
Proximity Pty Ltd http://www.proximity.com.au/