CP1250 to and from Unicode conversion, how?

Started by Nikola Milutinovic over 24 years ago · 3 messages · general
#1 Nikola Milutinovic
Nikola.Milutinovic@ev.co.yu

Hi all.

I have a database with text fields containing Windows CP-1250 encoded text. How can I convert it to Unicode? I have built the database with

--enable-recode enable character set recode support
--enable-multibyte enable multibyte character support
--enable-unicode-conversion enable unicode conversion support

Also, how can I enter a string containing Unicode chars from "psql"? What is the Unicode escape sequence?

If all else fails, I'll dump the database, run the dump through a script/Java/C program to convert all CP-1250 characters to their Unicode equivalents, and import it again.
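A minimal sketch of that "dump, convert, re-import" fallback, assuming the dump is a plain-text file encoded entirely in CP-1250 and that the JDK's built-in windows-1250 charset matches the data. The class name Cp1250ToUtf8 and its two file-name arguments are hypothetical. Bytes that CP-1250 leaves undefined are typically decoded to the replacement character U+FFFD rather than rejected, so it is worth checking the output for that character.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: re-encode a text dump from windows-1250 (CP-1250) to UTF-8. */
public class Cp1250ToUtf8 {

    // Streams the data in chunks, so a large dump does not have to fit in memory.
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Charset cp1250 = Charset.forName("windows-1250");
        try (Reader reader = new BufferedReader(new InputStreamReader(in, cp1250));
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);  // chars decoded from CP-1250, written as UTF-8
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input dump in CP-1250, args[1]: output file in UTF-8 (hypothetical names)
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            transcode(in, out);
        }
    }
}
```

Usage, with hypothetical file names: `java Cp1250ToUtf8 dump-cp1250.sql dump-utf8.sql`, then load the result into a database created with UNICODE encoding.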

I hope someone can answer my question.

Nix.

#2 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Nikola Milutinovic (#1)
Re: CP1250 to and from Unicode conversion, how?

> I have a database with text fields containing Windows CP-1250 encoded text. How can I convert it to Unicode? I have built the database with

Sorry, conversion between CP-1250 and Unicode is not currently
supported, nor will it be in 7.2. Actually, adding it would be pretty easy, but
we are in the beta freeze phase and cannot add new functionality.

BTW, is CP-1250 equivalent to ISO-8859-2? If so, you could use the
encoding name "LATIN2" instead of WIN1250; LATIN2 supports
conversion to/from UNICODE.
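Whether WIN1250 data can simply be treated as LATIN2 depends on which characters occur. A quick check, assuming the JDK's windows-1250 and ISO-8859-2 charsets reflect the two code pages, prints the byte each encoding uses for the Serbian/Croatian Latin letters. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

/** Sketch: compare the byte CP-1250 and ISO-8859-2 assign to each Serbian/Croatian letter. */
public class Cp1250VsLatin2 {
    static final Charset CP1250 = Charset.forName("windows-1250");
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Single-byte value (0..255) that the given charset uses for one character.
    static int byteFor(String ch, Charset cs) {
        return ch.getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // c-caron, c-acute, d-stroke, s-caron, z-caron, S-caron, Z-caron as Unicode escapes,
        // so the source file encoding does not matter
        String[] letters = {"\u010D", "\u0107", "\u0111", "\u0161", "\u017E", "\u0160", "\u017D"};
        for (String letter : letters) {
            int a = byteFor(letter, CP1250);
            int b = byteFor(letter, LATIN2);
            System.out.printf("U+%04X  CP-1250=0x%02X  ISO-8859-2=0x%02X%s%n",
                    letter.codePointAt(0), a, b, a == b ? "" : "  <-- differs");
        }
    }
}
```

The letters č, ć and đ share a byte value in both encodings, while š, ž, Š and Ž do not (for example š is 0x9A in CP-1250 but 0xB9 in ISO-8859-2), so for names like "Saša" the two encodings are not interchangeable.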

> Also, how can I enter a string containing Unicode chars from "psql"? What is the Unicode escape sequence?

No idea. Why not use a Unicode-aware terminal? I use emacs + mule-ucs.
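PostgreSQL 7.x had no such escape, but releases from 8.4 onward accept Unicode escape string constants of the form U&'...', in which \XXXX is a four-digit and \+XXXXXX a six-digit hexadecimal code point. They only work with standard_conforming_strings turned on, and characters above U+007F are safest on a UTF-8 database. A hedged helper (the name toUnicodeEscapeLiteral is hypothetical) that turns an arbitrary string into such a literal:

```java
/** Sketch: build a PostgreSQL U&'...' Unicode escape literal (8.4+) from a Java string. */
public class UnicodeLiteral {

    static String toUnicodeEscapeLiteral(String s) {
        StringBuilder sb = new StringBuilder("U&'");
        s.codePoints().forEach(cp -> {
            if (cp == '\'') {
                sb.append("''");                              // quotes doubled, as in any SQL string
            } else if (cp == '\\') {
                sb.append("\\\\");                            // the escape character itself is doubled
            } else if (cp >= 0x20 && cp < 0x7F) {
                sb.append((char) cp);                         // printable ASCII passes through
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\%04X", cp));       // four-digit form
            } else {
                sb.append(String.format("\\+%06X", cp));      // six-digit form
            }
        });
        return sb.append('\'').toString();
    }

    public static void main(String[] args) {
        // Prints U&'Sa\0161a Ivkovi\0107'
        System.out.println(toUnicodeEscapeLiteral("Sa\u0161a Ivkovi\u0107"));
    }
}
```

The printed literal can then be used from psql in an ordinary query, e.g. `SELECT U&'Sa\0161a Ivkovi\0107';`.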
--
Tatsuo Ishii

#3 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tatsuo Ishii (#2)
Re: CP1250 to and from Unicode conversion, how?

It might be a JDBC driver issue. Ask the JDBC gurus.

If you believe it's the problem of the backend, please give me
reproducible examples using psql.
--
Tatsuo Ishii


Hi.

Problems again.

I created a DB with the encoding set to LATIN2 and created the tables.
Then I connected to the database with psql, set the encoding to WIN1250 and imported the data ("\copy ...").
The data is definitely there. Its encoding differs from WIN1250, so I guess it really is stored as Latin-2.

Now comes my creeping horror. I have a test Java application that connects to the database and takes one argument: ENCODING.

This is what comes out:

<NO ENCODING>
---------------------------------------------------------------------
Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury

ID: 39 NAME: Anica SURNAME: Ivkovi?
ID: 87 NAME: Sa?a SURNAME: Ivkovi?
ID: 130 NAME: Ljubica SURNAME: Ivkovi?
---------------------------------------------------------------------

<LATIN-1>
---------------------------------------------------------------------
Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury?charSet=LATIN1

ID: 39 NAME: Anica SURNAME: Ivkovic
ID: 87 NAME: Saaa SURNAME: Ivkovic
ID: 130 NAME: Ljubica SURNAME: Ivkovic
---------------------------------------------------------------------

<LATIN-2>
---------------------------------------------------------------------
Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury?charSet=LATIN2

ID: 39 NAME: Anica SURNAME: Ivkovi?
ID: 87 NAME: Sa?a SURNAME: Ivkovi?
ID: 130 NAME: Ljubica SURNAME: Ivkovi?
---------------------------------------------------------------------

<UTF-8>
---------------------------------------------------------------------
Connecting with: jdbc:postgresql://legba.ev.co.yu/mercury?charSet=UNICODE

Exception in thread "main" java.sql.SQLException:
at org.postgresql.Connection.ExecSQL(Connection.java, Compiled Code)
at org.postgresql.jdbc2.Statement.execute(Statement.java, Compiled Code)
at org.postgresql.jdbc2.Statement.executeQuery(Statement.java, Compiled Code)
at test2PostgreSQL.main(test2PostgreSQL.java, Compiled Code)
---------------------------------------------------------------------

So <No encoding> and <Latin-2> give me "?", <Latin-1> gives me what looks like Latin-2 output, and <Unicode> crashes the JDBC connection.
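The thread does not show what the 7.x JDBC driver does internally, so the following is only a plain-Java illustration of two mechanisms that produce symptoms like these; it is not a diagnosis of the driver. Class and method names are hypothetical, and the "stored" bytes are simply the surname encoded as ISO-8859-2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: two ways a Latin-2 surname can turn into "Ivkovi?" or into raw Latin-2 bytes. */
public class QuestionMarks {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // (1) The text is decoded correctly but then written through a charset that lacks
    // the character; String.getBytes substitutes the charset's replacement byte ('?').
    static String printThrough(String text, Charset out) {
        return new String(text.getBytes(out), out);
    }

    // (2) Latin-2 bytes decoded as Latin-1: ISO-8859-1 maps each byte to exactly one char,
    // so re-encoding as Latin-1 restores the original bytes unchanged.
    static byte[] latin1RoundTrip(byte[] raw) {
        String misdecoded = new String(raw, StandardCharsets.ISO_8859_1);
        return misdecoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String surname = "Ivkovi\u0107";              // ends with c-acute (U+0107)
        byte[] stored = surname.getBytes(LATIN2);    // as a LATIN2 database would store it

        System.out.println(printThrough(surname, StandardCharsets.ISO_8859_1)); // Ivkovi?
        System.out.println(Arrays.equals(stored, latin1RoundTrip(stored)));      // true
    }
}
```

The first case yields "?" wherever a correctly decoded character is pushed through an encoding that cannot represent it. The second would explain Latin-1 "looking like Latin-2": the Latin-2 bytes pass through untouched, and a Latin-2 terminal renders them correctly.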

:-(

Looks like I'm in for some serious learning...

If it is of any help: on Legba.ev.co.yu, in the <Unicode> case that crashed JDBC, the postmaster is spitting out:

ERROR: parser: parse error at or near "t?"
FATAL 1: Socket command type S unknown

I'm taking my "mining helmet" out, getting an axe from the closet and preparing to dig into the source. Before I commit such an act, could you enlighten me? What is going on?

Nix.