JDBC to load UTF8@psql to latin1@mysql
Good morning,
Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?
Thanks a lot!
Emi
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?
JAVA codes work for most of characters, but not "-È". Someone knows why
the following codes cannot load "-È" to mysql@latin1?
Thanks a lot!
--
public static String utf8_to_latin1(String str)
throws Exception
{
try
{
String stringToConvert = str;
byte[] convertStringToByte = stringToConvert.getBytes("UTF-8");
return new String(convertStringToByte, "ISO-8859-1");
}catch(Exception e)
{
log.error("utf8_to_latin1 Error: " + e.getMessage());
log.error(e);
throw e;
}
}
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I don't think your Java code does what you think it does. You should read
some more about how Java handles string encodings. Here is a method I wrote
some years ago that might also help you. It converts streams, not strings,
but what you need should be pretty close (and simpler):
/**
* Interprets in according to encIn, and converts it to encOut,
* writing to out. Allocates buffer for the buffer size.
*
* @param encIn The input encoding.
* @param encOut The output encoding.
* @param in The data to convert.
* @param out Where to send the converted data.
* @param buffer The size of the buffer or 0 for the default.
*
* @throws IOException
*/
public void run(String encIn, String encOut, InputStream in, OutputStream
out, int buffer) throws IOException {
Reader r = null;
Writer w = null;
int len;
char[] b;
try {
if (buffer > 0) {
r = new BufferedReader(new InputStreamReader(in, encIn), buffer);
w = new BufferedWriter(new OutputStreamWriter(out, encOut), buffer);
} else {
r = new BufferedReader(new InputStreamReader(in, encIn));
w = new BufferedWriter(new OutputStreamWriter(out, encOut));
buffer = DEFAULT_BUFFER_SIZE;
}
b = new char[buffer];
while ((len = r.read(b, 0, buffer)) != -1) {
w.write(b, 0, len);
}
} finally {
try {
if (r != null) r.close();
} finally {
if (w != null) w.close();
}
}
}
Btw, none of this has anything to do with Postgres. :-)
Paul
On Wed, Dec 12, 2012 at 10:19 AM, Emi Lu <emilu@encs.concordia.ca> wrote:
Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?
JAVA codes work for most of characters, but not "-È". Someone knows why
the following codes cannot load "-È" to mysql@latin1?
Thanks a lot!
--
public static String utf8_to_latin1(String str)
throws Exception
{
try
{
String stringToConvert = str;
byte[] convertStringToByte = stringToConvert.getBytes("UTF-**8");
return new String(convertStringToByte, "ISO-8859-1");
}catch(Exception e)
{
log.error("utf8_to_latin1 Error: " + e.getMessage());
log.error(e);
throw e;}
}--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/**mailpref/pgsql-general<http://www.postgresql.org/mailpref/pgsql-general>
--
_________________________________
Pulchritudo splendor veritatis.
Emi Lu wrote on 12.12.2012 17:17:
Good morning,
Is there a simple way to load UTF8 data in psql to mysql(with latin1 encoding) through JDBC?
All you need to do is to query the source database, then use ResultSet.getString() to obtain the data and use a PreparedStatement and PreparedStatement.setString() to insert/update the data on the target database.
The JDBC drivers will handle all the conversion.
Do NOT manually convert the data.
getString() and setString() will do everything correctly.
Thomas
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?All you need to do is to query the source database, then use
ResultSet.getString() to obtain the data and use a PreparedStatement and
PreparedStatement.setString() to insert/update the data on the target
database.The JDBC drivers will handle all the conversion.
Do NOT manually convert the data.getString() and setString() will do everything correctly.
I am not using stmt directly but through Mybatis for all db transactions.
So, this approach will not work.
Thanks.
--
Emi
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Em 13/12/2012 12:00, Emi Lu escreveu:
Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?All you need to do is to query the source database, then use
ResultSet.getString() to obtain the data and use a PreparedStatement and
PreparedStatement.setString() to insert/update the data on the target
database.The JDBC drivers will handle all the conversion.
Do NOT manually convert the data.getString() and setString() will do everything correctly.
I am not using stmt directly but through Mybatis for all db transactions.
Should not this a Mybatis problem instead?
As stated, JDBC drivers does all the conversion needed automatically,
but if you have a middleware messing with your enconding, then the
problem is the middleware, not databases or drivers.
Edson
So, this approach will not work.
Thanks.
--
Emi
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I don't think your Java code does what you think it does. You should
read some more about how Java handles string encodings. Here is a method
I wrote some years ago that might also help you. It converts streams,
not strings, but what you need should be pretty close (and simpler):
/**
* Interprets in according to encIn, and converts it to encOut,
* writing to out. Allocates buffer for the buffer size.
* @param encIn The input encoding.
* @param encOut The output encoding.
* @param in The data to convert.
* @param out Where to send the converted data.
* @param buffer The size of the buffer or 0 for the default.
* @throws IOException
*/
public void run(String encIn, String encOut, InputStream in,
OutputStream out, int buffer) throws IOException {
Reader r = null;
Writer w = null;
int len;
char[] b;
try {
if (buffer > 0) {
r = new BufferedReader(new InputStreamReader(in, encIn), buffer);
w = new BufferedWriter(new OutputStreamWriter(out, encOut),
buffer);
} else {
r = new BufferedReader(new InputStreamReader(in, encIn));
w = new BufferedWriter(new OutputStreamWriter(out, encOut));
buffer = DEFAULT_BUFFER_SIZE;
}
b = new char[buffer];while ((len = r.read(b, 0, buffer)) != -1) {
w.write(b, 0, len);
}
} finally {
try {
if (r != null) r.close();
} finally {
if (w != null) w.close();
}
}
}
Btw, none of this has anything to do with Postgres. :-)
Thank you for the code first. I will try it later. The problem I had as
mentioned in the subject is:
(1) psql@utf8
(2) mysql@latin1
When I load data from (1) to (2) through Mybatis, french characters
could not be mapped correctly in (2). I was thinking that psql may have
methods could help this. But it seems that I have to try from java
coding side :-(
--
Emi
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
The JDBC drivers will handle all the conversion.
Do NOT manually convert the data.
Yeah, I agree this is the right answer here, since you're using JDBC. By
the time you get a String from the MySQL driver, it's already in Java's
2-bytes-per-char format. And the Postgres driver will deal with the
encoding on the output side. So the code I provided won't help you. I'm
afraid I don't know about Mybatis, but if it's built on JDBC I'd think
you've just got a configuration problem with what encoding the client
expects at either end.
Paul
--
_________________________________
Pulchritudo splendor veritatis.
Emi Lu wrote on 13.12.2012 15:00:
Is there a simple way to load UTF8 data in psql to mysql(with latin1
encoding) through JDBC?All you need to do is to query the source database, then use
ResultSet.getString() to obtain the data and use a PreparedStatement and
PreparedStatement.setString() to insert/update the data on the target
database.The JDBC drivers will handle all the conversion.
Do NOT manually convert the data.getString() and setString() will do everything correctly.
I am not using stmt directly but through Mybatis for all db transactions.
So, this approach will not work.
Then it's a problem of that MyBatis thing.
Thomas
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Paul Jungwirth <pj@illuminatedcomputing.com> writes:
Yeah, I agree this is the right answer here, since you're using JDBC. By
the time you get a String from the MySQL driver, it's already in Java's
2-bytes-per-char format. And the Postgres driver will deal with the
encoding on the output side. So the code I provided won't help you. I'm
afraid I don't know about Mybatis, but if it's built on JDBC I'd think
you've just got a configuration problem with what encoding the client
expects at either end.
I was wondering if the problem wasn't lots simpler than that. Is the
character the OP is trying to convert actually part of LATIN1?
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yeah, I agree this is the right answer here, since you're using JDBC. By
the time you get a String from the MySQL driver, it's already in Java's
2-bytes-per-char format. And the Postgres driver will deal with the
encoding on the output side. So the code I provided won't help you. I'm
afraid I don't know about Mybatis, but if it's built on JDBC I'd think
you've just got a configuration problem with what encoding the client
expects at either end.From: Tom lane
I was wondering if the problem wasn't lots simpler than that. Is the
character the OP is trying to convert actually part of LATIN1?
First, the data loading is from psql(unix) to mysql(Unix). Second, DB
transactions are through JAVA+MyBatis.
Steps:
(1) Read utf8.data@psql from psql.xml into java.ArrayList<Bean>
(2) For each list.rec, save into mysql@latin through mysql.xml
Tried "jdbc:mysql://.../mysql_db?...unicode...encoding...=ISO..." No.
This does not work.
For now, through the following method, all letters are correctly
transformed except "È".
What does OP stand for?
Emi
--
public static String utf8_to_latin1(String str)
throws Exception
{
try
{
if(str.indexOf("È")>=0)
{
str = str.replaceAll("È", "E");
}
byte[] convertStringToByte = str.getBytes("UTF-8");
str = new String(convertStringToByte, "ISO-8859-1");
return str;
}catch(Exception e)
{
log.error("utf8_to_latin1 Error: " + e.getMessage());
log.error(e);
throw e;
}
}
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 12/14/2012 06:06 AM, Emi Lu wrote:
What does OP stand for?
Original Poster.
Emi
--
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 12/14/2012 09:49 AM, Adrian Klaver wrote:
Original Poster
Thanks. And more info:
Mysql
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
SHOW VARIABLES LIKE "character\_set\_database";
+------------------------+--------+
| Variable_name | Value |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+
Psql everywhere is utf8.
Where could be the problem located?
Also by using the java encoding methods, all characters except "È" are
transformed correctly.
Thanks alot!
Emi
--
public static String utf8_to_latin1(String str)
throws Exception
{
try
{
if(str.indexOf("È")>=0)
{
str = str.replaceAll("È", "E");
}
byte[] convertStringToByte = str.getBytes("UTF-8");
str = new String(convertStringToByte,
"ISO-8859-1");
return str;
}catch(Exception e)
{
log.error("utf8_to_latin1 Error: " + e.getMessage());
log.error(e);
throw e;
}
}
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Emi Lu <emilu@encs.concordia.ca> writes:
For now, through the following method, all letters are correctly
transformed except "�".
Meh. That character renders as \310 in your mail, which is not an
assigned code in ISO 8859-1. The numerically corresponding Unicode
value would be U+0090, which is an unspecified control character.
I surmise that your source data is not actually either Unicode or
ISO 8859-1, but one of the random "extended" character sets that
Microsoft has loosed upon the world, perhaps windows-1252
http://en.wikipedia.org/wiki/Windows-1252
The conversion code that you're using is quite right to reject the
character as not being valid LATIN1. What you need to do is figure out
what the data actually is and correct its encoding. It's evidently
stored wrong in the UTF8 data, if you believe that this code is a
letter.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I wrote:
Meh. That character renders as \310 in your mail, which is not an
assigned code in ISO 8859-1. The numerically corresponding Unicode
value would be U+0090, which is an unspecified control character.
Oh, scratch that, apparently I can't do hex/octal arithmetic in my
head first thing in the morning. It's really U+00C8 which is perfectly
valid. I can't see a reason why that character and only that character
would be problematic --- have you done systematic testing to confirm
that that's the only should-be-LATIN1 character that fails?
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 12/14/2012 07:35 AM, Tom Lane wrote:
I wrote:
Meh. That character renders as \310 in your mail, which is not an
assigned code in ISO 8859-1. The numerically corresponding Unicode
value would be U+0090, which is an unspecified control character.Oh, scratch that, apparently I can't do hex/octal arithmetic in my
head first thing in the morning. It's really U+00C8 which is perfectly
valid. I can't see a reason why that character and only that character
would be problematic --- have you done systematic testing to confirm
that that's the only should-be-LATIN1 character that fails?
This is where I am confused, in one of the original posts the OP said:
"JAVA codes work for most of characters, but not "-�"."
regards, tom lane
--
Adrian Klaver
adrian.klaver@gmail.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hello All,
Meh. That character renders as \310 in your mail, which is not an
assigned code in ISO 8859-1. The numerically corresponding Unicode
value would be U+0090, which is an unspecified control character.Oh, scratch that, apparently I can't do hex/octal arithmetic in my
head first thing in the morning. It's really U+00C8 which is perfectly
valid. I can't see a reason why that character and only that character
would be problematic --- have you done systematic testing to confirm
that that's the only should-be-LATIN1 character that fails?
Finally, the problem is resolved:
SHOW VARIABLES LIKE "character\_set\_%";
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 | -- here mysql uses utf8 for
character_set_system.
Change my java code to:
========================
public static String utf8_to_mysql(String str)
throws Exception
{
try
{
byte[] convertStringToByte = str.getBytes("UTF-8");
str = new String(convertStringToByte,
"UTF-8");
return str;
}catch(Exception e)
{
log.error("utf8_to_latin1 Error: " + e.getMessage());
log.error(e);
throw e;
}
Have to explicitly specify "UTF-8", but cannot leave as empty.
Larry's comments(from MyBatis mailing list) and I tried both "from/to"
by "UTF8". It works. This is still little bit strange to me. But it works!
My guess is that it's correct but the client you're using is messing
it up. If not, then you need to look at your connection strings to
the 2 databases to make sure they are handling the encodings
correctly.Unless you set them specifically, I suspect they are using
your default system encoding - so both may be using utf8 or iso8859.
Thank you very much for all of your help for this!
Emi
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 12/14/2012 01:37 PM, Emi Lu wrote:
Hello All,
Meh. That character renders as \310 in your mail, which is not an
assigned code in ISO 8859-1. The numerically corresponding Unicode
value would be U+0090, which is an unspecified control character.Oh, scratch that, apparently I can't do hex/octal arithmetic in my
head first thing in the morning. It's really U+00C8 which is perfectly
valid. I can't see a reason why that character and only that character
would be problematic --- have you done systematic testing to confirm
that that's the only should-be-LATIN1 character that fails?Finally, the problem is resolved:
SHOW VARIABLES LIKE "character\_set\_%";
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 | -- here mysql uses utf8 for
character_set_system.
Another try is that if I change my client tool encoding set, I do not
even need my java transition. All right, good to learn from this.
Emi
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general