Encoding Conversion

Started by beer@cmu.edualmost 20 years ago10 messagesgeneral
Jump to latest
#1beer@cmu.edu
beer@cmu.edu

Hello All

So I have an old database that is ASCII_SQL encoded. For a variety of reasons I need to convert the database to UNICODE. I did some googling on this but have yet to find anything that looked like a viable option, so i thought I'd post to the group and see what sort of advice might arise. :)

TIA

-b

#2jef peeraer
jef.peeraer@pandora.be
In reply to: beer@cmu.edu (#1)
Re: Encoding Conversion

beer schreef:

Hello All

So I have an old database that is ASCII_SQL encoded. For a variety of reasons I need to convert the database to UNICODE. I did some googling on this but have yet to find anything that looked like a viable option, so i thought I'd post to the group and see what sort of advice might arise. :)

well i recently struggled with the same problem. After a lot of trial
and error and reading, it seems that an ascii encoded database can't use
its client encoding capabilities ( set client_encoding to utf8 ).
i think the easist solution is to do a dump, recreate the database with
a proper encoding, and restore the dump.

jef peeraer

Show quoted text

TIA

-b

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#3jef peeraer
jef.peeraer@telenet.be
In reply to: jef peeraer (#2)
Re: Encoding Conversion

beer schreef:

Hello All

So I have an old database that is ASCII_SQL encoded. For a variety

of reasons I need to convert the database to UNICODE. I did some
googling on this but have yet to find anything that looked like a viable
option, so i thought I'd post to the group and see what sort of advice
might arise. :)
well i recently struggled with the same problem. After a lot of trial
and error and reading, it seems that an ascii encoded database can't use
its client encoding capabilities ( set client_encoding to utf8 ).
i think the easist solution is to do a dump, recreate the database with
a proper encoding, and restore the dump.

jef peeraer

Show quoted text

TIA

-b

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#4Alan Hodgson
ahodgson@simkin.ca
In reply to: jef peeraer (#3)
Re: Encoding Conversion

On May 9, 2006 01:03 pm, jef peeraer <jef.peeraer@telenet.be> wrote:

well i recently struggled with the same problem. After a lot of trial
and error and reading, it seems that an ascii encoded database can't use
its client encoding capabilities ( set client_encoding to utf8 ).
i think the easist solution is to do a dump, recreate the database with
a proper encoding, and restore the dump.

You also need to convert any non-ASCII encoded characters present in the
dump to UTF-8, prior to restoring it. If you're lucky, and they're all the
same (ie. LATIN1 or something), you can use iconv to easily do that.
Remember to change the set client_encoding line in the dump file, too.

--
Alan

#5Adam
adam@spatialsystems.org
In reply to: beer@cmu.edu (#1)
What's wrong with this SQL?

I'm trying to create a table and I'm getting this error:

SQL error:

ERROR: syntax error at or near "(" at character 39
In statement:
CREATE TABLE "users" ("user_ID" SERIAL(12), "first_name" character
varying(40) NOT NULL, "last_name" character varying(40) NOT NULL, "password"
character varying(16) NOT NULL, "email" character varying(100) NOT NULL,
"privilege" integer(2) NOT NULL, PRIMARY KEY ("user_ID"))

What's wrong with this SQL?

#6Dann Corbit
DCorbit@connx.com
In reply to: Adam (#5)
Re: What's wrong with this SQL?

If you count over 39 characters, you will see the parser is barking at
this:

"user_ID" SERIAL(12)
^

See:
http://www.postgresql.org/docs/8.1/static/datatype.html#DATATYPE-SERIAL

to find out why your definition is confusing Mr. SQL-parser.

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
owner@postgresql.org] On Behalf Of Adam
Sent: Tuesday, May 09, 2006 9:06 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] What's wrong with this SQL?

I'm trying to create a table and I'm getting this error:

SQL error:

ERROR: syntax error at or near "(" at character 39
In statement:
CREATE TABLE "users" ("user_ID" SERIAL(12), "first_name"

character

varying(40) NOT NULL, "last_name" character varying(40) NOT NULL,
"password"
character varying(16) NOT NULL, "email" character varying(100) NOT

NULL,

"privilege" integer(2) NOT NULL, PRIMARY KEY ("user_ID"))

What's wrong with this SQL?

---------------------------(end of

broadcast)---------------------------

Show quoted text

TIP 6: explain analyze is your friend

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam (#5)
Re: What's wrong with this SQL?

"Adam" <adam@spatialsystems.org> writes:

I'm trying to create a table and I'm getting this error:
ERROR: syntax error at or near "(" at character 39
In statement:
CREATE TABLE "users" ("user_ID" SERIAL(12),

SERIAL doesn't take a parameter.

regards, tom lane

#8Chris
dmagick@gmail.com
In reply to: Adam (#5)
Re: What's wrong with this SQL?

Adam wrote:

I'm trying to create a table and I'm getting this error:

SQL error:

ERROR: syntax error at or near "(" at character 39
In statement:
CREATE TABLE "users" ("user_ID" SERIAL(12), "first_name" character
varying(40) NOT NULL, "last_name" character varying(40) NOT NULL,
"password" character varying(16) NOT NULL, "email" character
varying(100) NOT NULL, "privilege" integer(2) NOT NULL, PRIMARY KEY
("user_ID"))

"privilege" integer(2) NOT NULL

Postgres doesn't support integers of different sizes like this.

http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-INT

--
Postgresql & php tutorials
http://www.designmagick.com/

#9John DeSoi
desoi@pgedit.com
In reply to: Adam (#5)
Re: What's wrong with this SQL?

On May 10, 2006, at 12:06 AM, Adam wrote:

CREATE TABLE "users" ("user_ID" SERIAL(12), "first_name" character
varying(40) NOT NULL, "last_name" character varying(40) NOT NULL,
"password" character varying(16) NOT NULL, "email" character varying
(100) NOT NULL, "privilege" integer(2) NOT NULL, PRIMARY KEY
("user_ID"))

PostgreSQL integers don't have a size property. So SERIAL(12) and
integer(2) are wrong. See the type choices here:

http://www.postgresql.org/docs/8.1/interactive/datatype.html#DATATYPE-
INT

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

#10Rick Gigger
rick@alpinenetworking.com
In reply to: jef peeraer (#2)
Re: Encoding Conversion

jef peeraer wrote:

beer schreef:

Hello All

So I have an old database that is ASCII_SQL encoded. For a variety
of reasons I need to convert the database to UNICODE. I did some
googling on this but have yet to find anything that looked like a
viable option, so i thought I'd post to the group and see what sort
of advice might arise. :)

well i recently struggled with the same problem. After a lot of trial
and error and reading, it seems that an ascii encoded database can't
use its client encoding capabilities ( set client_encoding to utf8 ).
i think the easist solution is to do a dump, recreate the database
with a proper encoding, and restore the dump.

jef peeraer

TIA

-b

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

In my experience ASCII_SQL will let you put anything in there. You need
to figure out the actual encoding of the data. Is it LATIN1? Is it
UTF-8? UTF-16? I found that my old ASCII_SQL dbs, before they were
converted to unicode, contained 99.9% LATIN1 chars but also had a few
random weird characters thrown in from people copying and pasting from
office. For instance MS Word uses these non-ascii standard characters
to implement it's "magic quotes" or whatever they call it where the
quotes curl in towards each other.

I had to identify what the bad chars were. I think that viewing the
dump in vi showed me the hex codes for the non-ascii chars. Then I
changed the encoding specified at the top of the dump as LATIN1. Then I
used sed to remove them as I piped it into a postgres unicode db.

Rick