Encoding and Conversion Question(s)

Started by Dave Lazar, over 20 years ago | 2 messages | general
#1 Dave Lazar
hunkybill@gmail.com

Hi,

I have a database that was created with the encoding set to SQL_ASCII.
A lot of the data contains accented characters. When I read this data
with PHP and use utf-8 as my browser output charset, the accented
characters are displayed as weird symbols. If I wrap the data in the
PHP function utf8_encode(), it all looks fine again.

So, I have decided to simply change the encoding of my database from
SQL_ASCII to UNICODE so that I do not need to use utf8_encode() in
PHP.

I did a pg_dump of my database. I then created a blank database with
UNICODE as the encoding. However, pg_restore chokes with a message
about not being able to convert a multibyte character properly.

My server settings are en_US.UTF-8 for lc_collate, and the server
encoding is set to UNICODE.

How can I reload all my data into the UNICODE database I have created?
Is there something I need to do with the dump? I hope not!! Any tips
on this are most appreciated!!

TIA

Dave

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Lazar (#1)
Re: Encoding and Conversion Question(s)

Dave Lazar <hunkybill@gmail.com> writes:

> I have a database that was created with the encoding set to SQL_ASCII.
> A lot of data comes with accented characters.

You need to figure out what encoding that data is actually in (hint:
it's not ASCII) and specify that encoding as the client_encoding in
the restore script. Postgres will then be able to convert the data
to UTF-8 correctly.
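
A minimal sketch of that first step, in Python 3, under several assumptions not stated in the thread: the dump is a plain-text SQL script (not the custom archive format that pg_restore reads), the candidate encodings listed are only guesses, and all function names are hypothetical. Since PHP's utf8_encode() converts ISO-8859-1 to UTF-8, the fact that it repairs the display suggests the data may be Latin-1, but that should be confirmed by looking at the decoded text. If the dump writes its client_encoding line in a different form than the one matched here, the pattern would need adjusting.

```python
import re

def classify(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'legacy' for a chunk of dump data."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not ASCII and not valid UTF-8: some single-byte or other legacy encoding.
        return "legacy"

def preview(raw: bytes, candidates=("latin-1", "cp1252", "iso8859-15")) -> dict:
    """Decode the same bytes under each candidate so a person can judge which reads correctly."""
    return {name: raw.decode(name, errors="replace") for name in candidates}

# Matches a line such as: SET client_encoding = 'SQL_ASCII';
_SET_RE = re.compile(rb"^SET client_encoding\s*=\s*'[^']*';", re.MULTILINE)

def set_client_encoding(dump: bytes, pg_encoding: str) -> bytes:
    """Replace the first SET client_encoding line in the dump, or prepend one if none is found."""
    stmt = b"SET client_encoding = '" + pg_encoding.encode("ascii") + b"';"
    if _SET_RE.search(dump):
        return _SET_RE.sub(stmt, dump, count=1)
    return stmt + b"\n" + dump

sample = b"caf\xe9"  # the word 'café' as a Latin-1 client would have stored it
print(classify(sample))  # legacy
print(preview(sample))
print(set_client_encoding(b"SELECT 1;", "LATIN1").decode("ascii"))
```

The dump is handled as bytes throughout so that the script never has to guess an encoding just to read the file.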

If the data is actually all in one encoding, this shouldn't be too
painful. If it's in a mishmash of different encodings, you are in
for some pain getting things fixed up :-(
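
To get a feel for whether a dump is a mishmash, one rough approach is to survey it line by line. The sketch below is only an illustration under stated assumptions: it is Python 3, it treats the dump as a plain-text SQL script, the function names are hypothetical, and the normalize() step assumes that every line that is not valid UTF-8 is in one single fallback encoding (Latin-1 here, purely as a guess). With more than two encodings in play, no automatic pass like this can be trusted and the suspect lines need manual review.

```python
from collections import Counter

def line_kind(line: bytes) -> str:
    """Classify one line as 'ascii', 'utf-8', or 'legacy'."""
    try:
        line.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        line.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "legacy"

def survey(dump: bytes) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(line_kind(line) for line in dump.splitlines())

def normalize(dump: bytes, fallback: str = "latin-1") -> bytes:
    """Keep lines that are already valid UTF-8; re-encode the rest from the fallback encoding."""
    out = []
    for line in dump.split(b"\n"):
        try:
            line.decode("utf-8")
            out.append(line)
        except UnicodeDecodeError:
            out.append(line.decode(fallback).encode("utf-8"))
    return b"\n".join(out)

# A tiny mixed sample: one ASCII line, one UTF-8 line, one Latin-1 line.
mixed = b"plain\n" + "caf\u00e9".encode("utf-8") + b"\n" + b"caf\xe9\n"
print(survey(mixed))
print(normalize(mixed).decode("utf-8"))
```

If survey() reports both 'utf-8' and 'legacy' lines, the data is mixed. A file produced by normalize() would then be restored with client_encoding set to UTF-8 (UNICODE) rather than to the legacy encoding.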

regards, tom lane