COPY doesn't work when containing 'ñ' or 'à' characters on db

Started by Jaume Teixi almost 25 years ago, 10 messages
#1 Jaume Teixi
teixi@6tems.com

I finally noticed that when the data contains 'ñ' or 'à' it's impossible to
parse it with:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <------- it puts everything in the same cell even though there's a '|' in
the middle!!!

but after changing 'ñ' to 'n':

SELECT edicion FROM products;
edicion
-----------------
Espana <--------------- it separates the cells correctly

So what's my solution for COPYing text that contains such characters?

best regards,
jaume

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jaume Teixi (#1)
Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

Jaume Teixi <teixi@6tems.com> writes:

I finally noticed that when the data contains 'ñ' or 'à' it's impossible to
parse it with:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <------- it puts everything in the same cell even though there's a '|' in
the middle!!!

Very odd. What LOCALE and multibyte encodings are you using, if any?
This seems like it must be a multibyte issue, but I can't guess what.

Also, which Postgres version are you running? If you said, I missed it.

regards, tom lane

#3 Jaume Teixi
teixi@6tems.com
In reply to: Tom Lane (#2)
Re: SOLVED: COPY doesn't work when containing 'ñ' or 'à' characters on db

On Mon, 26 Feb 2001 22:16:35 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jaume Teixi <teixi@6tems.com> writes:

I finally noticed that when the data contains 'ñ' or 'à' it's impossible to
parse it with:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <------- it puts everything in the same cell even though there's a '|' in
the middle!!!

I finally managed, thanks to Oliver Elphick, to create the database with:
CREATE DATABASE "demo" WITH ENCODING = 'SQL_ASCII'

and the data was imported OK. Great, thanks!

#4 Oliver Elphick
olly@lfix.co.uk
In reply to: Tom Lane (#2)
Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

Tom Lane wrote:

Jaume Teixi <teixi@6tems.com> writes:

I finally noticed that when the data contains 'ñ' or 'à' it's impossible to
parse it with:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <------- it puts everything in the same cell even though there's a '|' in
the middle!!!

Very odd. What LOCALE and multibyte encodings are you using, if any?
This seems like it must be a multibyte issue, but I can't guess what.

Also, which Postgres version are you running? If you said, I missed it.

I think this happens when the front-end encoding is SQL_ASCII and the
database is using UNICODE. Then, there are misunderstandings between
front-end and back-end, so that a single character with the eighth bit
set may be sent by the front-end and interpreted by the back-end as the
first half of a UNICODE two-byte character.
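A quick way to see the byte-level mismatch Oliver describes: in ISO 8859-1, 'ñ' is the single byte 0xF1 (eighth bit set), while in UTF-8 it is the two-byte sequence 0xC3 0xB1, so a Latin-1 file is generally not valid UTF-8. The Python sketch below (the helper name `first_invalid_utf8_offset` is purely illustrative) checks a buffer before it is handed to COPY; it only tests validity and does not model how the backend itself parses the data.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8,
    or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# The same field written in two encodings.
latin1_line = "España|Nacional".encode("iso-8859-1")
utf8_line = "España|Nacional".encode("utf-8")

print(latin1_line)  # b'Espa\xf1a|Nacional'  ('ñ' is one byte, 0xF1)
print(utf8_line)    # b'Espa\xc3\xb1a|Nacional'  ('ñ' is two bytes)
print(first_invalid_utf8_offset(latin1_line))  # 4
print(first_invalid_utf8_offset(utf8_line))    # None
```

A non-None result for a file you intend to load into a UNICODE database is a strong hint that the file is in some other encoding.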

--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
PGP: 1024R/32B8FAA1: 97 EA 1D 47 72 3F 28 47 6B 7E 39 CC 56 E4 C1 47
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"If we confess our sins, he is faithful and just to
forgive us our sins, and to cleanse us from all
unrighteousness." I John 1:9

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oliver Elphick (#4)
Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

"Oliver Elphick" <olly@lfix.co.uk> writes:

I think this happens when the front-end encoding is SQL_ASCII and the
database is using UNICODE. Then, there are misunderstandings between
front-end and back-end, so that a single character with the eighth bit
set may be sent by the front-end and interpreted by the back-end as the
first half of a UNICODE two-byte character.

I wondered about that, but his examples had one or more characters
between the eighth-bit-set character and the '|', so this doesn't seem
to explain the problem.

Still, if it went away after moving to ASCII encoding, it clearly is
a multibyte issue of some sort.

regards, tom lane

#6 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#5)
Re: [HACKERS] Re: COPY doesn't work when containing 'ñ' or 'à' characters on db

"Oliver Elphick" <olly@lfix.co.uk> writes:

I think this happens when the front-end encoding is SQL_ASCII and the
database is using UNICODE. Then, there are misunderstandings between
front-end and back-end, so that a single character with the eighth bit
set may be sent by the front-end and interpreted by the back-end as the
first half of a UNICODE two-byte character.

I wondered about that, but his examples had one or more characters
between the eighth-bit-set character and the '|', so this doesn't seem
to explain the problem.

No.

From Jaume's example:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <------- it puts everything in the same cell even though there's a '|' in
the middle!!!

\361 (the ISO 8859-1 'ñ') == 0xf1. UTF-8 assumes that:

if (the first byte) & 0xe0 == 0xe0, then the letter consists of 3
bytes.

So PostgreSQL believes that "\361a|" is one UTF-8 letter and eats up
the '|'.
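Tatsuo's explanation can be reproduced with a small Python model. It assumes, as a simplification rather than PostgreSQL's actual COPY code, that the reader advances one character at a time using the lead-byte rule quoted above and only recognises '|' when it stands alone as a character. Standard UTF-8 treats 0xF0 to 0xF4 as leads of 4-byte sequences; the model deliberately follows the 3-byte rule as stated. The helper names `model_char_len` and `split_fields` are hypothetical.

```python
from typing import List


def model_char_len(lead: int) -> int:
    """Character length implied by a lead byte, using the rule quoted above.

    Simplified model: every byte >= 0xE0 is treated as starting a
    three-byte sequence.
    """
    if (lead & 0x80) == 0:
        return 1          # 0xxxxxxx: plain ASCII
    if (lead & 0xE0) == 0xC0:
        return 2          # 110xxxxx: two-byte sequence
    if (lead & 0xE0) == 0xE0:
        return 3          # 111xxxxx: assumed three-byte sequence
    return 1              # stray continuation byte: take it alone


def split_fields(line: bytes, delim: bytes = b"|") -> List[bytes]:
    """Split a COPY-style line, stepping one modelled character at a time.

    A delimiter byte that falls inside a multibyte 'character' is not seen.
    """
    fields: List[bytes] = []
    current = bytearray()
    i = 0
    while i < len(line):
        n = model_char_len(line[i])
        char = line[i:i + n]
        if char == delim:
            fields.append(bytes(current))
            current = bytearray()
        else:
            current += char
        i += n
    fields.append(bytes(current))
    return fields


print(split_fields(b"Espana|Nacional"))      # [b'Espana', b'Nacional']
print(split_fields(b"Espa\xf1a|Nacional"))  # [b'Espa\xf1a|Nacional']
```

The Latin-1 byte 0xF1 swallows 'a' and '|', so the whole line lands in one field, exactly the symptom in Jaume's SELECT output; the same text encoded as UTF-8 (0xC3 0xB1 for 'ñ') splits correctly.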

My guess is that Jaume created a UNICODE database but fed it ISO 8859-1
(or some similar single-byte Latin encoding) data.

I'm wondering why so many people use a UTF-8 database even when they
don't understand what UTF-8 is :-) I hope 7.1 will solve this kind of
confusion by enabling automatic encoding conversion between UTF-8
and other encodings.
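Until such automatic conversion is available, one alternative to recreating the database as SQL_ASCII (the fix in message #3) is to convert the ISO 8859-1 input file to UTF-8 before running COPY into the UNICODE database. A minimal Python sketch, assuming the source file really is ISO 8859-1 (the function names `latin1_to_utf8` and `convert_file` are hypothetical):

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8.

    Every byte value is a valid ISO 8859-1 character, so this never raises;
    the result is only meaningful if the input really is ISO 8859-1.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a UTF-8 copy of an ISO 8859-1 file, leaving the original as is."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(latin1_to_utf8(data))


print(latin1_to_utf8(b"Espa\xf1a|Nacional"))  # b'Espa\xc3\xb1a|Nacional'
```

For example, `convert_file('/var/lib/postgres/dadesi.txt', '/var/lib/postgres/dadesi-utf8.txt')` (the output path is made up) would produce a file for the same COPY command. Reading the whole file into memory keeps the sketch short; a very large file could be converted in chunks instead, which is safe here because ISO 8859-1 is a single-byte encoding.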
--
Tatsuo Ishii

#7 Rainer Mager
rmager@vgkk.com
In reply to: Tom Lane (#2)
RE: COPY doesn't work when containing 'ñ' or 'à' characters on db

I haven't been following this thread very carefully, but I just remembered a
similar problem we had that is probably related. We did a dump from a UTF-8
db containing English, Japanese, and Korean data. When the dump was done in
the default mode (i.e., via COPY statements), we could not restore it; it
would die on certain characters. We then tried dumping with the -nd flags.
This fixed the problem for us, although the restore is a lot slower.

--Rainer


-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, February 27, 2001 12:17 PM
To: Jaume Teixi
Cc: pgsql-hackers@postgresql.org; pgsql-admin@postgresql.org; Richard T. Robino; Stefan Huber
Subject: Re: [ADMIN] COPY doesn't work when containing 'ñ' or 'à' characters on db

Jaume Teixi <teixi@6tems.com> writes:

I finally noticed that when the data contains 'ñ' or 'à' it's impossible to
parse it with:

COPY products FROM '/var/lib/postgres/dadesi.txt' USING DELIMITERS '|' \g

it causes:

SELECT edicion FROM products;
edicion
-----------------
España|Nacional <------- it puts everything in the same cell even though there's a '|' in
the middle!!!

Very odd. What LOCALE and multibyte encodings are you using, if any?
This seems like it must be a multibyte issue, but I can't guess what.

Also, which Postgres version are you running? If you said, I missed it.

regards, tom lane

#8 Rainer Mager
rmager@vgkk.com
In reply to: Jaume Teixi (#3)
log files

Hi all,

Is there any way to get the debug (-d2) log files to mark each transaction
with a unique ID? We're trying to debug deadlocks, and the transactions seem
to be mixed together somewhat.

Thanks,

--Rainer

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rainer Mager (#8)
Re: log files

"Rainer Mager" <rmager@vgkk.com> writes:

Is there any way to get the debug (-d2) log files to mark each transaction
with a unique ID?

Not per-transaction, but there's an option to include the backend PID,
which should help.

regards, tom lane

#10 Rainer Mager
rmager@vgkk.com
In reply to: Tom Lane (#9)
Postgres <-> Oracle

Hi all,

We have an application that runs on both Postgres and Oracle. One problem
we've been facing is maintaining the installed/default database for the
application. Once it is up and running, things are fine, but since we
primarily develop on Postgres we sometimes hit problems when it is time to
convert all of our work to Oracle. I was wondering if anyone knows of any
tools that take a Postgres dump and convert it to something Oracle can
accept?

Thanks,

--Rainer