Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

Started by Zach Seaman, over 13 years ago, 12 messages, general

znseaman@gmail.com

over 13 years ago

I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.

This is a similar question to this
one </messages/by-id/4dda42060512140509xe8b130as@mail.gmail.com>,
so I have encoded a database with LATIN-1 as suggested but can't copy a CSV
file into a table within the database.

ERROR: invalid byte sequence for encoding "UTF8": 0xe17371

Googling doesn't get me anywhere and I am working with Spanish characters.

Thanks again all,

Zach Seaman

Gurjeet Singh

gurjeet@singh.im

over 13 years ago

In reply to: Zach Seaman (#1)

Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

On Wed, Feb 6, 2013 at 7:56 PM, Zach Seaman <znseaman@gmail.com> wrote:

I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.

This is a similar question to this one </messages/by-id/4dda42060512140509xe8b130as@mail.gmail.com>,
so I have encoded a database with LATIN-1 as suggested but can't copy a CSV
file into a table within the database.

ERROR: invalid byte sequence for encoding "UTF8": 0xe17371

Googling doesn't get me anywhere and I am working with Spanish characters.

I think the data in your CSV file should match the client_encoding
parameter.

What is your client_encoding parameter set to?

show client_encoding;
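
A quick way to see whether the bytes in a CSV file match a given encoding is to try decoding them outside the database. The following is only an illustrative sketch, not something posted in the thread: the function name `first_invalid_utf8` and the sample row are made up for the example.

```python
def first_invalid_utf8(raw: bytes):
    """Return (offset, offending bytes) for the first sequence that is not
    valid UTF-8, or None if the whole buffer decodes cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded
        return exc.start, raw[exc.start:exc.start + 3]
    return None


# Hypothetical sample row saved as LATIN1, so the accented "a" is the single byte 0xE1
sample = b"Tom\xe1sq,1\n"
print(first_invalid_utf8(sample))
```

For a real file, the bytes could be read with `open(path, "rb").read()`. If the function reports an offending sequence, the file is not UTF-8, and client_encoding would need to name the encoding the file was actually saved in.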

--
Gurjeet Singh

http://gurjeet.singh.im/

Gavan Schneider

pg-gts@snkmail.com

over 13 years ago

In reply to: Zach Seaman (#1)

Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

On Wednesday, February 6, 2013 at 11:56, Zach Seaman <znseaman@gmail.com> wrote:

This is a similar question to this one
</messages/by-id/4dda42060512140509xe8b130as@mail.gmail.com>,
so I have encoded a database with LATIN-1 as suggested
but can't copy a CSV file into a table within the database.

I may have missed something here... why would anyone suggest
LATIN-1 in modern times?

UTF-8 will do all of LATIN-1 and everything else as well. Except
for legacy support why would anyone use anything other than
UTF-8? (Of course there are those where UTF-16 is a better
choice for their dominant language use, e.g. Chinese.)

Suggest you use UTF-8 database encoding and if there are no
problems importing the .csv stay with UTF-8. OTOH if there are
still problems when using UTF-8, stay with UTF-8 while you work
out what it is in the .csv file that's causing the problem.

Regards
Gavan

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jaime Casanova

jcasanov@systemguards.com.ec

over 13 years ago

In reply to: Zach Seaman (#1)

Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

On Wed, Feb 6, 2013 at 7:56 PM, Zach Seaman <znseaman@gmail.com> wrote:

I'm fairly new to PostgreSQL 9.1 but I need it, so here I am.

This is a similar question to this one, so I have encoded a database with
LATIN-1 as suggested but can't copy a CSV file into a table within the
database.

Well, that mail is from 2005... what version of Postgres are you running?

ERROR: invalid byte sequence for encoding "UTF8": 0xe17371

run:

SET client_encoding TO UTF8;

before running the COPY command, or maybe set it to LATIN1

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Ken Benson

Ken@infowerks.com

over 13 years ago

In reply to: Jaime Casanova (#4)

Re: Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

I think the problem may be that specific character translation.

The chart I typically use is here:
http://www.utf8-chartable.de/unicode-utf8-table.pl

The 'valid' UTF-8 codes jump from 0x e0 bf bf (at the bottom of this
page: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=3840 )
to 0x e1 80 80 (at the top of this page:
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4096 ).

So - the problem may be that 0x e1 73 71 is truly not a valid UTF-8
character in the current iteration of PostgreSQL - or at all.
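
The jump from 0x e0 bf bf to 0x e1 80 80 follows from how UTF-8 is laid out: the first byte announces the sequence length, and every following byte must fall in the range 0x80 to 0xBF. The sketch below is a simplified illustration of that rule, with helper names invented for the example; it deliberately ignores the extra restrictions on overlong forms and surrogates that a full validator would also enforce.

```python
def expected_length(lead: int) -> int:
    """Length of a UTF-8 sequence as announced by its first byte (0 = not a valid lead byte)."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2          # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3          # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4          # 11110xxx
    return 0              # continuation byte or unused value


def is_continuation(b: int) -> bool:
    """Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xBF."""
    return 0x80 <= b <= 0xBF


seq = bytes.fromhex("e17371")
print(expected_length(seq[0]))                 # 3
print([is_continuation(b) for b in seq[1:]])   # [False, False]
```

So 0xE1 promises a three-byte character, but 0x73 and 0x71 are ordinary ASCII letters rather than continuation bytes, which is why the sequence is rejected.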

Just my thoughts.

Ken

Zach Seaman

znseaman@gmail.com

over 13 years ago

In reply to: Jaime Casanova (#4)

Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

I'm running PostgreSQL 9.1

--
Zach Seaman
GIS Expert, IRRI-México
Master of Regional & Community Planning
m 55.2247.1740 (México)
m 01.913.4860.832 (U.S.)

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Ken Benson (#5)

Re: Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

Ken Benson <ken@infowerks.com> writes:

So - the problem may be that 0x e1 73 71 is truly not a valid UTF-8
character in the current iteration of PostgreSQL - or at all.

Of course it isn't, which is why Postgres is complaining. Presumably
what that data really is is three characters (looks like "ásq") in
LATIN1. But Postgres is trying to interpret it in UTF8. As mentioned
upthread, the solution is to adjust the client_encoding setting before
running the COPY command.
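
The LATIN1 reading of those three bytes can be checked outside the database. This is only a sketch of the byte-level idea, assuming the file really was saved as LATIN1; it is not PostgreSQL's own conversion code, and the variable names are invented for the example.

```python
raw = bytes.fromhex("e17371")

# Interpreted as LATIN1, every byte is exactly one character.
text = raw.decode("latin-1")
print(text)                  # ásq

# The same three characters written out as UTF-8 take four bytes,
# because the accented letter becomes the pair 0xC3 0xA1.
as_utf8 = text.encode("utf-8")
print(as_utf8.hex())         # c3a17371
```

With client_encoding set to LATIN1 and a UTF8 database, the server is expected to perform a conversion of this kind on the incoming data, so the accented characters are preserved rather than lost.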

regards, tom lane

Zach Seaman

znseaman@gmail.com

over 13 years ago

In reply to: Tom Lane (#7)

Re: [NOVICE] Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

I changed from LATIN1, set my database to UTF8, and my client_encoding is
UTF8.

ERROR: invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]

Is it a trial and error type problem now?

Zach Seaman

znseaman@gmail.com

over 13 years ago

In reply to: Zach Seaman (#8)

Re: [NOVICE] Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

Keeping the names intact would be helpful. Whatever I change it to, I
receive the same error because of the first entry.

I've encoded the csv using Notepad++ to UTF8 and still no luck.

I think "á" followed by the next 2 characters causes the problem. Is there
a better encoding for special characters? Is this possible in WIN-1252?

#10

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Zach Seaman (#8)

Re: Re: [NOVICE] Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

Zach Seaman <znseaman@gmail.com> writes:

I changed from LATIN1, set my database to UTF8, and my client_encoding is
UTF8.

ERROR: invalid byte sequence for encoding "UTF8": 0xe17320
ás[space]

No, the client encoding needs to be LATIN1 to read this file.

regards, tom lane

#11

Zach Seaman

znseaman@gmail.com

over 13 years ago

In reply to: Tom Lane (#10)

Re: [NOVICE] Re: [NOVICE] Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

Ok, client encoding is back to LATIN1.

Do I have to sacrifice the readability of these names or is there a way to
work around this invalid byte sequence problem?

#12

Michael Swierczek

mike.swierczek@gmail.com

over 13 years ago

In reply to: Zach Seaman (#9)

Re: [NOVICE] Re: [NOVICE] Re: [NOVICE] Problems with ñ and tildes / CSV import problems in PostgreSQL 9.1

On Thu, Feb 7, 2013 at 12:05 PM, Zach Seaman <znseaman@gmail.com> wrote:

Keeping the names intact would be helpful. Whatever I change it to, I receive the same error because of the first entry.

I've encoded the csv using Notepad++ to UTF8 and still no luck.

I think "á" followed by the next 2 characters causes the problem. Is there a better encoding for special characters? Is this possible in WIN-1252?

Zach,
I've been bitten by this misunderstanding myself. Changing the file
encoding in Notepad++ just changes a few bytes at the very beginning
of the file to indicate that it's supposed to be read as your new
encoding. It does not automatically go through the file converting
characters like "à" from the single byte 224 (decimal) in LATIN1
to the two-byte UTF-8 encoding of U+00E0. Maybe some other text
editors support actually re-encoding the characters in the file for
you, I don't know.
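
If re-encoding the file is preferred over changing client_encoding, a few lines of script can do the byte conversion described above. This is a minimal sketch assuming the source file is LATIN1; the function name `reencode` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def reencode(src: Path, dst: Path, src_encoding: str = "latin-1") -> None:
    """Decode src using src_encoding and write the same text to dst as UTF-8."""
    raw = src.read_bytes()
    dst.write_bytes(raw.decode(src_encoding).encode("utf-8"))


# Demo on a throwaway file: the LATIN1 byte 0xE1 becomes the UTF-8 pair 0xC3 0xA1.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "names_latin1.csv"
    dst = Path(tmp) / "names_utf8.csv"
    src.write_bytes(b"nombre\nTom\xe1s\n")
    reencode(src, dst)
    converted = dst.read_bytes()

print(converted)             # b'nombre\nTom\xc3\xa1s\n'
```

The resulting file would then be loaded with client_encoding set to UTF8. The conversion is only correct if the guess about the source encoding is right, since LATIN1 will decode any byte sequence without complaint.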

Good luck,
-Mike Swierczek
