invalid byte sequence for encoding "UTF8": 0xab

Started by Grand, Mark D. · almost 17 years ago · 6 messages · general
#1Grand, Mark D.
mgrand@emory.edu

I am having a vexing problem with a script I am writing to populate reference tables in a new database.

I am running PostgreSQL 8.3 with psql 8.3.7.
psql reads this SQL statement:
INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
VALUES ('Super-User Authorization',
'This allows a super-user to administer all meta-data.',
'UserID <Administer> ()');

and I get this message:
ERROR: invalid byte sequence for encoding "UTF8": 0xab
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

It is complaining about the '<' character. I do not understand why. The database is created with the commands
CREATE DATABASE mayyou
WITH OWNER=meta_auth ENCODING='UTF8';
ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies
UTF8

Why is it complaining about this valid character code?

________________________________
This e-mail message (including any attachments) is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this message (including any attachments) is strictly
prohibited.

If you have received this message in error, please contact
the sender by reply e-mail message and destroy all copies of the
original message (including attachments).

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Grand, Mark D. (#1)
Re: invalid byte sequence for encoding "UTF8": 0xab

"Grand, Mark D." <mgrand@emory.edu> writes:

... I get this message:
ERROR: invalid byte sequence for encoding "UTF8": 0xab
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

It is complaining about the '<' character. I do not understand why.

The ASCII code for '<' is 0x3c, not 0xab. I am not sure what you are
actually typing; although it's suggestive that the LATIN1 code 0xab
corresponds to a symbol that looks approximately like '<<'. The most
likely bet is that you are typing the wrong thing and using a terminal
emulator that is not set to generate UTF8-encoded characters. You
should try to make sure that client_encoding is set to match what your
keyboard actually generates.

regards, tom lane
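
The byte values mentioned above can be checked directly. The following is a minimal sketch in Python (not used anywhere in this thread; chosen here only for illustration). The helper name is_valid_utf8 is hypothetical. It shows that '<' is 0x3c, that 0xab is the left guillemet in LATIN1, and that a lone 0xab is not valid UTF-8.

```python
# Illustration only: the bytes behind '<' and the LATIN1 guillemet.
# Escapes are used so that this file's own encoding does not matter.


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def hexdump(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


# '<' is one byte, 0x3c, in ASCII, LATIN1 and UTF-8 alike.
less_than = "<".encode("utf-8")

# In LATIN1 (ISO-8859-1) the byte 0xab is U+00AB, the left guillemet.
guillemet = b"\xab".decode("latin-1")

# In UTF-8 the same character needs two bytes.
guillemet_utf8 = guillemet.encode("utf-8")

print("'<' as UTF-8:       ", hexdump(less_than))
print("U+00AB as UTF-8:    ", hexdump(guillemet_utf8))
print("lone 0xab is UTF-8? ", is_valid_utf8(b"\xab"))
```

A lone 0xab has the bit pattern of a UTF-8 continuation byte with no lead byte in front of it, which is consistent with the server rejecting it.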

#3Vick Khera
vivek@khera.org
In reply to: Tom Lane (#2)
Re: invalid byte sequence for encoding "UTF8": 0xab

On Fri, Jun 5, 2009 at 9:57 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

The ASCII code for '<' is 0x3c, not 0xab.  I am not sure what you are
actually typing; although it's suggestive that the LATIN1 code 0xab
corresponds to a symbol that looks approximately like '<<'.  The most
likely bet is that you are typing the wrong thing and using a terminal

Must be something with your mail program, because in the version I am
reading postgres is complaining about the "approximately like '<<'"
symbol.

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Grand, Mark D. (#1)
Re: invalid byte sequence for encoding "UTF8": 0xab

Mark D. Grand wrote:

I am having a vexing problem with a script I am writing to
populate reference tables in a new database.

I am running postgreSQL 8.3 with psql 8.3.7.

Psql reads this SQL statement:

INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
VALUES ('Super-User Authorization',
'This allows a super-user to administer all meta-data.',
'UserID «Administer» ()');

and I get this message:

ERROR: invalid byte sequence for encoding "UTF8": 0xab

HINT: This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".

It is complaining about the '«' character. I do not
understand why. The database is created the commands

CREATE DATABASE mayyou
WITH OWNER=meta_auth ENCODING='UTF8';

ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies
UTF8

Why is it complaining about this valid character code?

The database stores characters in UTF-8, and the client
expects UTF-8 characters, but presumably the characters you
feed into psql are not UTF-8.

If this is some kind of UNIX, it might be instructive to
type 'echo "«" | od -t x1' on the command line.

Also knowing the current locale might help to determine the problem.

Yours,
Laurenz Albe
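
The od check suggested above can also be approximated in a script. This is a rough Python sketch under stated assumptions: the function name first_invalid_utf8_byte is hypothetical, and the sample line is the string literal from the quoted INSERT, encoded once as LATIN1 and once as UTF-8 to stand in for the two ways an editor might have saved it.

```python
from typing import Optional, Tuple


def first_invalid_utf8_byte(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (offset, byte value) of the first byte that breaks
    UTF-8 decoding, or None if the data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte.
        return err.start, data[err.start]
    return None


# The literal from the INSERT statement; \u00ab and \u00bb are the guillemets.
text = "'UserID \u00abAdministrator\u00bb ()'".replace("Administrator", "Administer")

saved_as_latin1 = text.encode("latin-1")  # one byte per guillemet
saved_as_utf8 = text.encode("utf-8")      # two bytes per guillemet

for label, data in (("latin-1", saved_as_latin1), ("utf-8", saved_as_utf8)):
    problem = first_invalid_utf8_byte(data)
    if problem is None:
        print(f"{label}: valid UTF-8")
    else:
        offset, value = problem
        print(f"{label}: invalid byte 0x{value:02x} at offset {offset}")
```

To inspect a real script, the same function could be applied to the result of reading the file in binary mode, for example open(path, "rb").read(), where path is whatever file is fed to psql.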

#5Grand, Mark D.
mgrand@emory.edu
In reply to: Laurenz Albe (#4)
Re: invalid byte sequence for encoding "UTF8": 0xab

It turns out that my problem was that the editor I was using (emacs) does not properly support utf8 encoding.
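
If the script file was in fact saved as LATIN1 (the thread suggests this but does not confirm it), one way to repair it is to re-encode the bytes as UTF-8. The following Python sketch illustrates the idea; the function name latin1_to_utf8 is hypothetical, and the sample bytes stand in for the contents of the script.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Interpret the bytes as LATIN1 text and re-encode that text as UTF-8.

    Assumption: the input really is LATIN1. Applying this to data that is
    already UTF-8 would double-encode every non-ASCII character.
    """
    return data.decode("latin-1").encode("utf-8")


# Stand-in for the script contents as a LATIN1 editor would write them:
# 0xab and 0xbb are the left and right guillemets.
original = b"'UserID \xabAdminister\xbb ()'"

converted = latin1_to_utf8(original)

# The converted bytes now decode cleanly as UTF-8.
print(converted.decode("utf-8"))
print(" ".join(f"{b:02x}" for b in converted[8:10]))  # bytes of the first guillemet
```

An alternative that leaves the file untouched is to tell the server what the client is actually sending, for example with \encoding LATIN1 in psql or SET client_encoding = 'LATIN1';, which is the setting the HINT in the error message refers to.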

-----Original Message-----
From: Albe Laurenz [mailto:laurenz.albe@wien.gv.at]
Sent: Monday, June 08, 2009 5:59 AM
To: Grand, Mark D.; pgsql-general@postgresql.org
Subject: RE: [GENERAL] invalid byte sequence for encoding "UTF8": 0xab

Mark D. Grand wrote:

I am having a vexing problem with a script I am writing to
populate reference tables in a new database.

I am running postgreSQL 8.3 with psql 8.3.7.

Psql reads this SQL statement:

INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
VALUES ('Super-User Authorization',
'This allows a super-user to administer all meta-data.',
'UserID <Administer> ()');

and I get this message:

ERROR: invalid byte sequence for encoding "UTF8": 0xab

HINT: This error can also happen if the byte sequence does
not match the encoding expected by the server, which is
controlled by "client_encoding".

It is complaining about the '<' character. I do not
understand why. The database is created the commands

CREATE DATABASE mayyou
WITH OWNER=meta_auth ENCODING='UTF8';

ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies
UTF8

Why is it complaining about this valid character code?

The database stores characters in UTF-8, and the client
expects UTF-8 characters, but presumably the characters you
feed into psql are not UTF-8.

If this is some kind of UNIX, it might be instructive to
type 'echo "<" | od -t x1' on the command line.

Also knowing the current locale might help to determine the problem.

Yours,
Laurenz Albe

#6Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Grand, Mark D. (#5)
Re: invalid byte sequence for encoding "UTF8": 0xab

"Grand, Mark D." <mgrand@emory.edu> writes:

It turns out that my problem was that the editor I was using (emacs)
does not properly support utf8 encoding.

Emacs does support utf8 properly.
http://www.emacswiki.org/emacs/ChangingEncodings

It could be I'm biased because I use emacs from CVS, which is going to
be emacs23, and is as stable as emacs has always been for me.
http://emacs.orebokech.com/
http://atomized.org/wp-content/cocoa-emacs-nightly/

From within emacs, to get a ton of information about char under point,
try C-x = (one line version) or M-x describe-char (full version): <
Char: < (60, #o74, #x3c) point=1312 of 4162 (31%) <301-4163> column=66

character: < (60, #o74, #x3c)
preferred charset: ascii (ASCII (ISO646 IRV))
code point: 0x3C
syntax: . which means: punctuation
category: .:Base, a:ASCII, l:Latin, r:Roman
buffer code: #x3C
file code: #x3C (encoded by coding system utf-8-emacs)
display: by this font (glyph code)
xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-16-*-*-*-m-0-iso10646-1 (#x1F)

But I guess we're off topic now.

HTH, regards,
--
dim