BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

Started by PG Bug reporting form, over 3 years ago, 7 messages, bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17615
Logged by: Souvik Chattopadhyay
Email address: chatterjeesouvik.besu@gmail.com
PostgreSQL version: 10.21
Operating system: CentOS 7.9
Description:

Hi,

Getting the below error while inserting records into the table:
invalid byte sequence for encoding "UTF8": 0xae

Insert statement:
insert into xx_test values ('Remmo® 20 Tablet');

Regards,
Souvik

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

PG Bug reporting form <noreply@postgresql.org> writes:

Getting the below error while inserting records into the table:
invalid byte sequence for encoding "UTF8": 0xae

That is, in fact, an invalidly-encoded character per UTF8 rules,
so I see no reason to think there is any Postgres bug here.
What's more likely is that you haven't set client_encoding to
match the encoding of the data you're trying to insert.

regards, tom lane
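
To illustrate the point that a lone 0xae byte is not valid UTF-8, here is a minimal Python sketch. The byte string `raw` is an assumption about what the client actually sent (the thread never shows the raw bytes), and `is_valid_utf8` is a hypothetical helper name, not part of PostgreSQL or any library.

```python
# Assumed raw bytes: the reporter's string with the registered sign sent
# as the single byte 0xAE, as a single-byte encoding would produce it.
raw = b"Remmo\xae 20 Tablet"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xAE is a UTF-8 continuation byte, so it cannot appear on its own.
print(is_valid_utf8(raw))  # False

# In UTF-8, U+00AE (the registered sign) is the two-byte sequence C2 AE.
print("\u00ae".encode("utf-8"))  # b'\xc2\xae'
```

Running a check like this on the client-side data before sending it is one way to see whether the declared client_encoding matches the bytes.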

#3 Souvik Chatterjee
chatterjeesouvik.besu@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

We have set the client encoding to UTF-8, but still error is coming.

This is getting saved properly in Oracle databases, then what's the issue
postgres?

regards,
Souvik Chattopadhyay

On Fri, 16 Sept 2022, 01:03 Tom Lane, <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Getting the below error while inserting records into the table:
invalid byte sequence for encoding "UTF8": 0xae

That is, in fact, an invalidly-encoded character per UTF8 rules,
so I see no reason to think there is any Postgres bug here.
What's more likely is that you haven't set client_encoding to
match the encoding of the data you're trying to insert.

regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Souvik Chatterjee (#3)
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:

We have set the client encoding to UTF-8, but still error is coming.

That is exactly what you *shouldn't* do, because the data you are sending
is evidently not in UTF8. It's probably some LATINn variant.

This is getting saved properly in Oracle databases, then what's the issue
postgres?

[ shrug... ] It's likely a matter of what software stack you have on
the client side, not which server you're using exactly.

regards, tom lane

#5 Francisco Olarte
folarte@peoplecall.com
In reply to: Souvik Chatterjee (#3)
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

On Fri, 16 Sept 2022 at 09:37, Souvik Chatterjee
<chatterjeesouvik.besu@gmail.com> wrote:

We have set the client encoding to UTF-8, but still error is coming.

It seems you have got it backwards. From your description, it looks
like your client encoding is UTF-8 and you are sending the data in a
different one. (The other usual encodings do not have this kind of
problem, as all byte sequences are valid in them.) Setting the client
encoding means you tell the server "I am going to send you UTF-8";
then you send invalid UTF-8, and the server tells you so. My bet is on
windows-1252 if the client is on Windows (the usual suspect), or
latin-1 if the client is on *ix (rarer, as nearly all Unix systems
work in UTF-8 these days).

Try what you are doing with client encoding win-1252 (look up the
exact name in the manual, I may be wrong) to see if it does what you
want.

This is getting saved properly in Oracle databases, then what's the issue postgres?

This seems like pilot error to me. The Oracle tools probably use
another encoding by default, so you are not doing the same thing here
and are comparing apples to oranges.

BTW, this does not even remotely look like a bug to me; you will
probably get more enthusiastic and/or detailed responses on one of
the general lists. I replied to this because it was the first message
and I thought I had opened the general list, and only noticed it was
a bug report when I hit your bottom quote. Had I noticed, I would
probably just have answered "Does not look like a bug, but pilot
error".

Francisco Olarte.
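
The suggestion above is to declare the client encoding that matches the data (the PostgreSQL documentation lists the Windows-1252 encoding under the name WIN1252). The complementary approach, converting the bytes to UTF-8 on the client before sending them, can be sketched in Python as follows. This is only a sketch under assumptions: `to_utf8` is a hypothetical helper, and cp1252 as the fallback is the guess made in this message, not something the thread confirms.

```python
def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return data as UTF-8 bytes (hypothetical helper).

    If data is already valid UTF-8 it is returned unchanged; otherwise it
    is assumed to be in `fallback` (a guess) and transcoded.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # May still raise if a byte is undefined in the fallback encoding.
        return data.decode(fallback).encode("utf-8")


# Assumed input: the reporter's string with the registered sign as 0xAE.
fixed = to_utf8(b"Remmo\xae 20 Tablet")
print(fixed)  # b'Remmo\xc2\xae 20 Tablet'
```

Guessing the source encoding is inherently fragile; when the real encoding of the data is known, it is better to name it explicitly than to rely on a fallback.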

#6 Souvik Chatterjee
chatterjeesouvik.besu@gmail.com
In reply to: Tom Lane (#4)
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

So you meant to say registered trademark: ®
is not a valid UTF-8 character?

Seems strange to me.

regards,
Souvik Chattopadhyay

On Fri, 16 Sept 2022, 08:39 Tom Lane, <tgl@sss.pgh.pa.us> wrote:

Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:

We have set the client encoding to UTF-8, but still error is coming.

That is exactly what you *shouldn't* do, because the data you are sending
is evidently not in UTF8. It's probably some LATINn variant.

This is getting saved properly in Oracle databases, then what's the issue
postgres?

[ shrug... ] It's likely a matter of what software stack you have on
the client side, not which server you're using exactly.

regards, tom lane

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Souvik Chatterjee (#6)
Re: BUG #17615: Getting error while inserting records in the table: invalid byte sequence for encoding "UTF8": 0xae

Souvik Chatterjee <chatterjeesouvik.besu@gmail.com> writes:

So you meant to say registered trademark: ®
is not a valid UTF-8 character?

I'm sure that there is such a Unicode character, but the way you
are presenting it to the database is not UTF-8. It's some other
character encoding, probably a single-byte encoding such as a
member of the ISO 8859 family [1]. I see in the table there
that code 0xAE is the registered trademark symbol in 8859-1 (LATIN1)
and some but not all of the other variants. You need to arrange
for the proper encoding conversion to happen. Perhaps reading [2]
would help.

regards, tom lane

[1]: https://en.wikipedia.org/wiki/ISO/IEC_8859
[2]: https://www.postgresql.org/docs/current/multibyte.html
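
The observation that 0xAE is the registered sign in 8859-1 and in some but not all of the other variants can be checked with Python's bundled codecs. This is a sketch, assuming the standard `iso8859_N` codec names shipped with CPython (there is no part 12); `decodes_0xae_as_registered` is a hypothetical helper name.

```python
REGISTERED = "\u00ae"  # U+00AE, the registered sign


def decodes_0xae_as_registered(codec: str) -> bool:
    """True if the byte 0xAE decodes to U+00AE in the given codec."""
    try:
        return b"\xae".decode(codec) == REGISTERED
    except (UnicodeDecodeError, LookupError):
        # The byte is undefined in that codec, or the codec is unavailable.
        return False


# ISO 8859 parts 1 through 16; part 12 was never published.
parts = [n for n in range(1, 17) if n != 12]
matches = [
    f"iso8859_{n}" for n in parts if decodes_0xae_as_registered(f"iso8859_{n}")
]
print(matches)
```

The printed list shows which parts agree with LATIN1 for this byte. For the parts that are missing from it, the same byte decodes to a different character or to nothing at all, which is why the source encoding has to be identified rather than assumed.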