Hex escapes in strings only support ASCII range

Started by Vilem Benjamin Liepelt about 5 years ago | 3 messages | bugs
#1 Vilem Benjamin Liepelt
vileml@sqreamtech.com

Escaped strings [1] fail on hex values greater than \x7f with a factually incorrect error message:

$ psql --command "select E'\x80';"
ERROR: invalid byte sequence for encoding "UTF8": 0x80

$ psql --version
psql (PostgreSQL) 13.1

I was able to reproduce this bug on psql 10.

Workaround:

$ psql --command "select E'\u0080';"
?column?
----------
\u0080
(1 row)

1: https://www.postgresql.org/docs/13/sql-syntax-lexical.html#id-1.5.3.5.9.5.3

#2 Sven Klemm
sven@timescale.com
In reply to: Vilem Benjamin Liepelt (#1)
Re: Hex escapes in strings only support ASCII range

On Mon, Mar 8, 2021 at 2:43 PM Vilem Benjamin Liepelt
<vileml@sqreamtech.com> wrote:

> Escaped strings [1] fail on hex values greater than \x7f with a factually incorrect error message:
>
> $ psql --command "select E'\x80';"
> ERROR: invalid byte sequence for encoding "UTF8": 0x80
>
> $ psql --version
> psql (PostgreSQL) 13.1
>
> I was able to reproduce this bug on psql 10.
>
> Workaround:
>
> $ psql --command "select E'\u0080';"
> ?column?
> ----------
> \u0080
> (1 row)

0x80 is not equivalent to \u0080 in UTF8. You may use hex values
greater than 0x7F but you still have to produce valid byte sequences
for your target encoding or switch to an encoding that does not
validate those sequences.

postgres=# select E'\xc2\x80';

?column?
----------
\u0080
(1 row)
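
The difference between the byte 0x80 and the code point U+0080 can be checked outside the database. The sketch below is only an illustration: it uses Python's built-in codecs as a stand-in for the server's encoding validation, and the helper name is_valid_utf8 is made up for this example. It is not PostgreSQL's own code.

```python
# Illustration only: Python's codecs stand in for the server-side
# encoding check. This is not how PostgreSQL implements it.

# U+0080 is a code point. In UTF-8 it is stored as two bytes, C2 80.
code_point = "\u0080"
utf8_bytes = code_point.encode("utf-8")
print(utf8_bytes.hex())


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0x80 is a continuation byte with no lead byte, so it is rejected.
lone_byte_ok = is_valid_utf8(b"\x80")

# C2 80 is a complete two-byte sequence, so it is accepted.
two_byte_ok = is_valid_utf8(b"\xc2\x80")

# A single-byte encoding such as Latin-1 maps every byte to a character,
# so the same lone byte decodes there, and it decodes to U+0080.
latin1_char = b"\x80".decode("latin-1")

print(lone_byte_ok, two_byte_ok, latin1_char == code_point)
```

This matches the two psql results in this thread: E'\x80' supplies the single byte 0x80, which is not valid UTF-8 on its own, while E'\xc2\x80' supplies the same two bytes that E'\u0080' produces in a UTF8 database.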

--
Regards, Sven Klemm

#3 Vilem Benjamin Liepelt
vileml@sqreamtech.com
In reply to: Sven Klemm (#2)
Re: Hex escapes in strings only support ASCII range

Thank you for the clarification. I misunderstood the documentation, and your explanation makes a lot of sense. My bad for opening a bug about this.
