Hex escapes in strings only support ASCII range
Escaped strings [1] fail on hex values greater than \x7f with a factually incorrect error message:
$ psql --command "select E'\x80';"
ERROR: invalid byte sequence for encoding "UTF8": 0x80
$ psql --version
psql (PostgreSQL) 13.1
I was able to reproduce this bug on psql 10.
Workaround:
$ psql --command "select E'\u0080';"
?column?
----------
\u0080
(1 row)
1: https://www.postgresql.org/docs/13/sql-syntax-lexical.html#id-1.5.3.5.9.5.3
On Mon, Mar 8, 2021 at 2:43 PM Vilem Benjamin Liepelt
<vileml@sqreamtech.com> wrote:
Escaped strings [1] fail on hex values greater than \x7f with a factually incorrect error message:
$ psql --command "select E'\x80';"
ERROR: invalid byte sequence for encoding "UTF8": 0x80
$ psql --version
psql (PostgreSQL) 13.1
I was able to reproduce this bug on psql 10.
Workaround:
$ psql --command "select E'\u0080';"
?column?
----------
\u0080
(1 row)
0x80 is not equivalent to \u0080 in UTF8. You may use hex values
greater than 0x7F but you still have to produce valid byte sequences
for your target encoding or switch to an encoding that does not
validate those sequences.
postgres=# select E'\xc2\x80';
?column?
----------
\u0080
(1 row)
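To illustrate the distinction outside the database, here is a minimal Python sketch. It assumes Python's built-in codecs as a stand-in for the server's encoding validation, and the helper name `is_valid_utf8` is hypothetical. It shows that the code point U+0080 is stored in UTF-8 as the two bytes 0xC2 0x80, that a lone 0x80 byte is not a valid UTF-8 sequence, and that a single-byte encoding such as Latin-1 accepts 0x80 on its own.

```python
# U+0080 is a code point; UTF-8 stores it as two bytes, not as the single byte 0x80.
code_point = "\u0080"
utf8_bytes = code_point.encode("utf-8")
print(utf8_bytes.hex())  # c280


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0x80 alone is a continuation byte with no lead byte, so validation rejects it.
print(is_valid_utf8(b"\x80"))      # False
# The same continuation byte preceded by the lead byte 0xC2 is a valid sequence.
print(is_valid_utf8(b"\xc2\x80"))  # True

# In a single-byte encoding such as Latin-1, 0x80 by itself maps to U+0080.
print(b"\x80".decode("latin-1") == code_point)  # True
```

This mirrors the two options described above: either supply the full byte sequence for the target encoding, or use an encoding in which the single byte is valid by itself.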
--
Regards, Sven Klemm
Thank you for the clarification; I misunderstood the documentation. Your explanation makes a lot of sense. My bad for opening a bug about this.
On 8 Mar 2021, at 15:25, Sven Klemm <sven@timescale.com> wrote:
0x80 is not equivalent to \u0080 in UTF8. You may use hex values
greater than 0x7F but you still have to produce valid byte sequences
for your target encoding or switch to an encoding that does not
validate those sequences.
postgres=# select E'\xc2\x80';
?column?
----------
\u0080
(1 row)
--
Regards, Sven Klemm