Invalid Unicode escape value at or near "\u0000"

Started by Japin Liabout 4 years ago5 messages
#1Japin Li
japinli@hotmail.com

Hi, hackers

When I try to insert an Unicode "\u0000", there is an error $subject.

postgres=# CREATE TABLE tbl (s varchar(10));
CREATE TABLE
postgres=# INSERT INTO tbl VALUES (E'\u0000');
ERROR: invalid Unicode escape value at or near "\u0000"
LINE 1: INSERT INTO tbl VALUES (E'\u0000');
^

"\u0000" is valid unicode [1]https://www.unicode.org/charts/PDF/U0000.pdf, why not we cannot insert it?

[1]: https://www.unicode.org/charts/PDF/U0000.pdf

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Japin Li (#1)
Re: Invalid Unicode escape value at or near "\u0000"

On Sat, Nov 13, 2021 at 4:32 PM Japin Li <japinli@hotmail.com> wrote:

When I try to insert an Unicode "\u0000", there is an error $subject.

postgres=# CREATE TABLE tbl (s varchar(10));
CREATE TABLE
postgres=# INSERT INTO tbl VALUES (E'\u0000');
ERROR: invalid Unicode escape value at or near "\u0000"
LINE 1: INSERT INTO tbl VALUES (E'\u0000');
^

"\u0000" is valid unicode [1], why not we cannot insert it?

Yes, it is a valid codepoint, but unfortunately PostgreSQL can't
support it because it sometimes deals in null terminated string, even
though internally it does track string data and length separately. We
have to do that to use libc facilities like strcoll_r(), and probably
many other things.

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#2)
Re: Invalid Unicode escape value at or near "\u0000"

On 11/13/21 00:40, Thomas Munro wrote:

On Sat, Nov 13, 2021 at 4:32 PM Japin Li <japinli@hotmail.com> wrote:

When I try to insert an Unicode "\u0000", there is an error $subject.

postgres=# CREATE TABLE tbl (s varchar(10));
CREATE TABLE
postgres=# INSERT INTO tbl VALUES (E'\u0000');
ERROR: invalid Unicode escape value at or near "\u0000"
LINE 1: INSERT INTO tbl VALUES (E'\u0000');
^

"\u0000" is valid unicode [1], why not we cannot insert it?

Yes, it is a valid codepoint, but unfortunately PostgreSQL can't
support it because it sometimes deals in null terminated string, even
though internally it does track string data and length separately. We
have to do that to use libc facilities like strcoll_r(), and probably
many other things.

And it's documented at
<https://www.postgresql.org/docs/current/datatype-character.html&gt;:

The characters that can be stored in any of these data types are
determined by the database character set, which is selected when the
database is created. Regardless of the specific character set, the
character with code zero (sometimes called NUL) cannot be stored.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#4Japin Li
japinli@hotmail.com
In reply to: Andrew Dunstan (#3)
Re: Invalid Unicode escape value at or near "\u0000"

On Sat, 13 Nov 2021 at 21:52, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/13/21 00:40, Thomas Munro wrote:

On Sat, Nov 13, 2021 at 4:32 PM Japin Li <japinli@hotmail.com> wrote:

When I try to insert an Unicode "\u0000", there is an error $subject.

postgres=# CREATE TABLE tbl (s varchar(10));
CREATE TABLE
postgres=# INSERT INTO tbl VALUES (E'\u0000');
ERROR: invalid Unicode escape value at or near "\u0000"
LINE 1: INSERT INTO tbl VALUES (E'\u0000');
^

"\u0000" is valid unicode [1], why not we cannot insert it?

Yes, it is a valid codepoint, but unfortunately PostgreSQL can't
support it because it sometimes deals in null terminated string, even
though internally it does track string data and length separately. We
have to do that to use libc facilities like strcoll_r(), and probably
many other things.

And it's documented at
<https://www.postgresql.org/docs/current/datatype-character.html&gt;:

The characters that can be stored in any of these data types are
determined by the database character set, which is selected when the
database is created. Regardless of the specific character set, the
character with code zero (sometimes called NUL) cannot be stored.

Thanks Thomas and Andrew. Sorry for my ignore reading of the documentation.

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

#5Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Thomas Munro (#2)
Re: Invalid Unicode escape value at or near "\u0000"

On 13.11.21 06:40, Thomas Munro wrote:

Yes, it is a valid codepoint, but unfortunately PostgreSQL can't
support it because it sometimes deals in null terminated string, even
though internally it does track string data and length separately. We
have to do that to use libc facilities like strcoll_r(), and probably
many other things.

By the way, I think the server-side issues around this are fixable. The
real problem in my mind is how to present such values in client APIs
such as libpq.