Support for \u0000?

Started by Matthew Byrne, almost 9 years ago, 5 messages, general
#1 Matthew Byrne
mjw.byrne@gmail.com

Are there any plans to support \u0000 in JSONB and, relatedly, UTF code
point 0 in TEXT? To the best of my knowledge \u0000 is valid in JSON and
code point 0 is valid in UTF-8 but Postgres rejects both, which severely
limits its usefulness in many cases.

I am currently working around the issue by using the JSON type, which
allows \u0000 to be stored, but this is far from ideal because it can't be
cast to TEXT or JSONB and can't even be accessed:

mydb=# select '{"thing":"\u0000"}'::json->>'thing';
ERROR: unsupported Unicode escape sequence
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: {"thing":...

Regards,

Matt

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew Byrne (#1)
Re: Support for \u0000?

Matthew Byrne <mjw.byrne@gmail.com> writes:

> Are there any plans to support \u0000 in JSONB and, relatedly, UTF code
> point 0 in TEXT?

No. It's basically never going to happen because of the widespread use
of C strings (nul-terminated strings) inside the backend. Making \0 a
legal member of strings would break all those internal APIs, requiring
touching far more code than anyone would want to do. It'd likely break
a great deal of client-side code as well.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Matthew Byrne
mjw.byrne@gmail.com
In reply to: Tom Lane (#2)
Re: Support for \u0000?

Thanks for the response Tom. I understand this would be a mammoth task.

Would a more feasible approach be to introduce new types (say, TEXT2 and
JSONB2 - or something better-sounding) which are the same as the old ones
but add support for \u0000 and UTF 0? This would isolate nul-containing
byte arrays to the implementations of those types and keep backward
compatibility by leaving TEXT and JSONB alone.

Matt

On Wed, Jul 19, 2017 at 7:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Matthew Byrne <mjw.byrne@gmail.com> writes:
>
>> Are there any plans to support \u0000 in JSONB and, relatedly, UTF code
>> point 0 in TEXT?
>
> No. It's basically never going to happen because of the widespread use
> of C strings (nul-terminated strings) inside the backend. Making \0 a
> legal member of strings would break all those internal APIs, requiring
> touching far more code than anyone would want to do. It'd likely break
> a great deal of client-side code as well.
>
> regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew Byrne (#3)
Re: Support for \u0000?

Matthew Byrne <mjw.byrne@gmail.com> writes:

> Would a more feasible approach be to introduce new types (say, TEXT2 and
> JSONB2 - or something better-sounding) which are the same as the old ones
> but add support for \u0000 and UTF 0? This would isolate nul-containing
> byte arrays to the implementations of those types and keep backward
> compatibility by leaving TEXT and JSONB alone.

The problem is not inside those datatypes; either text or jsonb could
trivially store \0 bytes. The problem is passing such values through
APIs that don't support it. Changing those APIs would affect *all*
datatypes.

regards, tom lane

#5 Matthew Byrne
mjw.byrne@gmail.com
In reply to: Tom Lane (#4)
Re: Support for \u0000?

I see. Thanks for the quick responses!

On Wed, Jul 19, 2017 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Matthew Byrne <mjw.byrne@gmail.com> writes:
>
>> Would a more feasible approach be to introduce new types (say, TEXT2 and
>> JSONB2 - or something better-sounding) which are the same as the old ones
>> but add support for \u0000 and UTF 0? This would isolate nul-containing
>> byte arrays to the implementations of those types and keep backward
>> compatibility by leaving TEXT and JSONB alone.
>
> The problem is not inside those datatypes; either text or jsonb could
> trivially store \0 bytes. The problem is passing such values through
> APIs that don't support it. Changing those APIs would affect *all*
> datatypes.
>
> regards, tom lane