JSON validation behavior

Started by Sergei Kornilov · about 7 years ago · 6 messages

Hi

We have some json regression tests in src/test/regress/expected/json_encoding_1.out with the \u0000 symbol:

select json '{ "a": "null \u0000 escape" }' as not_unescaped;
not_unescaped
--------------------------------
{ "a": "null \u0000 escape" }
(1 row)

select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails;
ERROR: unsupported Unicode escape sequence
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: { "a":...

Well, the requested text type cannot contain a \u0000 byte. But it seems strange: we test the json type with this value, yet the same error is raised for the -> operator:

melkij=> select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;
ERROR: unsupported Unicode escape sequence
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: { "a":...

The result was requested as the json datatype, just like SELECT '"\u0000"'::json; in the earlier tests.
A similar error occurs when accessing a different key:

melkij=> select json '{ "a": "null \u0000 escape", "b":1 }' ->>'b' as fails;
ERROR: unsupported Unicode escape sequence
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: { "a":...

We allow writing such json to a table, and we allow reading the whole json back, but we cannot use the native operators on it. Is this behavior expected?

regards, Sergei

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Sergei Kornilov (#1)
Re: JSON validation behavior

On Wed, Oct 24, 2018 at 7:25 AM Sergei Kornilov <sk@zsrv.org> wrote:

DETAIL: \u0000 cannot be converted to text.

Well, the requested text type cannot contain a \u0000 byte. But it seems
strange: we test the json type with this value, yet the same error is raised
for the -> operator:

We allow writing such json to a table, and we allow reading the whole json
back, but we cannot use the native operators on it. Is this behavior expected?

It isn't that different from saying:

'123bcd'::integer -- error, invalid input for type integer

While text can hold just about everything, it cannot contain an actual ASCII
NUL character, so a JSON value with a Unicode-escaped NUL cannot be converted
to text. Text doesn't have a stored concept of escaped values; an escape is
only valid during entry.
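
For instance, both failures can be reproduced directly (a hypothetical psql session; exact error wording varies by server version):

select '123bcd'::integer;
ERROR: invalid input syntax for integer: "123bcd"

select chr(0);  -- text rejects an actual NUL at entry as well
ERROR: null character not permitted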

The following does seem buggy though:

select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;

The final result should be json, yet somewhere an intermediate text value
seems to be constructed, and that implementation detail causes an error in an
otherwise valid situation.
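
To make the return-type distinction concrete, pg_typeof shows which operator goes through text (a minimal sketch using an escape-free value):

select pg_typeof(json '{"a": 1}' -> 'a');   -- json: no text conversion should be needed
select pg_typeof(json '{"a": 1}' ->> 'a');  -- text: a conversion (and thus the error) is expected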

David J.

#3Sergei Kornilov
sk@zsrv.org
In reply to: David G. Johnston (#2)
Re: JSON validation behavior

Hi

24.10.2018, 17:40, "David G. Johnston" <david.g.johnston@gmail.com>:

On Wed, Oct 24, 2018 at 7:25 AM Sergei Kornilov <sk@zsrv.org> wrote:

DETAIL:  \u0000 cannot be converted to text.

Well, the requested text type cannot contain a \u0000 byte. But it seems strange: we test the json type with this value, yet the same error is raised for the -> operator:

We allow writing such json to a table, and we allow reading the whole json back, but we cannot use the native operators on it. Is this behavior expected?

It isn't that different than saying:

'123bcd'::integer -- error, invalid input for type integer

While text can hold just about everything, it cannot contain an actual ASCII NUL character, so a JSON value with a Unicode-escaped NUL cannot be converted to text. Text doesn't have a stored concept of escaped values; an escape is only valid during entry.

Yes, that is reasonable for operators that return text, such as ->> (and I have no question about those).
I was surprised by the operators whose result type is json.
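
In other words, one might expect the json-returning operator to hand back the fragment with the escape still intact, just as selecting the whole document does (hypothetical expected output, not what the server currently prints):

select json '{ "a": "null \u0000 escape" }' -> 'a' as expected;
expected
-----------------------
"null \u0000 escape"
(1 row)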

regards, Sergei

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: JSON validation behavior

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The following does seem buggy though:
select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;

Perhaps, but AFAICS it's entirely accidental that any variant of this
doesn't fail. Although this manages not to fail:

regression=# select json '{ "a": "null \u0000 escape"}' ;
json
------------------------------
{ "a": "null \u0000 escape"}
(1 row)

this does:

regression=# select jsonb '{ "a": "null \u0000 escape"}' ;
ERROR: unsupported Unicode escape sequence

The error message is actually being thrown in the JSON parser, and it
only doesn't get thrown if the parser knows that it's just error-checking
the data and not producing any converted output (cf the
"if (lex->strval != NULL)" block starting at json.c:832). It actually
seems to me that this behavior is a bug, in that there are a bunch of
error checks there (not only this one) that are skipped in the allegedly
"error checking only" path.

But anyway, making it work as suggested here would take a substantial
amount of refactoring, and it would not (I think) work anyway for jsonb,
so it doesn't quite seem worth a lot of work. I could get behind fixing
it to always throw the error, but that's not what Sergei was hoping for.

regards, tom lane

#5Sergei Kornilov
sk@zsrv.org
In reply to: Tom Lane (#4)
Re: JSON validation behavior

Hi

I could get behind fixing
it to always throw the error, but that's not what Sergei was hoping for.

On the contrary, I think that is a reasonable way forward. It is much better to get an error on the input value than to find a wrong value later during table processing. We always reject this value for jsonb, and I expected the same behavior for json. I am not sure about JSON specification compliance; I didn't find anything about \u0000 in RFC 8259.
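
A sketch of the hazard (hypothetical session, based on the behavior shown above):

create temp table t (j json);
insert into t values ('{ "a": "null \u0000 escape" }');  -- accepted at write time
select j ->> 'a' from t;  -- fails only when the value is finally used
ERROR: unsupported Unicode escape sequence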

regards, Sergei

#6Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: JSON validation behavior

On 10/24/2018 11:54 AM, Tom Lane wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The following does seem buggy though:
select json '{ "a": "null \u0000 escape"}' -> 'a' as fails;

Perhaps, but AFAICS it's entirely accidental that any variant of this
doesn't fail. Although this manages not to fail:

regression=# select json '{ "a": "null \u0000 escape"}' ;
json
------------------------------
{ "a": "null \u0000 escape"}
(1 row)

this does:

regression=# select jsonb '{ "a": "null \u0000 escape"}' ;
ERROR: unsupported Unicode escape sequence

The error message is actually being thrown in the JSON parser, and it
only doesn't get thrown if the parser knows that it's just error-checking
the data and not producing any converted output (cf the
"if (lex->strval != NULL)" block starting at json.c:832). It actually
seems to me that this behavior is a bug, in that there are a bunch of
error checks there (not only this one) that are skipped in the allegedly
"error checking only" path.

But anyway, making it work as suggested here would take a substantial
amount of refactoring, and it would not (I think) work anyway for jsonb,
so it doesn't quite seem worth a lot of work. I could get behind fixing
it to always throw the error, but that's not what Sergei was hoping for.

I think we might be able to do something that doesn't have too high an
impact. I'll take a look.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services