ON ERROR in json_query and the like

Started by Markus Winandalmost 2 years ago31 messages
Jump to latest
#1Markus Winand
markus.winand@winand.at

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

-markus

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Markus Winand (#1)
Re: ON ERROR in json_query and the like

út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at>
napsal:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

I remember a talk about this subject years ago. Originally the JSON_QUERY
was designed in similar like Oracle, and casting to jsonb was done inside.
If I remember this behave depends on the fact, so old SQL/JSON has not json
type and it was based just on processing of plain text. But Postgres has
JSON, and JSONB and then was more logical to use these types. And because
the JSON_QUERY uses these types, and the casting is done before the
execution of the function, then the clause ON ERROR cannot be handled.
Moreover, until soft errors Postgres didn't allow handling input errors in
common functions.

I think so this difference should be mentioned in documentation.

Regards

Pavel

Show quoted text

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON
ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

-markus

#3jian he
jian.universality@gmail.com
In reply to: Markus Winand (#1)
Re: ON ERROR in json_query and the like

On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

previous versions require SQL/JSON query function's context_item to
explicitly cast to jsonb,
if it is not it will error out.

previous version the following query will have a error
select json_value(text '"1"' , 'strict $[*]' DEFAULT 9 ON ERROR);

now it only requires that (context_item) casting to jsonb successfully.
I raise this issue separately at [1]/messages/by-id/CACJufxGWJTa-b0WjNH15otih42PA7SF+e7LbkAb0gThs7ojT5Q@mail.gmail.com

[1]: /messages/by-id/CACJufxGWJTa-b0WjNH15otih42PA7SF+e7LbkAb0gThs7ojT5Q@mail.gmail.com

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

I vaguely remember, we stumbled on ON ERROR, ON EMPTY several times.
i don't have a standard, but the doc seems not explicit enough for the
above example.

in json_query, maybe we can rephrase like:
--------------------
The ON EMPTY clause specifies the behavior if evaluating
path_expression yields no value at all. The default when ON EMPTY is
not specified
and ON ERROR not specified is to return a null value.

The ON ERROR clause specifies the behavior if an error occurs when
evaluating path_expression, including evaluation yields no value at
all and ON EMPTY is not specified, the operation to coerce the result
value to the output type, or during the execution of ON EMPTY behavior
(that is caused by empty result of path_expression evaluation). The
default when ON ERROR is not specified is to return a null value.

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

`
SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
`
I think these sentences addressed the above query.
<<<
or during the execution of ON EMPTY behavior (that is caused by empty
result of path_expression evaluation).
<<<
As you can see, in this context, "execution of ON EMPTY behavior"
works fine, successfully returned null,
so `EMPTY ARRAY ON ERROR` part was ignored.

#4jian he
jian.universality@gmail.com
In reply to: Markus Winand (#1)
Re: ON ERROR in json_query and the like

On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

json_query ( context_item, path_expression);

`SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);`
to make this return NULL, that means to catch all the errors that
happened while context_item evaluation.
otherwise, it would not be consistent?

Currently context_item expressions can be quite arbitrary.
considering the following examples.

create or replace function test(jsonb) returns jsonb as $$ begin raise
exception 'abort'; end $$ language plpgsql;
create or replace function test1(jsonb) returns jsonb as $$ begin
return $1; end $$ language plpgsql;
SELECT JSON_VALUE(test('1'), '$');
SELECT JSON_VALUE(test1('1'), '$');
SELECT JSON_VALUE((select '1'::jsonb), '$');
SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from cte), '$');
SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from
cte union all select s::jsonb from cte limit 1), '$');

Currently, I don't think we can make
SELECT JSON_VALUE(test('1'), '$' null on error);
return NULL.

#5Markus Winand
markus.winand@winand.at
In reply to: jian he (#4)
Re: ON ERROR in json_query and the like

On 11.06.2024, at 03:58, jian he <jian.universality@gmail.com> wrote:

On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

json_query ( context_item, path_expression);

`SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);`
to make this return NULL, that means to catch all the errors that
happened while context_item evaluation.
otherwise, it would not be consistent?

Currently context_item expressions can be quite arbitrary.
considering the following examples.

create or replace function test(jsonb) returns jsonb as $$ begin raise
exception 'abort'; end $$ language plpgsql;
create or replace function test1(jsonb) returns jsonb as $$ begin
return $1; end $$ language plpgsql;
SELECT JSON_VALUE(test('1'), '$');
SELECT JSON_VALUE(test1('1'), '$');
SELECT JSON_VALUE((select '1'::jsonb), '$');
SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from cte), '$');
SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from
cte union all select s::jsonb from cte limit 1), '$');

Currently, I don't think we can make
SELECT JSON_VALUE(test('1'), '$' null on error);
return NULL.

This is not how it is meant. Your example is not subject to the ON ERROR
clause because the error happens in a sub-expression. My point is that
ON ERROR includes the String to JSON conversion (the JSON parsing) that
— in the way the standard describes these functions — inside of them.

In the standard, JSON_VALUE & co accept string types as well as the type JSON:

10.14 SR 1: The declared type of the <value expression> simply contained in the <JSON input expression> immediately contained in the <JSON context item> shall be a string type or a JSON type.

It might be best to think of it as two separate functions, overloaded:

JSON_VALUE(context_item JSONB, path_expression …)
JSON_VALUE(context_item TEXT, path_expression …)

Now if you do this:
create function test2(text) returns text as $$ begin
return $1; end $$ language plpgsql;
create function test3(text) returns jsonb as $$ begin
return $1::jsonb; end $$ language plpgsql;

SELECT JSON_VALUE(test2('invalid'), '$' null on error);
SELECT JSON_VALUE(test3('invalid'), '$' null on error);

The first query should return NULL, while the second should (and does) fail.

This is how I understand it.

-markus

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Markus Winand (#5)
Re: ON ERROR in json_query and the like

On Wednesday, June 12, 2024, Markus Winand <markus.winand@winand.at> wrote:

10.14 SR 1: The declared type of the <value expression> simply contained
in the <JSON input expression> immediately contained in the <JSON context
item> shall be a string type or a JSON type.

It might be best to think of it as two separate functions, overloaded:

JSON_VALUE(context_item JSONB, path_expression …)
JSON_VALUE(context_item TEXT, path_expression …)

Yes, we need to document that we deviate from (fail to fully implement) the
standard here in that we only provide jsonb parameter functions, not text
ones.

David J.

#7Markus Winand
markus.winand@winand.at
In reply to: jian he (#3)
Re: ON ERROR in json_query and the like

On 04.06.2024, at 07:00, jian he <jian.universality@gmail.com> wrote:

On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote:

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

I vaguely remember, we stumbled on ON ERROR, ON EMPTY several times.
i don't have a standard,

In my understanding of the standard is that there is no distinction
between an explicit and implicit ON EMPTY clause.

E.g. clause 6.35 (json_query) Syntax Rule 4:

• If <JSON query empty behavior> is not specified, then NULL ON EMPTY is implicit.

General Rule 5ai then covers the NULL ON EMPTY case:

• i) If the length of SEQ2 is 0 (zero) and ONEMPTY is NULL, then let JV be the null value.

Neither of these make the ON EMPTY handling dependent on the presence of ON ERROR.

-markus

#8David G. Johnston
david.g.johnston@gmail.com
In reply to: Markus Winand (#1)
Re: ON ERROR in json_query and the like

On Tuesday, May 28, 2024, Markus Winand <markus.winand@winand.at> wrote:

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON
ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

The docs here don’t seem to cover the on empty clause at all nor fully
cover all options.

Where do you find the claim that the one implies the other? Is it a typo
that your examples says “implies null on empty” but the subject line says
“implies error on empty”?

Without those clauses a result is either empty or an error - they are
mutually exclusive (ignoring matches). I would not expect one clause to
imply or affect the behavior of the other. There is no chaining. The
original result is transformed to the new result specified by the clause.

I’d need to figure out whether the example you show is actually producing
empty or error; but it seems correct if the result is empty. The first
query ignores the error clause - the empty array row seems to be the
representation of empty here; the second one matches the empty clause and
outputs null instead of the empty array.

David J.

#9Markus Winand
markus.winand@winand.at
In reply to: David G. Johnston (#8)
Re: ON ERROR in json_query and the like

On 12.06.2024, at 15:31, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tuesday, May 28, 2024, Markus Winand <markus.winand@winand.at> wrote:

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

The docs here don’t seem to cover the on empty clause at all nor fully cover all options.

Where do you find the claim that the one implies the other? Is it a typo that your examples says “implies null on empty” but the subject line says “implies error on empty”?

I see the confusion caused — sorry. The headline was meant to describe the observed behaviour in 17beta1, while the content refers to how the standard defines it.

Without those clauses a result is either empty or an error - they are mutually exclusive (ignoring matches). I would not expect one clause to imply or affect the behavior of the other. There is no chaining. The original result is transformed to the new result specified by the clause.

Agreed, that’s why I found the 17beta1 behaviour surprising.

I’d need to figure out whether the example you show is actually producing empty or error; but it seems correct if the result is empty.

As I understand the standard, an empty result is not an error.

The first query ignores the error clause - the empty array row seems to be the representation of empty here; the second one matches the empty clause and outputs null instead of the empty array.

But the first should behave the same, as the standard implies NULL ON EMPTY if there is no explicit ON EMPTY clause. Oracle DB behaving differently here makes me wonder if there is something in the standard that I haven’t noticed yet...

-markus

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Pavel Stehule (#2)
Re: ON ERROR in json_query and the like

Hi,

(apologies for not replying to this thread sooner)

On Tue, May 28, 2024 at 6:57 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:

út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at> napsal:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

I remember a talk about this subject years ago. Originally the JSON_QUERY was designed in similar like Oracle, and casting to jsonb was done inside. If I remember this behave depends on the fact, so old SQL/JSON has not json type and it was based just on processing of plain text. But Postgres has JSON, and JSONB and then was more logical to use these types. And because the JSON_QUERY uses these types, and the casting is done before the execution of the function, then the clause ON ERROR cannot be handled. Moreover, until soft errors Postgres didn't allow handling input errors in common functions.

I think so this difference should be mentioned in documentation.

Agree that the documentation needs to be clear about this. I'll update
my patch at [1]/messages/by-id/CA+HiwqGdineyHfcTEe0=8jjXonH3qXi4vFB+gRxf1L+xR2v_Pw@mail.gmail.com to add a note next to table 9.16.3. SQL/JSON Query
Functions.

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

This behavior is a bug and result of an unintentional change that I
made at some point after getting involved with this patch set. So I'm
going to fix this so that the empty results of jsonpath evaluation use
NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present.
Attached a patch to do so.

--
Thanks, Amit Langote

[1]: /messages/by-id/CA+HiwqGdineyHfcTEe0=8jjXonH3qXi4vFB+gRxf1L+xR2v_Pw@mail.gmail.com

Attachments:

v1-0001-SQL-JSON-Correctly-enforce-the-default-ON-EMPTY-b.patchapplication/octet-stream; name=v1-0001-SQL-JSON-Correctly-enforce-the-default-ON-EMPTY-b.patchDownload+64-41
#11Chapman Flack
chap@anastigmatix.net
In reply to: Amit Langote (#10)
Re: ON ERROR in json_query and the like

Hi,

On 06/17/24 02:20, Amit Langote wrote:

Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I wonder, could prosupport rewriting be used to detect that the first
argument is supplied by a cast, and rewrite the expression to apply the
cast 'softly'? Or would that behavior be too magical?

Regards,
-Chap

#12Markus Winand
markus.winand@winand.at
In reply to: Amit Langote (#10)
Re: ON ERROR in json_query and the like

On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:

Hi,

(apologies for not replying to this thread sooner)

On Tue, May 28, 2024 at 6:57 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:

út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at> napsal:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

I remember a talk about this subject years ago. Originally the JSON_QUERY was designed in similar like Oracle, and casting to jsonb was done inside. If I remember this behave depends on the fact, so old SQL/JSON has not json type and it was based just on processing of plain text. But Postgres has JSON, and JSONB and then was more logical to use these types. And because the JSON_QUERY uses these types, and the casting is done before the execution of the function, then the clause ON ERROR cannot be handled. Moreover, until soft errors Postgres didn't allow handling input errors in common functions.

I think so this difference should be mentioned in documentation.

Agree that the documentation needs to be clear about this. I'll update
my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
Functions.

Considering another branch of this thread [1]/messages/by-id/CAKFQuwb50BAaj83Np+1O6xe3_T6DO8w2mxtFbgSbbUng+abrqA@mail.gmail.com I think the
"Supported Features” appendix of the docs should mention that as well.

The way I see it is that the standards defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In case of valid JSON, the implied CAST makes it look as though
the second variant of these function was supported as well but that
illusion totally falls apart once the JSON is not valid anymore.

I think it affects the following feature IDs:

- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

[1]: /messages/by-id/CAKFQuwb50BAaj83Np+1O6xe3_T6DO8w2mxtFbgSbbUng+abrqA@mail.gmail.com

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

This behavior is a bug and result of an unintentional change that I
made at some point after getting involved with this patch set. So I'm
going to fix this so that the empty results of jsonpath evaluation use
NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present.
Attached a patch to do so.

Tested: works.

Thanks :)

-markus

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Markus Winand (#12)
Re: ON ERROR in json_query and the like

po 17. 6. 2024 v 15:07 odesílatel Markus Winand <markus.winand@winand.at>
napsal:

On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:

Hi,

(apologies for not replying to this thread sooner)

On Tue, May 28, 2024 at 6:57 PM Pavel Stehule <pavel.stehule@gmail.com>

wrote:

út 28. 5. 2024 v 11:29 odesílatel Markus Winand <

markus.winand@winand.at> napsal:

Hi!

I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.

1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.

I remember a talk about this subject years ago. Originally the

JSON_QUERY was designed in similar like Oracle, and casting to jsonb was
done inside. If I remember this behave depends on the fact, so old SQL/JSON
has not json type and it was based just on processing of plain text. But
Postgres has JSON, and JSONB and then was more logical to use these types.
And because the JSON_QUERY uses these types, and the casting is done before
the execution of the function, then the clause ON ERROR cannot be handled.
Moreover, until soft errors Postgres didn't allow handling input errors in
common functions.

I think so this difference should be mentioned in documentation.

Agree that the documentation needs to be clear about this. I'll update
my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
Functions.

Considering another branch of this thread [1] I think the
"Supported Features” appendix of the docs should mention that as well.

The way I see it is that the standards defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In case of valid JSON, the implied CAST makes it look as though
the second variant of these function was supported as well but that
illusion totally falls apart once the JSON is not valid anymore.

I think it affects the following feature IDs:

- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

It is easy to add the function that accepts text, but when you have
overloaded functions, then varchar or text is the preferred type, and then
the arguments will be casted to text by default instead of json. You can
have one function with argument of type "any", but then the
execution is a little bit slower (outer cast is faster than cast inside
function), and again the Postgres cannot deduce used argument types from
function's argument types.

Probably this can be solved if we introduce a new kind of type, where the
preferred type will be json, or jsonb.

So the problem is in the definition of implementation details about the
mechanism of type deduction (when you use string literal or when you use
string expression).

So now, when you will write json_table(x1 || x2), then and x1 and x2 are of
unknown type, then Postgres can know, so x1 and x2 will be jsonb, but when
there
will be secondary function json_table(text), then Postgres detects problem,
and use preferred type (that is text).

Generally, Postgres supports function overloading and it is working well
between text, int and numeric where constants have different syntax, but
when the constant
literal has the same syntax, there can be problems with hidden casts to
unwanted type, so not overloaded function is ideal. These issues can be
solved in the analysis stage by special code, but again it increases code
complexity.

Show quoted text

[1]
/messages/by-id/CAKFQuwb50BAaj83Np+1O6xe3_T6DO8w2mxtFbgSbbUng+abrqA@mail.gmail.com

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY

ON ERROR) a;

a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

This behavior is a bug and result of an unintentional change that I
made at some point after getting involved with this patch set. So I'm
going to fix this so that the empty results of jsonpath evaluation use
NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present.
Attached a patch to do so.

Tested: works.

Thanks :)

-markus

#14Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Markus Winand (#12)
Re: ON ERROR in json_query and the like

On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote:

On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:

2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY

17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)

As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:

17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---

(1 row)

Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.

I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.

This behavior is a bug and result of an unintentional change that I
made at some point after getting involved with this patch set. So I'm
going to fix this so that the empty results of jsonpath evaluation use
NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present.
Attached a patch to do so.

Tested: works.

Pushed, thanks for testing.

I'll work on the documentation updates that may be needed based on
this and nearby discussion(s).

--
Thanks, Amit Langote

#15jian he
jian.universality@gmail.com
In reply to: Markus Winand (#12)
Re: ON ERROR in json_query and the like

On Mon, Jun 17, 2024 at 9:07 PM Markus Winand <markus.winand@winand.at> wrote:

I think it affects the following feature IDs:

- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

we have ExecEvalCoerceViaIOSafe, so it's doable.
I tried, but other things break. so it's not super easy i think.

because of eval_const_expressions_mutator, postgres will constantly
evaluate simple const expressions to simplify some expressions.
`SELECT JSON_QUERY('a', '$');`
postgres will try to do the cast coercion from text 'a' to jsonb. but
it will fail, but it's hard to send the cast failed information to
later code,
in ExecInterpExpr. in ExecEvalCoerceViaIOSafe, if we cast coercion
failed, then this function value is zero, isnull is set to true.

`SELECT JSON_QUERY('a', '$');`
will be equivalent to
`SELECT JSON_QUERY(NULL, '$');`

so making one of the next 2 examples to return jsonb 1 would be hard
to implement.
SELECT JSON_QUERY('a', '$' default '1' on empty);
SELECT JSON_QUERY('a', '$' default '1' on error);

--------------------------------------------------------------------------
If the standard says the context_item can be text string (cannot cast
to json successfully). future version we make it happen,
then it should be fine?
because it's like the previous version we are not fully compliant with
standard, now the new version is in full compliance with standard.

#16Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Chapman Flack (#11)
Re: ON ERROR in json_query and the like

Hi,

On Mon, Jun 17, 2024 at 9:47 PM Chapman Flack <jcflack@acm.org> wrote:

On 06/17/24 02:20, Amit Langote wrote:

Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.

17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid

Oracle DB and Db2 (LUW) both return NULL in that case.

I wonder, could prosupport rewriting be used to detect that the first
argument is supplied by a cast, and rewrite the expression to apply the
cast 'softly'? Or would that behavior be too magical?

I don't think prosupport rewriting can be used, because JSON_QUERY().

We could possibly use "runtime coercion" for context_item so that the
coercion errors can be "caught", which is how we coerce the jsonpath
result to the RETURNING type.

For now, I'm inclined to simply document the limitation that errors
when coercing string arguments to json are always thrown.

--
Thanks, Amit Langote

#17Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Markus Winand (#12)
Re: ON ERROR in json_query and the like

Hi,

On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote:

On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:
Agree that the documentation needs to be clear about this. I'll update
my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
Functions.

Considering another branch of this thread [1] I think the
"Supported Features” appendix of the docs should mention that as well.

The way I see it is that the standards defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In case of valid JSON, the implied CAST makes it look as though
the second variant of these function was supported as well but that
illusion totally falls apart once the JSON is not valid anymore.

I think it affects the following feature IDs:

- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

Hmm, I'm starting to think that adding the implied cast to json wasn't
such a great idea after all, because it might mislead the users to
think that JSON parsing is transparent (respects ON ERROR), which is
what you are saying, IIUC.

I'm inclined to push the attached patch which puts back the
restriction to allow only jsonb arguments, asking users to add an
explicit cast if necessary.

--
Thanks, Amit Langote

Attachments:

v1-0001-SQL-JSON-Only-allow-arguments-of-jsonb-type.patchapplication/octet-stream; name=v1-0001-SQL-JSON-Only-allow-arguments-of-jsonb-type.patchDownload+90-66
#18David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Langote (#17)
Re: ON ERROR in json_query and the like

On Thu, Jun 20, 2024 at 2:19 AM Amit Langote <amitlangote09@gmail.com>
wrote:

Hi,

On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at>
wrote:

On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:
Agree that the documentation needs to be clear about this. I'll update
my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
Functions.

Considering another branch of this thread [1] I think the
"Supported Features” appendix of the docs should mention that as well.

The way I see it is that the standards defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In case of valid JSON, the implied CAST makes it look as though
the second variant of these function was supported as well but that
illusion totally falls apart once the JSON is not valid anymore.

I think it affects the following feature IDs:

- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

Hmm, I'm starting to think that adding the implied cast to json wasn't
such a great idea after all, because it might mislead the users to
think that JSON parsing is transparent (respects ON ERROR), which is
what you are saying, IIUC.

Actually, the implied cast is exactly the correct thing to do here - the
issue is that we aren't using the newly added soft errors infrastructure
[1]: d9f7f5d32f201bec61fef8104aafcb77cecb4dcb
error tells us to produce. This seems to be in the realm of doability so
we should try in the interest of being standard conforming. I'd even argue
to make this an open item unless and until the attempt is agreed upon to
have failed (or it succeeds of course).

David J.

[1]: d9f7f5d32f201bec61fef8104aafcb77cecb4dcb

#19Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David G. Johnston (#18)
Re: ON ERROR in json_query and the like

On Fri, Jun 21, 2024 at 1:11 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Thu, Jun 20, 2024 at 2:19 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote:

On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:
Agree that the documentation needs to be clear about this. I'll update
my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
Functions.

Considering another branch of this thread [1] I think the
"Supported Features” appendix of the docs should mention that as well.

The way I see it is that the standards defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In case of valid JSON, the implied CAST makes it look as though
the second variant of these function was supported as well but that
illusion totally falls apart once the JSON is not valid anymore.

I think it affects the following feature IDs:

- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY

Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.

Hmm, I'm starting to think that adding the implied cast to json wasn't
such a great idea after all, because it might mislead the users to
think that JSON parsing is transparent (respects ON ERROR), which is
what you are saying, IIUC.

Actually, the implied cast is exactly the correct thing to do here - the issue is that we aren't using the newly added soft errors infrastructure [1] to catch the result of that cast and instead produce whatever output on error tells us to produce. This seems to be in the realm of doability so we should try in the interest of being standard conforming.

Soft error handling *was* used for catching cast errors in the very
early versions of this patch (long before I got involved and the
infrastructure you mention got added). It was taken out after Pavel
said [1]/messages/by-id/CAFj8pRCnzO2cnHi5ebXciV=tuGVvAQOW9uPU+DQV1GkL31R=-g@mail.gmail.com that he didn't like producing NULL instead of throwing an
error. Not sure if Pavel's around but it would be good to know why he
didn't like it at the time.

I can look into making that work again, but that is not going to make beta2.

I'd even argue to make this an open item unless and until the attempt is agreed upon to have failed (or it succeeds of course).

OK, adding an open item.

--
Thanks, Amit Langote
[1]: /messages/by-id/CAFj8pRCnzO2cnHi5ebXciV=tuGVvAQOW9uPU+DQV1GkL31R=-g@mail.gmail.com

#20David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Langote (#19)
Re: ON ERROR in json_query and the like

On Thu, Jun 20, 2024 at 5:22 PM Amit Langote <amitlangote09@gmail.com>
wrote:

Soft error handling *was* used for catching cast errors in the very
early versions of this patch (long before I got involved and the
infrastructure you mention got added). It was taken out after Pavel
said [1] that he didn't like producing NULL instead of throwing an
error. Not sure if Pavel's around but it would be good to know why he
didn't like it at the time.

I'm personally in the "make it error" camp but "make it conform to the
standard" is a stronger membership (in general).

I see this note in your linked thread:

By the standard, it is implementation-defined whether JSON parsing errors
should be caught by ON ERROR clause.

Absent someone contradicting that claim I retract my position here and am
fine with failing if these "functions" are supplied with something that
cannot be cast to json. I'd document them like functions that accept json
with the implications that any casting to json happens before the function
is called and thus its arguments do not apply to that step.

Given that we have "expression IS JSON" the ability to hack together a case
expression to get non-erroring behavior exists.

David J.

#21Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David G. Johnston (#20)
#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Langote (#19)
#23Pavel Stehule
pavel.stehule@gmail.com
In reply to: Amit Langote (#21)
#24David G. Johnston
david.g.johnston@gmail.com
In reply to: Pavel Stehule (#23)
#25Markus Winand
markus.winand@winand.at
In reply to: David G. Johnston (#20)
#26Markus Winand
markus.winand@winand.at
In reply to: David G. Johnston (#24)
#27David G. Johnston
david.g.johnston@gmail.com
In reply to: Markus Winand (#26)
#28Markus Winand
markus.winand@winand.at
In reply to: David G. Johnston (#27)
#29Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Markus Winand (#28)
#30Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#29)
#31Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#30)