Patch bug: Fix jsonpath .* on Arrays
Hackers,
The behavior of the .* jpiAnyKey jsonpath selector seems incorrect.
```
select jsonb_path_query('[1,2,3]', '$.*');
jsonb_path_query
------------------
(0 rows)
select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', '$.*');
jsonb_path_query
------------------
[3, 4, 5]
```
The first example might be expected, since .* is intended for object keys, but the handing of `jpiAnyKey` has a branch for unwrapping arrays. The second example, however, just seems weird: this is .*, not .**.
The attached patch fixes it by passing the next node to `executeAnyItem()` (via `executeItemUnwrapTargetArray()`) and then properly setting `jperOk` when `executeAnyItem()` finds values and there is no current (next) node.
I took this approach given what appears to be the intended behavior or $* on arrays in lax mode. However, I could see an argument that .* should not apply to arrays at all. If so, I can submit a new patch removing the branch that unwraps an array with .*.
Best,
David
Attachments:
0001-Fix-behavior-of-jsonpath-.-on-arrays.patchapplication/octet-stream; name=0001-Fix-behavior-of-jsonpath-.-on-arrays.patch; x-unix-mode=0644Download+55-5
On Tuesday, June 4, 2024, David E. Wheeler <david@justatheory.com> wrote:
Hackers,
The behavior of the .* jpiAnyKey jsonpath selector seems incorrect.
```
select jsonb_path_query('[1,2,3]', '$.*');
jsonb_path_query
------------------
(0 rows)select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', '$.*');
jsonb_path_query
------------------
[3, 4, 5]
```The first example might be expected, since .* is intended for object keys,
but the handing of `jpiAnyKey` has a branch for unwrapping arrays. The
second example, however, just seems weird: this is .*, not .**.
This seems to be working correctly. Lax mode causes the first array level
to unwrap and produce new context item values. Then the wildcard member
accessor is applied to each. Numbers don’t have members so no matches
exist in the first example. The object in the second indeed has a single
member and so matches the wildcard and its value, the array, is returned.
David J.
On Jun 4, 2024, at 12:28 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
This seems to be working correctly. Lax mode causes the first array level to unwrap and produce new context item values. Then the wildcard member accessor is applied to each. Numbers don’t have members so no matches exist in the first example. The object in the second indeed has a single member and so matches the wildcard and its value, the array, is returned.
Oh FFS, unwrapping still breaks my brain. You’re right, of course. Here’s a new patch that demonstrates that behavior, since that code path is not currently represented in tests AFAICT (I would have expected to have broken it with this patch).
D
Attachments:
0001-Add-tests-for-jsonpath-.-on-arrays.patchapplication/octet-stream; name=0001-Add-tests-for-jsonpath-.-on-arrays.patch; x-unix-mode=0644Download+62-3
On Jun 4, 2024, at 20:45, David E. Wheeler <david@justatheory.com> wrote:
Here’s a new patch that demonstrates that behavior, since that code path is not currently represented in tests AFAICT (I would have expected to have broken it with this patch).
Commitfest link:
https://commitfest.postgresql.org/48/5017/
D
On Jun 4, 2024, at 20:45, David E. Wheeler <david@justatheory.com> wrote:
Oh FFS, unwrapping still breaks my brain. You’re right, of course. Here’s a new patch that demonstrates that behavior, since that code path is not currently represented in tests AFAICT (I would have expected to have broken it with this patch).
Rebased and moved the new tests to the end of the file.
D
Attachments:
v2-0001-Add-tests-for-jsonpath-.-on-arrays.patchapplication/octet-stream; name=v2-0001-Add-tests-for-jsonpath-.-on-arrays.patch; x-unix-mode=0644Download+62-3
On Jun 7, 2024, at 10:23, David E. Wheeler <david@justatheory.com> wrote:
Rebased and moved the new tests to the end of the file.
Bah, sorry, that was the previous patch. Here’s v3.
D
Attachments:
v3-0001-Add-tests-for-jsonpath-.-on-arrays.patchapplication/octet-stream; name=v3-0001-Add-tests-for-jsonpath-.-on-arrays.patch; x-unix-mode=0644Download+64-3
Вторник, 25 июня 2024, 11:17 +07:00 от David E. Wheeler <david@justatheory.com>:
On Jun 7, 2024, at 10:23, David E. Wheeler < david@justatheory.com > wrote:
Rebased and moved the new tests to the end of the file.
Bah, sorry, that was the previous patch. Here’s v3.
D
Hi! Looks good to me, but I have several comments.
Your patch improves tests, but why did you change formatting in jsonpath_exec.c? What's the motivation?
[1]: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*'); I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default. Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 'strict $.*'; I expected an error like in test [1]. This behavior is not obvious to me. Everything else is cool. Thanks to the patch and the discussion above, I began to understand better how wildcards in JSON work. Best regards, Stepan Neretin.
I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default.
Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 'strict $.*';
I expected an error like in test [1]select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*'); I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default. Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 'strict $.*'; I expected an error like in test [1]. This behavior is not obvious to me. Everything else is cool. Thanks to the patch and the discussion above, I began to understand better how wildcards in JSON work. Best regards, Stepan Neretin. . This behavior is not obvious to me.
Everything else is cool. Thanks to the patch and the discussion above, I began to understand better how wildcards in JSON work.
Best regards, Stepan Neretin.
On Jun 25, 2024, at 12:46 AM, Степан Неретин <fenixrnd@mail.ru> wrote:
Hi! Looks good to me, but I have several comments.
Thanks for your review!
Your patch improves tests, but why did you change formatting in jsonpath_exec.c? What's the motivation?
It’s not just formatting. From the commit message:
While at it, teach `executeAnyItem()` to return `jperOk` when `found`
exist, not because it will be used (the result and `found` are inspected
by different functions), but because it seems like the proper thing to
return from `executeAnyItem()` when considered in isolation.
I have since realized it’s not a complete fix for the issue, and hacked around it in my Go version. Would be fine to remove that bit, but IIRC this was the only execution function that would return `jperNotFound` when it in fact adds items to the `found` list. The current implementation only looks at one or the other, so it’s not super important, but I found the inconsistency annoying and sometimes confusing.
[1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default.
Very few of the other tests do so; I can add it if it’s important for this case, though.
Odd empty result for the test: select jsonb '[1,2,3,{"b": [3,4,5]}]' @? 'strict $.*';
I expected an error like in test [1]. This behavior is not obvious to me.
@? suppresses a number of errors. Perhaps I should add a variant of the error-raising query that passes the silent arg, which would also suppress the error.
Everything else is cool. Thanks to the patch and the discussion above, I began to understand better how wildcards in JSON work.
Yeah, it’s kind of wild TBH.
Best,
David
On Jun 25, 2024, at 13:48, David E. Wheeler <david@justatheory.com> wrote:
I have since realized it’s not a complete fix for the issue, and hacked around it in my Go version. Would be fine to remove that bit, but IIRC this was the only execution function that would return `jperNotFound` when it in fact adds items to the `found` list. The current implementation only looks at one or the other, so it’s not super important, but I found the inconsistency annoying and sometimes confusing.
I’ve removed this change.
[1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
I propose adding a similar test with explicitly specified lax mode: select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what lax mode is set by default.Very few of the other tests do so; I can add it if it’s important for this case, though.
Went ahead and added lax.
@? suppresses a number of errors. Perhaps I should add a variant of the error-raising query that passes the silent arg, which would also suppress the error.
Added a variant where the silent param suppresses the error, too.
V2 attached and the PR updated:
https://github.com/theory/postgres/pull/4/files
Best,
David
Attachments:
v2-0001-Add-tests-for-jsonpath-.-on-arrays.patchapplication/octet-stream; name=v2-0001-Add-tests-for-jsonpath-.-on-arrays.patch; x-unix-mode=0644Download+67-2
On Thu, Jun 27, 2024 at 1:16 AM David E. Wheeler <david@justatheory.com>
wrote:
On Jun 25, 2024, at 13:48, David E. Wheeler <david@justatheory.com> wrote:
I have since realized it’s not a complete fix for the issue, and hacked
around it in my Go version. Would be fine to remove that bit, but IIRC this
was the only execution function that would return `jperNotFound` when it in
fact adds items to the `found` list. The current implementation only looks
at one or the other, so it’s not super important, but I found the
inconsistency annoying and sometimes confusing.I’ve removed this change.
[1] select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'strict $.*');
I propose adding a similar test with explicitly specified lax mode:select jsonb_path_query('[1,2,3,{"b": [3,4,5]}]', 'lax $.*'); to show what
lax mode is set by default.Very few of the other tests do so; I can add it if it’s important for
this case, though.
Went ahead and added lax.
@? suppresses a number of errors. Perhaps I should add a variant of the
error-raising query that passes the silent arg, which would also suppress
the error.Added a variant where the silent param suppresses the error, too.
V2 attached and the PR updated:
https://github.com/theory/postgres/pull/4/files
Best,
David
HI! Now it looks good for me.
Best regards, Stepan Neretin.
On Thu, Jun 27, 2024 at 11:53:14AM +0700, Stepan Neretin wrote:
HI! Now it looks good for me.
The tests of jsonb_jsonpath.sql include a lot of patterns for @? and
jsonb_path_query with the lax and strict modes, so shouldn't these
additions be grouped closer to the existing tests rather than added at
the end of the file?
--
Michael
On Jun 27, 2024, at 04:17, Michael Paquier <michael@paquier.xyz> wrote:
The tests of jsonb_jsonpath.sql include a lot of patterns for @? and
jsonb_path_query with the lax and strict modes, so shouldn't these
additions be grouped closer to the existing tests rather than added at
the end of the file?
I think you could argue that they should go with other tests for array unwrapping, though it’s kind of mixed throughout. But that’s more the bit I was testing; almost all the tests are lax, and it’s less the strict behavior to test here than the explicit behavior of unwrapping in lax mode.
But ultimately I don’t care where they go, just that we have them.
D
On Jun 27, 2024, at 04:17, Michael Paquier <michael@paquier.xyz> wrote:
The tests of jsonb_jsonpath.sql include a lot of patterns for @? and
jsonb_path_query with the lax and strict modes, so shouldn't these
additions be grouped closer to the existing tests rather than added at
the end of the file?
I’ve moved them closer to other tests for unwrapping behavior in the attached updated and rebased v3 patch.
Best,
David
Attachments:
v3-0001-Add-tests-for-jsonpath-.-on-arrays.patchapplication/octet-stream; name=v3-0001-Add-tests-for-jsonpath-.-on-arrays.patch; x-unix-mode=0644Download+67-2
On Mon, Jul 8, 2024 at 11:09 PM David E. Wheeler <david@justatheory.com>
wrote:
On Jun 27, 2024, at 04:17, Michael Paquier <michael@paquier.xyz> wrote:
The tests of jsonb_jsonpath.sql include a lot of patterns for @? and
jsonb_path_query with the lax and strict modes, so shouldn't these
additions be grouped closer to the existing tests rather than added at
the end of the file?I’ve moved them closer to other tests for unwrapping behavior in the
attached updated and rebased v3 patch.Best,
David
Hi! Looks good to me now! Please, register a patch in CF.
Best regards, Stepan Neretin.
On Jul 15, 2024, at 07:07, Stepan Neretin <sncfmgg@gmail.com> wrote:
Hi! Looks good to me now! Please, register a patch in CF.
Best regards, Stepan Neretin.
It’s here:
https://commitfest.postgresql.org/48/5017/
Best,
David
On Mon, Jul 15, 2024 at 10:29:32AM -0400, David E. Wheeler wrote:
It’s here:
Sorry for the delay. Finally came back to it, and applied the tests.
Thanks!
It was fun to see that HEAD was silenced with the first patch of this
thread that tweaked the behavior with arrays.
Regarding the comments, I have left them out for now. That may be a
good start, but it also feels like we should do a much better job
overall with the area of jsonpath_exec.c. One thing that may help as
a start is to reorganize the routines of the file and divide them into
sub-categories, or even go through a deeper refactoring to help
readers go through the existing 4.5k lines of code that are in this
single file..
--
Michael
On Jul 19, 2024, at 01:42, Michael Paquier <michael@paquier.xyz> wrote:
Sorry for the delay. Finally came back to it, and applied the tests.
Thanks!
Awesome, thank you!
It was fun to see that HEAD was silenced with the first patch of this
thread that tweaked the behavior with arrays.
Uh, what? Sorry I don’t follow.
Regarding the comments, I have left them out for now. That may be a
good start, but it also feels like we should do a much better job
overall with the area of jsonpath_exec.c.
I put them in because it took me a bit to track down that they were among the implementors of JsonPathGetVarCallback as I was porting the code.
One thing that may help as
a start is to reorganize the routines of the file and divide them into
sub-categories, or even go through a deeper refactoring to help
readers go through the existing 4.5k lines of code that are in this
single file..
After I got all the tests passing in the port to Go, I split it up into 14 implementation and 15 test files (with all of jsonb_jsonpath.(sql.out) in one file). Was much easier to reason about that way.
https://github.com/theory/sqljson/tree/main/path/exec
Best,
David
On Fri, Jul 19, 2024 at 09:49:50AM -0400, David E. Wheeler wrote:
It was fun to see that HEAD was silenced with the first patch of this
thread that tweaked the behavior with arrays.Uh, what? Sorry I don’t follow.
What I mean is that the main regression test suite did not complain on
your original patch posted here:
/messages/by-id/A95346F9-6147-46E0-809E-532A485D71D6@justatheory.com
But the new tests showed a difference, and that's enough for me to see
value in the new tests.
--
Michael
On Jul 21, 2024, at 20:54, Michael Paquier <michael@paquier.xyz> wrote:
What I mean is that the main regression test suite did not complain on
your original patch posted here:
/messages/by-id/A95346F9-6147-46E0-809E-532A485D71D6@justatheory.comBut the new tests showed a difference, and that's enough for me to see
value in the new tests.
Oh, got it, nice!
Thanks,
David