SQL:2023 JSON simplified accessor support
Hello Hackers,
I’ve attached a patch to start adding SQL:2023 JSON simplified
accessor support. This allows accessing JSON or JSONB fields using dot
notation (e.g., colname.field.field...), similar to composite types.
Currently, PostgreSQL uses nonstandard syntax like colname->x->y for
JSON and JSONB, and colname['blah'] for JSONB. These existing syntaxes
predate the standard. Oracle already supports the standard dot
notation syntax [1]https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/simple-dot-notation-access-to-json-data.html#GUID-7249417B-A337-4854-8040-192D5CEFD576.
The full specification for the JSON simplified accessor format is as
follows:
<JSON simplified accessor> ::=
<value expression primary> <JSON simplified accessor op chain>
<JSON simplified accessor op chain> ::=
<JSON simplified accessor op>
| <JSON simplified accessor op chain> <JSON simplified accessor op>
<JSON simplified accessor op> ::=
<JSON member accessor>
| <JSON wildcard member accessor>
| <JSON array accessor>
| <JSON wildcard array accessor>
| <JSON item method>
I’ve implemented the member and array accessors and attached two
alternative patches:
1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch
enables dot access to JSON object fields and subscript access to
indexed JSON array elements by converting "." and "[]" indirection
into a JSON_QUERY JsonFuncExpr node.
2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This
alternative patch implements dot access to JSON object fields by
transforming the "." indirection into a "->" operator.
The upside of the v1 patch is that it strictly aligns with the SQL
standard, which specifies that the simplified access is equivalent to:
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)
However, the performance of JSON_QUERY might be suboptimal due to
function call overhead. Therefore, I implemented the v2 alternative
using the "->" operator.
There is some uncertainty about the semantics of conditional array
wrappers. Currently, there is at least one subtle difference between
the "->" operator and JSON_QUERY, as shown:
postgres=# select '{"a": 42}'::json->'a';
?column?
----------
42
(1 row)
postgres=# select json_query('{"a": 42}'::json, 'lax $.a' with
conditional array wrapper null on empty null on error);
json_query
------------
[42]: (1 row)
(1 row)
JSON_QUERY encloses the JSON value 42 in brackets, which may be a bug,
as Peter noted [2]/messages/by-id/8022e067-818b-45d3-8fab-6e0d94d03626@eisentraut.org. If there are no other semantic differences, we
could implement simple access without using JSON_QUERY to avoid
function call overhead.
I aim to first enable standard dot notation access to JSON object
fields. Both patches implement this, and I’m also open to alternative
approaches.
For subscripting access to jsonb array elements, jsonb already
supports this via the subscripting handler interface. In the v1 patch,
I added json support using JSON_QUERY, but I can easily adapt this for
the v2 patch using the -> operator. I did not leverage the
subscripting handler interface for json because implementing the
fetch/assign functions for json seems challenging for plain text. Let
me know if you have a different approach in mind.
Finally, I have not implemented wildcard or item method accessors yet
and would appreciate input on their necessity.
[1]: https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/simple-dot-notation-access-to-json-data.html#GUID-7249417B-A337-4854-8040-192D5CEFD576
[2]: /messages/by-id/8022e067-818b-45d3-8fab-6e0d94d03626@eisentraut.org
Attachments:
v2-0001-Transform-JSON-dot-access-to-arrow-operator.txttext/plain; charset=US-ASCII; name=v2-0001-Transform-JSON-dot-access-to-arrow-operator.txtDownload+214-8
v1-0001-Add-JSON-JSONB-simplified-accessor.patchapplication/octet-stream; name=v1-0001-Add-JSON-JSONB-simplified-accessor.patchDownload+283-5
On 29.08.24 18:33, Alexandra Wang wrote:
I’ve implemented the member and array accessors and attached two
alternative patches:1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch
enables dot access to JSON object fields and subscript access to
indexed JSON array elements by converting "." and "[]" indirection
into a JSON_QUERY JsonFuncExpr node.2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This
alternative patch implements dot access to JSON object fields by
transforming the "." indirection into a "->" operator.The upside of the v1 patch is that it strictly aligns with the SQL
standard, which specifies that the simplified access is equivalent to:JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)However, the performance of JSON_QUERY might be suboptimal due to
function call overhead. Therefore, I implemented the v2 alternative
using the "->" operator.
Using the operator approach would also allow taking advantage of
optimizations such as
</messages/by-id/CAKU4AWoqAVya6PBhn+BCbFaBMt3z-2=i5fKO3bW=6HPhbid2Dw@mail.gmail.com>.
There is some uncertainty about the semantics of conditional array
wrappers. Currently, there is at least one subtle difference between
the "->" operator and JSON_QUERY, as shown:
That JSON_QUERY bug has been fixed.
I suggest you rebase both of your patches over this, just to double
check everything. But then I think you can drop the v1 patch and just
submit a new version of v2.
The patch should eventually contain some documentation. It might be
good starting to look for a good spot where to put that documentation.
It might be either near the json types documentation or near the general
qualified identifier syntax, not sure.
Hi Peter,
Thank you so much for helping!
On Mon, Sep 16, 2024 at 12:44 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 29.08.24 18:33, Alexandra Wang wrote:
I’ve implemented the member and array accessors and attached two
alternative patches:1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch
enables dot access to JSON object fields and subscript access to
indexed JSON array elements by converting "." and "[]" indirection
into a JSON_QUERY JsonFuncExpr node.2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This
alternative patch implements dot access to JSON object fields by
transforming the "." indirection into a "->" operator.The upside of the v1 patch is that it strictly aligns with the SQL
standard, which specifies that the simplified access is equivalent to:JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)However, the performance of JSON_QUERY might be suboptimal due to
function call overhead. Therefore, I implemented the v2 alternative
using the "->" operator.Using the operator approach would also allow taking advantage of
optimizations such as
</messages/by-id/CAKU4AWoqAVya6PBhn+BCbFaBMt3z-2=i5fKO3bW=6HPhbid2Dw@mail.gmail.com>.
OK, that makes sense.
There is some uncertainty about the semantics of conditional array
wrappers. Currently, there is at least one subtle difference between
the "->" operator and JSON_QUERY, as shown:That JSON_QUERY bug has been fixed.
I suggest you rebase both of your patches over this, just to double
check everything. But then I think you can drop the v1 patch and just
submit a new version of v2.
Done. I rebased both patches and confirmed they have the same test
outputs. I attached v3, which also adds JSON subscript support on top
of v2.
The patch should eventually contain some documentation. It might be
good starting to look for a good spot where to put that documentation.
It might be either near the json types documentation or near the general
qualified identifier syntax, not sure.
Right, I’m not sure either. A third option, I think, would be to
include it in the JSON Functions and Operators section [1]https://www.postgresql.org/docs/devel/functions-json.html.
[1]: https://www.postgresql.org/docs/devel/functions-json.html
Best,
Alex
Attachments:
v3-0001-Add-JSON-JSONB-simplified-accessor.patchapplication/octet-stream; name=v3-0001-Add-JSON-JSONB-simplified-accessor.patchDownload+342-13
Hi,
I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!
Best,
Alex
Attachments:
v4-0001-Add-JSON-JSONB-simplified-accessor.patchapplication/octet-stream; name=v4-0001-Add-JSON-JSONB-simplified-accessor.patchDownload+347-13
On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote:
Hi,
I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!
This is a really nice feature, and provides a lot of expressive power
for such a small piece of code.
I notice this doesn't seem to work for domains over json and jsonb.
andrew@~=# create domain json_d as json;
CREATE DOMAIN
andrew@~=# create table test_json_dot(id int, test_json json_d);
CREATE TABLE
andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
INSERT 0 1 | |
andrew@~=# select (test_json_dot.test_json).b, json_query(test_json,
'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as
expected from test_json_dot;
ERROR: column notation .b applied to type json_d, which is not a
composite type
LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l...
I'm not sure that's a terribly important use case, but we should
probably make it work. If it's a domain we should get the basetype of
the domain. There's some example code in src/backend/utils/adt/jsonfuncs.c
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!
Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want the text output far more often than a JSON scalar.
Best,
David
On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote:
On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want the text output far more often than a JSON scalar.
That would defeat being able to chain these.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Sep 27, 2024, at 12:07, Andrew Dunstan <andrew@dunslane.net> wrote:
That would defeat being able to chain these.
Not if it’s a different operator. But I’m fine to just keep using ->> at the end of a chain.
D
On Thu, Sep 26, 2024 at 11:45 PM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
Hi,
I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!
the attached patch solves the domain type issue, Andrew mentioned in the thread.
I also added a test case: composite over jsonb domain type,
it still works. for example:
create domain json_d as jsonb;
create type test as (a int, b json_d);
create table t1(a test);
insert into t1 select $$(1,"{""a"": 3, ""key1"": {""c"": ""42""},
""key2"": [11, 12]}") $$;
insert into t1 select $$ (1,"{""a"": 3, ""key1"": {""c"": ""42""},
""key2"": [11, 12, {""x"": [31, 42]}]}") $$;
select (t1.a).b.key2[2].x[1] from t1;
select (t1.a).b.key1.c from t1;
Attachments:
v4-0001-make-simplified-accessor-works-with-domain-ove.no-cfbotapplication/octet-stream; name=v4-0001-make-simplified-accessor-works-with-domain-ove.no-cfbotDownload+97-11
On Fri, Oct 4, 2024 at 7:33 AM jian he <jian.universality@gmail.com> wrote:
On Thu, Sep 26, 2024 at 11:45 PM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:Hi,
I didn’t run pgindent earlier, so here’s the updated version with the
correct indentation. Hope this helps!the attached patch solves the domain type issue, Andrew mentioned in the thread.
I also added a test case: composite over jsonb domain type,
it still works. for example:
create domain json_d as jsonb;
create type test as (a int, b json_d);
create table t1(a test);
insert into t1 select $$(1,"{""a"": 3, ""key1"": {""c"": ""42""},
""key2"": [11, 12]}") $$;
insert into t1 select $$ (1,"{""a"": 3, ""key1"": {""c"": ""42""},
""key2"": [11, 12, {""x"": [31, 42]}]}") $$;select (t1.a).b.key2[2].x[1] from t1;
select (t1.a).b.key1.c from t1;
Thank you so much, Jian, for reviewing the patch and providing a fix!
I’ve integrated your fix into the attached v5 patch. Inspired by your
test case, I discovered another issue with domains over JSON:
top-level JSON array access to a domain over JSON when the domain is a
field of a composite type. Here’s an example:
create domain json_d as json;
create type test as (a int, b json_d);
create table t1(a test);
insert into t1 select $$ (1,"[{""a"": 3}, {""key1"": {""c"": ""42""}},
{""key2"": [11, 12]}]") $$;
select (t1.a).b[0] from t1;
The v5 patch includes the following updates:
- Fixed the aforementioned issue and added more tests covering composite
types with domains, nested domains, and arrays of domains over
JSON/JSONB.
- Refactored the logic for parsing JSON/JSONB object fields by moving it
from ParseFuncOrColumn() to transformIndirection() for improved
readability. The ParseFuncOrColumn() function is already handling both
single-argument function calls and composite types, and it has other
callers besides transformIndirection().
Best,
Alex
Attachments:
v5-0001-Add-JSON-JSONB-simplified-accessor.patchapplication/x-patch; name=v5-0001-Add-JSON-JSONB-simplified-accessor.patchDownload+763-16
On 07.11.24 22:57, Alexandra Wang wrote:
The v5 patch includes the following updates:
- Fixed the aforementioned issue and added more tests covering composite
types with domains, nested domains, and arrays of domains over
JSON/JSONB.- Refactored the logic for parsing JSON/JSONB object fields by moving it
from ParseFuncOrColumn() to transformIndirection() for improved
readability. The ParseFuncOrColumn() function is already handling both
single-argument function calls and composite types, and it has other
callers besides transformIndirection().
This patch implements array subscripting support for the json type, but
it does it in a non-standard way, using
ParseJsonSimplifiedAccessorArrayElement(). This would be better done by
providing a typsubscript function for the json type. This is what jsonb
already has, which is why your patch doesn't need to provide the array
support for jsonb. I suggest you implement the typsubscript support for
the json type (make it a separate patch but you can keep it in this
thread IMO) and remove the custom code from this patch.
A few comments on the tests: The tests look good to me. Good coverage
of weirdly nested types. Results look correct.
+drop table if exists test_json_dot;
This can be omitted, since we know that the table doesn't exist yet.
This code could be written in the more conventional insert ... values
syntax:
+insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
+insert into test_json_dot select 1, '{"a": 2, "b": {"c": 42}}'::json;
+insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"},
"d":[11, 12]}'::json;
+insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"},
"d":[{"x": [11, 12]}, {"y": [21, 22]}]}'::json;
Then the ::json casts can also go away.
Also, using a different value for "id" for each row would be more
useful, so that the subsequent tests could then be written like
select id, (test_jsonb_dot.test_jsonb).b from test_jsonb_dot;
so we can see which result corresponds to which input row. Also make
id the primary key in this table.
Also, let's keep the json and the jsonb variants aligned. There are
some small differences, like the test_json_dot table having 4 rows but
the test_jsonb_dot having 3 rows. And the array and wildcard tests in
the opposite order. Not a big deal, but keeping these the same helps
eyeballing the test files.
Maybe add a comment somewhere in this file that you are running the
json_query equivalents to cross-check the semantics of the dot syntax.
Some documentation should be written. This looks like this right place
to start:
https://www.postgresql.org/docs/devel/sql-expressions.html#FIELD-SELECTION
and them maybe some cross-linking between there and the sections on JSON
types and operators.
Hi, hackers.
I have implemented dot notation for jsonb using type subscripting back
in April 2023, but failed post it because I left Postgres Professional
company soon after and have not worked anywhere since, not even had
any interest in programming.
But yesterday I accidentally decided to look what is going on at
commitfests and found this thread. I immediately started to rebase
code from PG16, fixed some bugs, and now I'm ready to present my
version of the patches which is much more complex.
Unfortunately, I probably won't be able to devote that much time to
the patches as before.
Description of the patches:
1. Allow transformation only of a sublist of subscripts
This gives ability to custom subscripting code to consume only the
supported prefix of the subscripts list. Remaining subscripts are
applied to the result of subscription which can be of different data
type.
Example of behavior change for hstore which always consumes only the
first subscript and returns text:
=# select (hstore 'foo=>bar')['foo']['bar'];
-ERROR: hstore allows only one subscript
+ERROR: cannot subscript type text because it does not support subscripting
2. GUC compat_field_notation disables treating of field selection as a
function application (non-standard PG feature) if the column data type
supports subscripting.
Example for function hstore_hash(hstore) (after patch #4 applied):
=# set compat_field_notation = on;
=# select (hstore 'foo=>bar,hstore_hash=>123').hstore_hash; -- function
hstore_hash
------------
-1718799972
=# select (hstore 'foo=>bar,hstore_hash=>123').foo;
foo
-----
bar
=# set compat_field_notation = off; -- default
=# select (hstore 'foo=>bar').hstore_hash; -- missing key
hstore_hash
-------------
=# select (hstore 'foo=>bar,hstore_hash=>123').hstore_hash;
hstore_hash
-------------
123
3. Enable processing of field accessors by generic subscripting code.
Field accessors are represented as String nodes in refupperexprs for
distinguishing from ordinary text subscripts which can be needed for
correct EXPLAIN.
Strings node seem to no longer be a valid expression nodes, so I
had to add quite ugly handling for them in nodeFuncs etc during rebase
to PG18.
4. Implement read-only dot notation for hstore (see example above).
5. Enable processing of .* by generic subscripting (similary to #3)
6. Export jsonPathFromParseResult(): simple refactoring for #7.
7. Implement read-only dot notation for jsonb using jsonpath.
A list of subscripts is converted to jsonpath using
JsonPathParseItem. Non-constant array accessors are replaced with
jsonpath parameters.
TODO:
- wildcard array accessor [*]
- item methods
There is some inconsistency in subscripting in jsonpath, which
supports only arrays with numeric indexes, and existing generic
subscripting for jsonb, which also supports indexing of objects by
string keys.
-- JSON_QUERY(jb, '$.a["b"]')
=# select (jsonb '{"a": {"b": 1}}').a['b'];
a
---
NULL
-- JSON_QUERY(jb, '$.a')['b'] = generic subscripting after JSON_QUERY
=# select ((jsonb '{"a": {"b": 1}}').a)['b'];
a
---
1
So, I think it would be good to enable subscripting of objects by
strings in jsonpath.
8. Extract transformColumnRefInternal(): simple refactoring for #9
9. Enable non-parenthesized column references in dot notation.
I think such patch needs a separate thread.
This feature, required by the SQL standard, is implemented by calling
transformColumnRefInternal() in a loop and passing different prefixes
of the field chain. But I'm not sure how correct this is, I simply
tried to preserve compatibility with existing name resolution rules.
For example, for A.B.C.D.E we will try the following variants of
splitting to column reference and indirections:
(A.B.C.D).E = db A, schema B, table C, column D
(A.B.C).D.E = schema A, table B, column C
(A.B).C.D.E = table A, column B
(A).B.C.D.E is not tried because by the SQL standard column reference
in dot notation should be prefixed with table name. Although it would
be very nice to write "jb.key" instead of "tab.jb.key".
There is also an inconsistency in execution when the accessor chain is
separated by parentheses.
Example:
(jb).a [0] => JSON_QUERY(jb, '$.a[0]')
((jb).a)[0] => JSON_QUERY(jb, '$.a') [0]
=# select (jsonb '[{"a": 1}, {"a": 2}]').a[0];
a
--------
[1, 2]
- jsonpath '$.a' returns two items: 1, 2
- jsonpath '[0]' returns the same items due to auto array wrapping
- items wrapped into array
=# select ((jsonb '[{"a": 1}, {"a": 2}]').a)[0];
a
---
1
- JSON_QUERY(jb, '$.a') returns array [1, 2],
- generic subscript '[0]' extracts its first element
The problem could be solved if we somehow allowed intermediate
jsonpaths to return unwrapped items sequences (like SRF).
Also I think it would be an interesting task to implement the
assignment to JSON using dot notation, but it is unclear what to do
here with .* and [*].
Attachments:
v6-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchtext/x-patch; charset=UTF-8; name=v6-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchDownload+29-20
v6-0002-Add-GUC-compat_field_notation.patchtext/x-patch; charset=UTF-8; name=v6-0002-Add-GUC-compat_field_notation.patchDownload+13-1
v6-0003-Pass-field-accessors-to-generic-subscripting.patchtext/x-patch; charset=UTF-8; name=v6-0003-Pass-field-accessors-to-generic-subscripting.patchDownload+282-76
v6-0004-Implement-read-only-dot-notation-for-hstore.patchtext/x-patch; charset=UTF-8; name=v6-0004-Implement-read-only-dot-notation-for-hstore.patchDownload+33-9
v6-0005-Allow-processing-of-.-by-generic-subscripting.patchtext/x-patch; charset=UTF-8; name=v6-0005-Allow-processing-of-.-by-generic-subscripting.patchDownload+73-33
v6-0006-Export-jsonPathFromParseResult.patchtext/x-patch; charset=UTF-8; name=v6-0006-Export-jsonPathFromParseResult.patchDownload+22-7
v6-0007-Implement-read-only-dot-notation-for-jsonb-using-.patchtext/x-patch; charset=UTF-8; name=v6-0007-Implement-read-only-dot-notation-for-jsonb-using-.patchDownload+754-92
v6-0008-Extract-transformColumnRefInternal.patchtext/x-patch; charset=UTF-8; name=v6-0008-Extract-transformColumnRefInternal.patchDownload+132-107
v6-0009-Enable-non-parenthesized-column-references-in-dot.patchtext/x-patch; charset=UTF-8; name=v6-0009-Enable-non-parenthesized-column-references-in-dot.patchDownload+275-107
Hi,
On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote:
Hi, hackers.
I have implemented dot notation for jsonb using type subscripting back
in April 2023, but failed post it because I left Postgres Professional
company soon after and have not worked anywhere since, not even had
any interest in programming.But yesterday I accidentally decided to look what is going on at
commitfests and found this thread. I immediately started to rebase
code from PG16, fixed some bugs, and now I'm ready to present my
version of the patches which is much more complex.Unfortunately, I probably won't be able to devote that much time to
the patches as before.
Thank you so much, Nikita, for revisiting this topic and sharing your
v6 patches!
Now that we have two solutions, I’d like to summarize our current
options.
In Postgres, there are currently three ways to access json/jsonb
object fields and array elements:
1. '->' operator (Postgres-specific, predates SQL standard):
postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
-> 0; -- returns 1
2. jsonb subscripting (not available for the plain json type):
postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
3]}'::jsonb)['d'][0]; --returns 1
3. json_query() function:
postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
3]}', 'lax $.d[0]'); --returns 1
A few weeks ago, I did the following performance benchmarking of the
three approaches:
-- setup:
create table tbl(id int, col1 jsonb);
insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
{"j":"vj"}]}]}' from generate_series(1, 100000)i;
-- jsonb_operator.sql
SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;
-- jsonb_subscripting.sql
SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;
-- jsonb_path_query.sql
SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;
# pgbench on my local MacOS machine, using -O3 optimization:
pgbench -n -f XXX.sql postgres -T100
Results (Latency | tps):
"->" operator: 14ms | 68
jsonb subscripting: 17ms | 58
jsonb_path_query() function: 23ms | 43
So performance from best to worst:
"->" operator > jsonb subscripting >> jsonb_path_query() function.
I’m excited to see your implementation of dot notation for jsonb using
type subscripting! This approach rounds out the three possible ways to
implement JSON simplified accessors:
## v1: json_query() implementation
Pros:
- Fully adheres to the SQL standard.
According to the SQL standard, if the JSON simplified accessor <JA> is
not a JSON item method, it is equivalent to a <JSON query>:
JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)
(I’m skipping <JA> that includes a JSON item method, as it is
currently outside the scope of both sets of patches.)
- Easiest to implement
Cons:
- Slow due to function call overhead.
## v2-v5: "->" operator implementation
We initially chose this approach for its performance benefits.
However, while addressing Peter’s feedback on v5, I encountered the
following issue:
-- setup
create table test_json_dot(id serial primary key, test_json json);
insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
"b": {"c": 42}}]');
-- problematic query:
test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
test_json_dot;
id | b | expected
----+---+-----------------
5 | | [42, {"c": 42}]
(1 row)
This issue arises from the semantic differences between the "->"
operator and json_query’s "lax" mode. One possible workaround is to
redefine the "->" operator and modify its implementation. However, since
the "->" operator has been in use for a long time, such changes would
break backward compatibility.
## v6: jsonb subscription implementation
Nikita's patches pass all my functional test cases, including those
that failed with the previous approach.
Supported formats:
- JSON member accessor
- JSON wildcard member accessor (Not available in v5, so this is also a plus)
- JSON array accessor
Questions:
1. Since Nikita’s patches did not address the JSON data type, and JSON
currently does not support subscripting, should we limit the initial
feature set to JSONB dot-notation for now? In other words, if we aim
to fully support JSON simplified accessors for the plain JSON type,
should we handle support for plain JSON subscripting as a follow-up
effort?
2. I have yet to have a more thorough review of Nikita’s patches.
One area I am not familiar with is the hstore-related changes. How
relevant is hstore to the JSON simplified accessor?
Best,
Alex
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote:
Hi,
On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote:
Hi, hackers.
I have implemented dot notation for jsonb using type subscripting back
in April 2023, but failed post it because I left Postgres Professional
company soon after and have not worked anywhere since, not even had
any interest in programming.But yesterday I accidentally decided to look what is going on at
commitfests and found this thread. I immediately started to rebase
code from PG16, fixed some bugs, and now I'm ready to present my
version of the patches which is much more complex.Unfortunately, I probably won't be able to devote that much time to
the patches as before.Thank you so much, Nikita, for revisiting this topic and sharing your
v6 patches!Now that we have two solutions, I’d like to summarize our current
options.In Postgres, there are currently three ways to access json/jsonb
object fields and array elements:1. '->' operator (Postgres-specific, predates SQL standard):
postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
-> 0; -- returns 12. jsonb subscripting (not available for the plain json type):
postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
3]}'::jsonb)['d'][0]; --returns 13. json_query() function:
postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
3]}', 'lax $.d[0]'); --returns 1A few weeks ago, I did the following performance benchmarking of the
three approaches:-- setup:
create table tbl(id int, col1 jsonb);
insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
{"j":"vj"}]}]}' from generate_series(1, 100000)i;-- jsonb_operator.sql
SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;-- jsonb_subscripting.sql
SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;-- jsonb_path_query.sql
SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;# pgbench on my local MacOS machine, using -O3 optimization:
pgbench -n -f XXX.sql postgres -T100Results (Latency | tps):
"->" operator: 14ms | 68
jsonb subscripting: 17ms | 58
jsonb_path_query() function: 23ms | 43So performance from best to worst:
"->" operator > jsonb subscripting >> jsonb_path_query() function.I’m excited to see your implementation of dot notation for jsonb using
type subscripting! This approach rounds out the three possible ways to
implement JSON simplified accessors:## v1: json_query() implementation
Pros:
- Fully adheres to the SQL standard.According to the SQL standard, if the JSON simplified accessor <JA> is
not a JSON item method, it is equivalent to a <JSON query>:JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)(I’m skipping <JA> that includes a JSON item method, as it is
currently outside the scope of both sets of patches.)- Easiest to implement
Cons:
- Slow due to function call overhead.## v2-v5: "->" operator implementation
We initially chose this approach for its performance benefits.
However, while addressing Peter’s feedback on v5, I encountered the
following issue:-- setup
create table test_json_dot(id serial primary key, test_json json);
insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
"b": {"c": 42}}]');-- problematic query:
test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
test_json_dot;
id | b | expected
----+---+-----------------
5 | | [42, {"c": 42}]
(1 row)This issue arises from the semantic differences between the "->"
operator and json_query’s "lax" mode. One possible workaround is to
redefine the "->" operator and modify its implementation. However, since
the "->" operator has been in use for a long time, such changes would
break backward compatibility.## v6: jsonb subscription implementation
Nikita's patches pass all my functional test cases, including those
that failed with the previous approach.Supported formats:
- JSON member accessor
- JSON wildcard member accessor (Not available in v5, so this is also a plus)
- JSON array accessorQuestions:
1. Since Nikita’s patches did not address the JSON data type, and JSON
currently does not support subscripting, should we limit the initial
feature set to JSONB dot-notation for now? In other words, if we aim
to fully support JSON simplified accessors for the plain JSON type,
should we handle support for plain JSON subscripting as a follow-up
effort?2. I have yet to have a more thorough review of Nikita’s patches.
One area I am not familiar with is the hstore-related changes. How
relevant is hstore to the JSON simplified accessor?
We can't change the way the "->" operator works, as there could well be
uses of it in the field that rely on its current behaviour. But maybe we
could invent a new operator which is compliant with the standard
semantics for dot access, and call that. Then we'd get the best
performance, and also we might be able to implement it for the plain
JSON type. If that proves not possible we can think about not
implementing for plain JSON, but I'd rather not go there until we have to.
I don't think we should be including hstore changes here - we should
just be aiming at implementing the standard for JSON access. hstore
changes if any should be a separate feature. The aren't relevant to JSON
access, although they might use some of the same infrastructure,
depending on implementation.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 21.11.24 23:46, Andrew Dunstan wrote:
Questions:
1. Since Nikita’s patches did not address the JSON data type, and JSON
currently does not support subscripting, should we limit the initial
feature set to JSONB dot-notation for now? In other words, if we aim
to fully support JSON simplified accessors for the plain JSON type,
should we handle support for plain JSON subscripting as a follow-up
effort?2. I have yet to have a more thorough review of Nikita’s patches.
One area I am not familiar with is the hstore-related changes. How
relevant is hstore to the JSON simplified accessor?We can't change the way the "->" operator works, as there could well be
uses of it in the field that rely on its current behaviour. But maybe we
could invent a new operator which is compliant with the standard
semantics for dot access, and call that. Then we'd get the best
performance, and also we might be able to implement it for the plain
JSON type. If that proves not possible we can think about not
implementing for plain JSON, but I'd rather not go there until we have to.
Yes, I think writing a custom operator that is similar to "->" but has
the required semantics is the best way forward. (Maybe it can be just a
function?)
I don't think we should be including hstore changes here - we should
just be aiming at implementing the standard for JSON access. hstore
changes if any should be a separate feature. The aren't relevant to JSON
access, although they might use some of the same infrastructure,
depending on implementation.
In a future version, the operator/function mentioned above could be a
catalogued property of a type, similar to typsubscript. Then you could
also apply this to other types. But let's leave that for later.
If I understand it correctly, Nikita's patch uses the typsubscript
support function to handle both bracket subscripting and dot notation.
I'm not sure if it's right to mix these two together. Maybe I didn't
understand that correctly.
Hi hackers,
On Tue, Nov 26, 2024 at 3:12 AM Peter Eisentraut <peter@eisentraut.org>
wrote:
On 21.11.24 23:46, Andrew Dunstan wrote:
Questions:
1. Since Nikita’s patches did not address the JSON data type, and JSON
currently does not support subscripting, should we limit the initial
feature set to JSONB dot-notation for now? In other words, if we aim
to fully support JSON simplified accessors for the plain JSON type,
should we handle support for plain JSON subscripting as a follow-up
effort?2. I have yet to have a more thorough review of Nikita’s patches.
One area I am not familiar with is the hstore-related changes. How
relevant is hstore to the JSON simplified accessor?We can't change the way the "->" operator works, as there could well be
uses of it in the field that rely on its current behaviour. But maybe we
could invent a new operator which is compliant with the standard
semantics for dot access, and call that. Then we'd get the best
performance, and also we might be able to implement it for the plain
JSON type. If that proves not possible we can think about not
implementing for plain JSON, but I'd rather not go there until we haveto.
Yes, I think writing a custom operator that is similar to "->" but has
the required semantics is the best way forward. (Maybe it can be just a
function?)I don't think we should be including hstore changes here - we should
just be aiming at implementing the standard for JSON access. hstore
changes if any should be a separate feature. The aren't relevant to JSON
access, although they might use some of the same infrastructure,
depending on implementation.In a future version, the operator/function mentioned above could be a
catalogued property of a type, similar to typsubscript. Then you could
also apply this to other types. But let's leave that for later.If I understand it correctly, Nikita's patch uses the typsubscript
support function to handle both bracket subscripting and dot notation.
I'm not sure if it's right to mix these two together. Maybe I didn't
understand that correctly.
I’ve been working on a custom operator-like function to support dot
notation in lax mode for JSONB. However, I realized this approach has
the following drawbacks:
1. Handling both dot notation and bracket subscripting together
becomes complicated, as we still need to consider jsonb’s existing
type subscript functions.
2. Chaining N dot-access operators causes multiple unnecessary
deserialization/serialization cycles: for each operator call, the source
jsonb binary is converted to an in-memory JsonbValue, then the
relevant field is extracted, and finally it’s turned back into a
binary jsonb object. This hurts performance. A direct use of the
jsonpath functions API seems more efficient.
3. Correctly applying lax mode requires different handling for the
first, middle, and last operators, which adds further complexity.
Because of these issues, I took a closer look at Nikita’s patch. His
solution generalizes the existing jsonb typesubscript support function
to handle both bracket subscripting and dot notation. It achieves this
by translating dot notation into a jsonpath expression during
transformation, and then calls JsonPathQuery at execution.
Overall, I find this approach more efficient for chained accessors and
more flexible for future enhancements.
I attached a minimized version of Nikita’s patch (v7):
- The first three patches are refactoring steps that could be squashed
if preferred.
- The last two patches implement dot notation and wildcard access,
respectively.
Changes in this new version:
- Removed code handling hstore, as Andrew pointed out it isn’t
directly relevant to JSON access and should be handled separately.
- Split tests for dot notation and wildcard access.
- Dropped the two patches in v6 that enabled non-parenthesized column
references (per Nikita’s suggestion, this will need its own separate
discussion).
For reference, I’ve also attached the operator-like function approach
in 0001-WIP-Operator-approach-JSONB-dot-notation.txt.
I’d appreciate any feedback and thoughts!
Best,
Alex
Attachments:
0001-WIP-Operator-apporach-JSONB-dot-notation.txttext/plain; charset=UTF-8; name=0001-WIP-Operator-apporach-JSONB-dot-notation.txtDownload+481-16
v7-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchapplication/octet-stream; name=v7-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchDownload+20-15
v7-0004-Implement-read-only-dot-notation-for-jsonb-using-.patchapplication/octet-stream; name=v7-0004-Implement-read-only-dot-notation-for-jsonb-using-.patchDownload+670-92
v7-0005-Allow-processing-of-.-by-generic-subscripting.patchapplication/octet-stream; name=v7-0005-Allow-processing-of-.-by-generic-subscripting.patchDownload+206-95
v7-0003-Export-jsonPathFromParseResult.patchapplication/octet-stream; name=v7-0003-Export-jsonPathFromParseResult.patchDownload+22-7
v7-0002-Pass-field-accessors-to-generic-subscripting.patchapplication/octet-stream; name=v7-0002-Pass-field-accessors-to-generic-subscripting.patchDownload+218-58
Hi,
On Wed, Feb 5, 2025 at 1:20 AM Alexandra Wang <alexandra.wang.oss@gmail.com>
wrote:
I attached a minimized version of Nikita’s patch (v7):
- The first three patches are refactoring steps that could be squashed
if preferred.
- The last two patches implement dot notation and wildcard access,
respectively.Changes in this new version:
- Removed code handling hstore, as Andrew pointed out it isn’t
directly relevant to JSON access and should be handled separately.
- Split tests for dot notation and wildcard access.
- Dropped the two patches in v6 that enabled non-parenthesized column
references (per Nikita’s suggestion, this will need its own separate
discussion).
It appears that the Commitfest app selected the wrong patch. Sorry
about the confusion! I'm reposting the patches to correct this.
Attachments:
v7-0002-Pass-field-accessors-to-generic-subscripting.patchapplication/octet-stream; name=v7-0002-Pass-field-accessors-to-generic-subscripting.patchDownload+218-58
v7-0005-Allow-processing-of-.-by-generic-subscripting.patchapplication/octet-stream; name=v7-0005-Allow-processing-of-.-by-generic-subscripting.patchDownload+206-95
v7-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchapplication/octet-stream; name=v7-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchDownload+20-15
v7-0003-Export-jsonPathFromParseResult.patchapplication/octet-stream; name=v7-0003-Export-jsonPathFromParseResult.patchDownload+22-7
v7-0004-Implement-read-only-dot-notation-for-jsonb-using-.patchapplication/octet-stream; name=v7-0004-Implement-read-only-dot-notation-for-jsonb-using-.patchDownload+670-92
Hello hackers,
I’ve fixed the compilation failure for hstore and updated the patches.
In this version, I’ve further cleaned up the code and added more
comments. I hope this helps!
Summary of changes:
v8-0001 through v8-0005:
Refactoring and preparatory steps for the actual implementation.
v8-0006 (Implement read-only dot notation for JSONB):
I removed the vars field (introduced in v7) from JsonbSubWorkspace
after realizing that JsonPathVariable is not actually needed for
dot-notation.
v8-0007 (Allow wildcard member access for JSONB):
I'm aware that the #if 0 in check_indirection() is not ideal. I
haven’t removed it yet because I’m still reviewing other cases—beyond
our JSONB simplified accessor use case—where this check should remain
strict. I’ll post an additional patch to address this.
Looking forward to comments and feedback!
Thanks,
Alex
Attachments:
v8-0002-Pass-field-accessors-to-generic-subscripting.patchapplication/octet-stream; name=v8-0002-Pass-field-accessors-to-generic-subscripting.patchDownload+105-33
v8-0004-Extract-coerce_jsonpath_subscript.patchapplication/octet-stream; name=v8-0004-Extract-coerce_jsonpath_subscript.patchDownload+78-65
v8-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchapplication/octet-stream; name=v8-0001-Allow-transformation-only-of-a-sublist-of-subscri.patchDownload+28-19
v8-0005-Eanble-String-node-as-field-accessors-in-generic-.patchapplication/octet-stream; name=v8-0005-Eanble-String-node-as-field-accessors-in-generic-.patchDownload+121-28
v8-0006-Implement-read-only-dot-notation-for-jsonb.patchapplication/octet-stream; name=v8-0006-Implement-read-only-dot-notation-for-jsonb.patchDownload+556-30
v8-0003-Export-jsonPathFromParseResult.patchapplication/octet-stream; name=v8-0003-Export-jsonPathFromParseResult.patchDownload+19-5
v8-0007-Allow-wild-card-member-access-for-jsonb.patchapplication/octet-stream; name=v8-0007-Allow-wild-card-member-access-for-jsonb.patchDownload+288-36
Hello hackers,
On Thu, Feb 27, 2025 at 9:46 AM Alexandra Wang <alexandra.wang.oss@gmail.com>
wrote:
Summary of changes:
v8-0001 through v8-0005:
Refactoring and preparatory steps for the actual implementation.v8-0006 (Implement read-only dot notation for JSONB):
I removed the vars field (introduced in v7) from JsonbSubWorkspace
after realizing that JsonPathVariable is not actually needed for
dot-notation.v8-0007 (Allow wildcard member access for JSONB):
I'm aware that the #if 0 in check_indirection() is not ideal. I
haven’t removed it yet because I’m still reviewing other cases—beyond
our JSONB simplified accessor use case—where this check should remain
strict. I’ll post an additional patch to address this.
I made the following minor changes in v9:
- More detailed commit messages
- Additional tests
- Use "?column?" as the column name for trailing .*.
Other than that, the patches remain the same as the previous
version:
0001 - 0005: preparation steps for the actual implementation and do
not change or add new behavior.
0006: jsonb dot notation and sliced subscripting
0007: jsonb wildcard member access
Thanks,
Alex
Attachments:
v9-0003-Export-jsonPathFromParseResult.patchapplication/octet-stream; name=v9-0003-Export-jsonPathFromParseResult.patchDownload+19-5
v9-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-Not.patchapplication/octet-stream; name=v9-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-Not.patchDownload+105-33
v9-0004-Extract-coerce_jsonpath_subscript.patchapplication/octet-stream; name=v9-0004-Extract-coerce_jsonpath_subscript.patchDownload+78-65
v9-0001-Allow-transformation-of-only-a-sublist-of-subscri.patchapplication/octet-stream; name=v9-0001-Allow-transformation-of-only-a-sublist-of-subscri.patchDownload+28-19
v9-0006-Implement-read-only-dot-notation-for-jsonb.patchapplication/octet-stream; name=v9-0006-Implement-read-only-dot-notation-for-jsonb.patchDownload+568-30
v9-0007-Allow-wild-card-member-access-for-jsonb.patchapplication/octet-stream; name=v9-0007-Allow-wild-card-member-access-for-jsonb.patchDownload+299-37
v9-0005-Eanble-String-node-as-field-accessors-in-generic-.patchapplication/octet-stream; name=v9-0005-Eanble-String-node-as-field-accessors-in-generic-.patchDownload+121-28
Hi Alex,
The code comments and the commit messages help a lot when reviewing! Thanks for
the new version.
The code LGTM and check-world is happy. I've also performed some tests and
everything looks good!
Just some minor points about this new version:
## v9-0005
Typo on commit message title
## v9-0006
+ * The following functions create various types of JsonPathParseItem nodes, + * which are used to build JsonPath expressions for jsonb simplified accessor.
Just to avoid misinterpretation I think that we can replace "The following
functions" with "The make_jsonpath_item_* functions" since we can have more
functions in the future that are not fully related with these. Does that make
sense?
--
Matheus Alcantara