Patch: Improve Boolean Predicate JSON Path Docs
Hackers,
Following up from a suggestion from Tom Lane[1]/messages/by-id/1229727.1680535592@sss.pgh.pa.us to improve the documentation of boolean predicate JSON path expressions, please find enclosed a draft patch to do so. It does three things:
1. Converts all of the example path queries to use jsonb_path_query() and show the results, to make it clearer what the behaviors are.
2. Replaces the list of deviations from the standards with a new subsection, with each deviation in its own sub-subsection. The regex section is unchanged, but I’ve greatly expanded the boolean expression JSON path section with examples comparing standard filter expressions and nonstandard boolean predicates. I’ve also added an exhortation not use boolean expressions with @? or standard path expressions with @@.
3. While converting the modes section to use jsonb_path_query() and show the results, I also added an example of strict mode returning an error.
Follow-ups I’d like to make:
1. Expand the modes section to show how the types of results can vary depending on the mode, thanks to the flattening. Examples:
david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)
david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)');
jsonb_path_query
------------------
[1, 2, 3, 4, 5]
2. Improve the descriptions and examples for @?/jsonb_path_exists() and @@/jsonb_path_match().
Best,
David
Attachments:
jsonpath-pred-docs.patchapplication/octet-stream; name=jsonpath-pred-docs.patch; x-unix-mode=0644Download
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index affd1254bb..295f8ca5c9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17205,7 +17205,7 @@ array w/o UK? | t
For example, suppose you have some JSON data from a GPS tracker that you
would like to parse, such as:
<programlisting>
-{
+ \set json '{
"track": {
"segments": [
{
@@ -17220,7 +17220,7 @@ array w/o UK? | t
}
]
}
-}
+}'
</programlisting>
</para>
@@ -17229,7 +17229,10 @@ array w/o UK? | t
<literal>.<replaceable>key</replaceable></literal> accessor
operator to descend through surrounding JSON objects:
<programlisting>
-$.track.segments
+select jsonb_path_query(:'json'::jsonb, '$.track.segments');
+ jsonb_path_query
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
</programlisting>
</para>
@@ -17239,7 +17242,11 @@ $.track.segments
the following path will return the location coordinates for all
the available track segments:
<programlisting>
-$.track.segments[*].location
+select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].location');
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
</programlisting>
</para>
@@ -17248,7 +17255,10 @@ $.track.segments[*].location
specify the corresponding subscript in the <literal>[]</literal>
accessor operator. Recall that JSON array indexes are 0-relative:
<programlisting>
-$.track.segments[0].location
+select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[0].location');
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
</programlisting>
</para>
@@ -17259,7 +17269,10 @@ $.track.segments[0].location
Each method name must be preceded by a dot. For example,
you can get the size of an array:
<programlisting>
-$.track.segments.size()
+select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.size()');
+ jsonb_path_query
+------------------
+ 2
</programlisting>
More examples of using <type>jsonpath</type> operators
and methods within path expressions appear below in
@@ -17302,7 +17315,10 @@ $.track.segments.size()
For example, suppose you would like to retrieve all heart rate values higher
than 130. You can achieve this using the following expression:
<programlisting>
-$.track.segments[*].HR ? (@ > 130)
+select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR ? (@ > 130)');
+ jsonb_path_query
+------------------
+ 135
</programlisting>
</para>
@@ -17312,7 +17328,10 @@ $.track.segments[*].HR ? (@ > 130)
filter expression is applied to the previous step, and the path used
in the condition is different:
<programlisting>
-$.track.segments[*] ? (@.HR > 130)."start time"
+ select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.HR > 130)."start time"');
+ jsonb_path_query
+-----------------------
+ "2018-10-14 10:39:21"
</programlisting>
</para>
@@ -17321,7 +17340,10 @@ $.track.segments[*] ? (@.HR > 130)."start time"
example, the following expression selects start times of all segments that
contain locations with relevant coordinates and high heart rate values:
<programlisting>
-$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"
+select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"');
+ jsonb_path_query
+-----------------------
+ "2018-10-14 10:39:21"
</programlisting>
</para>
@@ -17330,46 +17352,81 @@ $.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"
The following example first filters all segments by location, and then
returns high heart rate values for these segments, if available:
<programlisting>
-$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)
+select jsonb_path_query(:'json'::jsonb, $.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)');
+ jsonb_path_query
+------------------
+ 135
</programlisting>
</para>
<para>
You can also nest filter expressions within each other:
<programlisting>
-$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()
+select jsonb_path_query(:'json'::jsonb, $.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()');
+ jsonb_path_query
+------------------
+ 2
</programlisting>
This expression returns the size of the track if it contains any
segments with high heart rate values, or an empty sequence otherwise.
</para>
- <para>
- <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
- language has the following deviations from the SQL/JSON standard:
- </para>
+ <sect3 id="devations-from-the-standard">
+ <title>Devaiations from the SQL Standard</title>
+ <para>
+ <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
+ language has the following deviations from the SQL/JSON standard:
+ </para>
- <itemizedlist>
- <listitem>
+ <sect4 id="boolean-predicate-path-expressions">
+ <title>Boolean Predicate Path Expressions</title>
<para>
- A path expression can be a Boolean predicate, although the SQL/JSON
- standard allows predicates only in filters. This is necessary for
- implementation of the <literal>@@</literal> operator. For example,
- the following <type>jsonpath</type> expression is valid in
- <productname>PostgreSQL</productname>:
+ As an extension to the SQL standard, a <productname>PostgreSQL</productname>
+ path expression can be a Boolean predicate, whereas the SQL standard allows
+ predicates only in filters. Where SQL standard path expressions return the
+ relevant contents of the queried JSON value, predicate path expressions
+ return the three-value three-valued result of the predicate:
+ <literal>true</literal>, <literal>false</literal>, or
+ <literal>unknown</literal>. Compare this filter <type>jsonpath</type>
+ exression:
<programlisting>
-$.track.segments[*].HR < 70
+select jsonb_path_query(:'json'::jsonb, '$.track.segments ?(@[*].HR > 130)');
+ jsonb_path_query
+---------------------------------------------------------------------------------
+ {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}
</programlisting>
- </para>
- </listitem>
+ To a predicate expression, which returns <literal>true</literal>
+<programlisting>
+select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR > 130');
+ jsonb_path_query
+------------------
+ true
+</programlisting>
+ </para>
- <listitem>
- <para>
- There are minor differences in the interpretation of regular
- expression patterns used in <literal>like_regex</literal> filters, as
- described in <xref linkend="jsonpath-regular-expressions"/>.
- </para>
- </listitem>
- </itemizedlist>
+ <para>
+ Predicate-only path expressions are necessary for implementation of the
+ <literal>@@</literal> operator (and the
+ <function>jsonb_path_match</function> function), and should not be used
+ with the <literal>@?</literal> operator (or
+ <function>jsonb_path_exists</function> function).
+ </para>
+
+ <para>
+ Conversely, non-predicate <type>jsonpath</type> expressions should not be
+ used with the <literal>@@</literal> operator (or the
+ <function>jsonb_path_match</function> function).
+ </para>
+ </sect4>
+
+ <sect4 id="jsonpath-regular-expression-deviation">
+ <title>Regular Expression Interpretation</title>
+ <para>
+ There are minor differences in the interpretation of regular
+ expression patterns used in <literal>like_regex</literal> filters, as
+ described in <xref linkend="jsonpath-regular-expressions"/>.
+ </para>
+ </sect4>
<sect3 id="strict-and-lax-modes">
<title>Strict and Lax Modes</title>
@@ -17431,18 +17488,30 @@ $.track.segments[*].HR < 70
abstract from the fact that it stores an array of segments
when using the lax mode:
<programlisting>
-lax $.track.segments.location
+ select jsonb_path_query(:'json'::jsonb, 'lax $.track.segments.location');
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
</programlisting>
</para>
<para>
- In the strict mode, the specified path must exactly match the structure of
+ In strict mode, the specified path must exactly match the structure of
the queried JSON document to return an SQL/JSON item, so using this
- path expression will cause an error. To get the same result as in
- the lax mode, you have to explicitly unwrap the
+ path expression will cause an error:
+<programlisting>
+select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.location');
+ERROR: jsonpath member accessor can only be applied to an object
+</programlisting>
+ To get the same result as in the lax mode, you have to explicitly unwrap the
<literal>segments</literal> array:
<programlisting>
-strict $.track.segments[*].location
+select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[*].location');
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
</programlisting>
</para>
@@ -17451,7 +17520,13 @@ strict $.track.segments[*].location
when using the lax mode. For instance, the following query selects every
<literal>HR</literal> value twice:
<programlisting>
-lax $.**.HR
+select jsonb_path_query(:'json'::jsonb, 'lax $.**.HR');
+ jsonb_path_query
+------------------
+ 73
+ 135
+ 73
+ 135
</programlisting>
This happens because the <literal>.**</literal> accessor selects both
the <literal>segments</literal> array and each of its elements, while
@@ -17460,7 +17535,11 @@ lax $.**.HR
the <literal>.**</literal> accessor only in the strict mode. The
following query selects each <literal>HR</literal> value just once:
<programlisting>
-strict $.**.HR
+select jsonb_path_query(:'json'::jsonb, 'strict $.**.HR');
+ jsonb_path_query
+------------------
+ 73
+ 135
</programlisting>
</para>
On Oct 14, 2023, at 16:40, David E. Wheeler <david@justatheory.com> wrote:
Following up from a suggestion from Tom Lane[1] to improve the documentation of boolean predicate JSON path expressions, please find enclosed a draft patch to do so.
And now I see I can’t spell “Deviations”. Will fix along with any other requested revisions. GitHub diff here if you’re into that sort of thing:
https://github.com/postgres/postgres/compare/master...theory:postgres:jsonpath-pred-docs
Best,
David
On 2023-10-14 22:40 +0200, David E. Wheeler write:
Following up from a suggestion from Tom Lane[1] to improve the
documentation of boolean predicate JSON path expressions, please find
enclosed a draft patch to do so.
Thanks for putting this together. See my review at the end.
It does three things:
1. Converts all of the example path queries to use jsonb_path_query()
and show the results, to make it clearer what the behaviors are.
Nice. This really does help to make some sense of it. I checked all
queries and they do work out except for two queries where the path
expression string is not properly quoted (but the intended output is
still correct).
2. Replaces the list of deviations from the standards with a new
subsection, with each deviation in its own sub-subsection. The regex
section is unchanged, but I’ve greatly expanded the boolean expression
JSON path section with examples comparing standard filter expressions
and nonstandard boolean predicates. I’ve also added an exhortation not
use boolean expressions with @? or standard path expressions with @@.
LGTM.
3. While converting the modes section to use jsonb_path_query() and
show the results, I also added an example of strict mode returning an
error.Follow-ups I’d like to make:
1. Expand the modes section to show how the types of results can vary
depending on the mode, thanks to the flattening. Examples:david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)');
jsonb_path_query
------------------
[1, 2, 3, 4, 5]2. Improve the descriptions and examples for @?/jsonb_path_exists()
and @@/jsonb_path_match().
+1
My review:
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index affd1254bb..295f8ca5c9 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17205,7 +17205,7 @@ array w/o UK? | t For example, suppose you have some JSON data from a GPS tracker that you would like to parse, such as: <programlisting> -{ + \set json '{
Perhaps make it explicit that the reader must run this in psql in order
to use \set and :'json' in the ensuing samples? Some of the existing
examples already use psql output but they do not rely on any psql
features.
"track": {
"segments": [
{
@@ -17220,7 +17220,7 @@ array w/o UK? | t
}
]
}
-}
+}'
</programlisting>
</para>@@ -17229,7 +17229,10 @@ array w/o UK? | t <literal>.<replaceable>key</replaceable></literal> accessor operator to descend through surrounding JSON objects: <programlisting> -$.track.segments +select jsonb_path_query(:'json'::jsonb, '$.track.segments'); + jsonb_path_query +------------------------------------------------------------------------------------------------------------------------------------------------------------------- + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}] </programlisting>
This should use <screen>, <userinput>, and <computeroutput> if it shows
a psql session, e.g.:
<screen>
<userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput>
<computeroutput>
jsonb_path_query
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
</computeroutput>
</screen>
Also the cast to jsonb is not necessary and only adds clutter IMO.
</para>
@@ -17239,7 +17242,11 @@ $.track.segments the following path will return the location coordinates for all the available track segments: <programlisting> -$.track.segments[*].location +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] </programlisting> </para>@@ -17248,7 +17255,10 @@ $.track.segments[*].location specify the corresponding subscript in the <literal>[]</literal> accessor operator. Recall that JSON array indexes are 0-relative: <programlisting> -$.track.segments[0].location +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[0].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] </programlisting> </para>@@ -17259,7 +17269,10 @@ $.track.segments[0].location Each method name must be preceded by a dot. For example, you can get the size of an array: <programlisting> -$.track.segments.size() +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.size()'); + jsonb_path_query +------------------ + 2 </programlisting> More examples of using <type>jsonpath</type> operators and methods within path expressions appear below in @@ -17302,7 +17315,10 @@ $.track.segments.size() For example, suppose you would like to retrieve all heart rate values higher than 130. You can achieve this using the following expression: <programlisting> -$.track.segments[*].HR ? (@ > 130) +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR ? (@ > 130)'); + jsonb_path_query +------------------ + 135 </programlisting> </para>@@ -17312,7 +17328,10 @@ $.track.segments[*].HR ? (@ > 130) filter expression is applied to the previous step, and the path used in the condition is different: <programlisting> -$.track.segments[*] ? (@.HR > 130)."start time" + select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.HR > 130)."start time"'); + jsonb_path_query +----------------------- + "2018-10-14 10:39:21" </programlisting> </para>@@ -17321,7 +17340,10 @@ $.track.segments[*] ? (@.HR > 130)."start time" example, the following expression selects start times of all segments that contain locations with relevant coordinates and high heart rate values: <programlisting> -$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"'); + jsonb_path_query +----------------------- + "2018-10-14 10:39:21" </programlisting> </para>@@ -17330,46 +17352,81 @@ $.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" The following example first filters all segments by location, and then returns high heart rate values for these segments, if available: <programlisting> -$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130) +select jsonb_path_query(:'json'::jsonb, $.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)');
The opening quote is missing from the jsonpath literal.
+ jsonb_path_query +------------------ + 135 </programlisting> </para><para> You can also nest filter expressions within each other: <programlisting> -$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size() +select jsonb_path_query(:'json'::jsonb, $.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()');
Missing opening quote here as well.
+ jsonb_path_query
+------------------
+ 2
</programlisting>
This expression returns the size of the track if it contains any
segments with high heart rate values, or an empty sequence otherwise.
</para>- <para> - <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path - language has the following deviations from the SQL/JSON standard: - </para> + <sect3 id="devations-from-the-standard"> + <title>Devaiations from the SQL Standard</title>
Typo in "deviations" (section ID and title).
+ <para> + <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path + language has the following deviations from the SQL/JSON standard:
The sentence should and in a period when this para is no longer followed
by an item list.
+ </para>
- <itemizedlist> - <listitem> + <sect4 id="boolean-predicate-path-expressions"> + <title>Boolean Predicate Path Expressions</title> <para> - A path expression can be a Boolean predicate, although the SQL/JSON - standard allows predicates only in filters. This is necessary for - implementation of the <literal>@@</literal> operator. For example, - the following <type>jsonpath</type> expression is valid in - <productname>PostgreSQL</productname>: + As an extension to the SQL standard, a <productname>PostgreSQL</productname> + path expression can be a Boolean predicate, whereas the SQL standard allows + predicates only in filters. Where SQL standard path expressions return the + relevant contents of the queried JSON value, predicate path expressions + return the three-value three-valued result of the predicate:
Redundant "three-value" before "three-valued result".
+ <literal>true</literal>, <literal>false</literal>, or + <literal>unknown</literal>. Compare this filter <type>jsonpath</type> + exression: <programlisting> -$.track.segments[*].HR < 70 +select jsonb_path_query(:'json'::jsonb, '$.track.segments ?(@[*].HR > 130)'); + jsonb_path_query +--------------------------------------------------------------------------------- + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} </programlisting> - </para> - </listitem> + To a predicate expression, which returns <literal>true</literal> +<programlisting> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR > 130'); + jsonb_path_query +------------------ + true +</programlisting> + </para>- <listitem> - <para> - There are minor differences in the interpretation of regular - expression patterns used in <literal>like_regex</literal> filters, as - described in <xref linkend="jsonpath-regular-expressions"/>. - </para> - </listitem> - </itemizedlist> + <para> + Predicate-only path expressions are necessary for implementation of the + <literal>@@</literal> operator (and the + <function>jsonb_path_match</function> function), and should not be used + with the <literal>@?</literal> operator (or + <function>jsonb_path_exists</function> function). + </para> + + <para> + Conversely, non-predicate <type>jsonpath</type> expressions should not be + used with the <literal>@@</literal> operator (or the + <function>jsonb_path_match</function> function). + </para> + </sect4>
Both paras should be wrapped in a single <note> so that they stand out
from the rest of the text. Maybe even <warning>, but <note> is already
used on this page for things that I'd consider warnings.
+ <sect4 id="jsonpath-regular-expression-deviation"> + <title>Regular Expression Interpretation</title> + <para> + There are minor differences in the interpretation of regular + expression patterns used in <literal>like_regex</literal> filters, as + described in <xref linkend="jsonpath-regular-expressions"/>. + </para> + </sect4>
<sect3 id="devations-from-the-standard"> should be closed here,
otherwise the docs won't build. This can be checked with
`make -C doc/src/sgml check`.
<sect3 id="strict-and-lax-modes"> <title>Strict and Lax Modes</title> @@ -17431,18 +17488,30 @@ $.track.segments[*].HR < 70 abstract from the fact that it stores an array of segments when using the lax mode: <programlisting> -lax $.track.segments.location + select jsonb_path_query(:'json'::jsonb, 'lax $.track.segments.location'); + jsonb_path_query
`git diff --check` shows a couple of lines with trailing whitespace
(mostly psql output).
+------------------- + [47.763, 13.4034] + [47.706, 13.2635] </programlisting> </para><para> - In the strict mode, the specified path must exactly match the structure of + In strict mode, the specified path must exactly match the structure of the queried JSON document to return an SQL/JSON item, so using this - path expression will cause an error. To get the same result as in - the lax mode, you have to explicitly unwrap the + path expression will cause an error: +<programlisting> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.location'); +ERROR: jsonpath member accessor can only be applied to an object +</programlisting> + To get the same result as in the lax mode, you have to explicitly unwrap the <literal>segments</literal> array: <programlisting> -strict $.track.segments[*].location +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[*].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] </programlisting> </para>@@ -17451,7 +17520,13 @@ strict $.track.segments[*].location when using the lax mode. For instance, the following query selects every <literal>HR</literal> value twice: <programlisting> -lax $.**.HR +select jsonb_path_query(:'json'::jsonb, 'lax $.**.HR'); + jsonb_path_query +------------------ + 73 + 135 + 73 + 135 </programlisting> This happens because the <literal>.**</literal> accessor selects both the <literal>segments</literal> array and each of its elements, while @@ -17460,7 +17535,11 @@ lax $.**.HR the <literal>.**</literal> accessor only in the strict mode. The following query selects each <literal>HR</literal> value just once: <programlisting> -strict $.**.HR +select jsonb_path_query(:'json'::jsonb, 'strict $.**.HR'); + jsonb_path_query +------------------ + 73 + 135 </programlisting> </para>
--
Erik
On Oct 14, 2023, at 19:51, Erik Wienhold <ewie@ewie.name> wrote:
Thanks for putting this together. See my review at the end.
Appreciate the speedy review!
Nice. This really does help to make some sense of it. I checked all
queries and they do work out except for two queries where the path
expression string is not properly quoted (but the intended output is
still correct).
🤦🏻♂️
Follow-ups I’d like to make:
1. Expand the modes section to show how the types of results can vary
depending on the mode, thanks to the flattening. Examples:david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)');
jsonb_path_query
------------------
[1, 2, 3, 4, 5]2. Improve the descriptions and examples for @?/jsonb_path_exists()
and @@/jsonb_path_match().+1
I planned to submit these changes in a separate patch, based on Tom Lane’s suggestion[1]/messages/by-id/1229727.1680535592@sss.pgh.pa.us. Would it be preferred to add them to this patch?
Perhaps make it explicit that the reader must run this in psql in order
to use \set and :'json' in the ensuing samples? Some of the existing
examples already use psql output but they do not rely on any psql
features.
Good call, done.
This should use <screen>, <userinput>, and <computeroutput> if it shows
a psql session, e.g.:<screen>
<userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput>
<computeroutput>
jsonb_path_query
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
</computeroutput>
</screen>
I pokwds around, and it appears the computeroutput bit is used for function output. So I followed the precedent in queries.sgml[2]https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-JOIN and omitted the computeroutput tags but added prompt, e.g.,
<screen>
<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput>
jsonb_path_query
------------------
73
135
</screen>
Also the cast to jsonb is not necessary and only adds clutter IMO.
Right, removed them all in function calls.
+ <para> + Predicate-only path expressions are necessary for implementation of the + <literal>@@</literal> operator (and the + <function>jsonb_path_match</function> function), and should not be used + with the <literal>@?</literal> operator (or + <function>jsonb_path_exists</function> function). + </para> + + <para> + Conversely, non-predicate <type>jsonpath</type> expressions should not be + used with the <literal>@@</literal> operator (or the + <function>jsonb_path_match</function> function). + </para> + </sect4>Both paras should be wrapped in a single <note> so that they stand out
from the rest of the text. Maybe even <warning>, but <note> is already
used on this page for things that I'd consider warnings.
Agreed. Would be good if we could teach these functions and operators to reject path expressions they don’t support.
+ <sect4 id="jsonpath-regular-expression-deviation"> + <title>Regular Expression Interpretation</title> + <para> + There are minor differences in the interpretation of regular + expression patterns used in <literal>like_regex</literal> filters, as + described in <xref linkend="jsonpath-regular-expressions"/>. + </para> + </sect4><sect3 id="devations-from-the-standard"> should be closed here,
otherwise the docs won't build. This can be checked with
`make -C doc/src/sgml check`.
Thanks. That produces a bunch of warnings for postgres.sgml and legal.sgml (and a failure to load the docbook DTD), but func.sgml is clean now.
`git diff --check` shows a couple of lines with trailing whitespace
(mostly psql output).
I must’ve cleaned those after I sent the patch, good now. Updated patch attached, this time created by `git format-patch -v2`.
Best,
David
[1]: /messages/by-id/1229727.1680535592@sss.pgh.pa.us
[2]: https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-JOIN
Attachments:
v2-0001-Improve-boolean-predicate-JSON-Path-docs.patchapplication/octet-stream; name=v2-0001-Improve-boolean-predicate-JSON-Path-docs.patch; x-unix-mode=0644Download
From d0ededc16eee7f879eefe4f726921bee8644b51b Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Sat, 14 Oct 2023 16:42:58 -0400
Subject: [PATCH v2] Improve boolean predicate JSON Path docs
Following up from a suggestion from Tom Lane[1] to improve the
documentation of boolean predicate JSON path expressions, please find
enclosed a draft patch to do so. It does three things:
1. Converts all of the example path queries to use `jsonb_path_query()`
and show the results, to make it clearer what the behaviors are.
2. Replaces the list of deviations from the standards with a new
subsection, with each deviation in its own sub-subsection. The regex
section is unchanged, but I've greatly expanded the boolean
expression JSON path section with examples comparing standard filter
expressions and nonstandard boolean predicates. I've also added an
exhortation not use boolean expressions with @? or standard path
expressions with @@.
3. While converting the modes section to use `jsonb_path_query()` and
show the results, I also added an example of strict mode returning an
error.
---
doc/src/sgml/func.sgml | 224 ++++++++++++++++++++++++++++-------------
1 file changed, 154 insertions(+), 70 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index affd1254bb..a2bfc12312 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17203,9 +17203,12 @@ array w/o UK? | t
<para>
For example, suppose you have some JSON data from a GPS tracker that you
- would like to parse, such as:
+ would like to parse, such as this JSON, set up as a
+ <link linkend="app-psql-meta-command-set"><application>psql</application>
+ <command>\set</command> variable</link> for use as <literal>:'json'</literal>
+ in the examples below:
<programlisting>
-{
+ \set json '{
"track": {
"segments": [
{
@@ -17220,7 +17223,7 @@ array w/o UK? | t
}
]
}
-}
+}'
</programlisting>
</para>
@@ -17228,9 +17231,13 @@ array w/o UK? | t
To retrieve the available track segments, you need to use the
<literal>.<replaceable>key</replaceable></literal> accessor
operator to descend through surrounding JSON objects:
-<programlisting>
-$.track.segments
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput>
+select jsonb_path_query(:'json', '$.track.segments');
+ jsonb_path_query
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
+</screen>
</para>
<para>
@@ -17238,18 +17245,25 @@ $.track.segments
<literal>[*]</literal> operator. For example,
the following path will return the location coordinates for all
the available track segments:
-<programlisting>
-$.track.segments[*].location
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+</screen>
</para>
<para>
To return the coordinates of the first segment only, you can
specify the corresponding subscript in the <literal>[]</literal>
accessor operator. Recall that JSON array indexes are 0-relative:
-<programlisting>
-$.track.segments[0].location
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[0].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+</screen>
</para>
<para>
@@ -17258,9 +17272,12 @@ $.track.segments[0].location
listed in <xref linkend="functions-sqljson-path-operators"/>.
Each method name must be preceded by a dot. For example,
you can get the size of an array:
-<programlisting>
-$.track.segments.size()
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments.size()');</userinput>
+ jsonb_path_query
+------------------
+ 2
+</screen>
More examples of using <type>jsonpath</type> operators
and methods within path expressions appear below in
<xref linkend="functions-sqljson-path-operators"/>.
@@ -17301,9 +17318,12 @@ $.track.segments.size()
<para>
For example, suppose you would like to retrieve all heart rate values higher
than 130. You can achieve this using the following expression:
-<programlisting>
-$.track.segments[*].HR ? (@ > 130)
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 130)');</userinput>
+ jsonb_path_query
+------------------
+ 135
+</screen>
</para>
<para>
@@ -17311,65 +17331,108 @@ $.track.segments[*].HR ? (@ > 130)
filter out irrelevant segments before returning the start times, so the
filter expression is applied to the previous step, and the path used
in the condition is different:
-<programlisting>
-$.track.segments[*] ? (@.HR > 130)."start time"
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.HR > 130)."start time"');</userinput>
+ jsonb_path_query
+-----------------------
+ "2018-10-14 10:39:21"
+</screen>
</para>
<para>
You can use several filter expressions in sequence, if required. For
example, the following expression selects start times of all segments that
contain locations with relevant coordinates and high heart rate values:
-<programlisting>
-$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"');</userinput>
+ jsonb_path_query
+-----------------------
+ "2018-10-14 10:39:21"
+</screen>
</para>
<para>
Using filter expressions at different nesting levels is also allowed.
The following example first filters all segments by location, and then
returns high heart rate values for these segments, if available:
-<programlisting>
-$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)');</userinput>
+ jsonb_path_query
+------------------
+ 135
+</screen>
</para>
<para>
You can also nest filter expressions within each other:
-<programlisting>
-$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()');</userinput>
+ jsonb_path_query
+------------------
+ 2
+</screen>
This expression returns the size of the track if it contains any
segments with high heart rate values, or an empty sequence otherwise.
</para>
- <para>
- <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
- language has the following deviations from the SQL/JSON standard:
- </para>
+ <sect3 id="deviations-from-the-standard">
+ <title>Deviations from the SQL Standard</title>
+ <para>
+ <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
+ language has the following deviations from the SQL/JSON standard.
+ </para>
- <itemizedlist>
- <listitem>
+ <sect4 id="boolean-predicate-path-expressions">
+ <title>Boolean Predicate Path Expressions</title>
<para>
- A path expression can be a Boolean predicate, although the SQL/JSON
- standard allows predicates only in filters. This is necessary for
- implementation of the <literal>@@</literal> operator. For example,
- the following <type>jsonpath</type> expression is valid in
- <productname>PostgreSQL</productname>:
-<programlisting>
-$.track.segments[*].HR < 70
-</programlisting>
- </para>
- </listitem>
+ As an extension to the SQL standard, a <productname>PostgreSQL</productname>
+ path expression can be a Boolean predicate, whereas the SQL standard allows
+ predicates only in filters. Where SQL standard path expressions return the
+ relevant contents of the queried JSON value, predicate path expressions
+ return the three-valued result of the predicate: <literal>true</literal>,
+ <literal>false</literal>, or <literal>unknown</literal>. Compare this
+ filter <type>jsonpath</type> expression:
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR > 130)');</userinput>
+ jsonb_path_query
+---------------------------------------------------------------------------------
+ {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}
+</screen>
+ To a predicate expression, which returns <literal>true</literal>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR > 130');</userinput>
+ jsonb_path_query
+------------------
+ true
+</screen>
+ </para>
- <listitem>
- <para>
- There are minor differences in the interpretation of regular
- expression patterns used in <literal>like_regex</literal> filters, as
- described in <xref linkend="jsonpath-regular-expressions"/>.
- </para>
- </listitem>
- </itemizedlist>
+ <note>
+ <para>
+ Predicate-only path expressions are necessary for implementation of the
+ <literal>@@</literal> operator (and the
+ <function>jsonb_path_match</function> function), and should not be used
+ with the <literal>@?</literal> operator (or
+ <function>jsonb_path_exists</function> function).
+ </para>
+
+ <para>
+ Conversely, non-predicate <type>jsonpath</type> expressions should not be
+ used with the <literal>@@</literal> operator (or the
+ <function>jsonb_path_match</function> function).
+ </para>
+ </note>
+ </sect4>
+
+ <sect4 id="jsonpath-regular-expression-deviation">
+ <title>Regular Expression Interpretation</title>
+ <para>
+ There are minor differences in the interpretation of regular
+ expression patterns used in <literal>like_regex</literal> filters, as
+ described in <xref linkend="jsonpath-regular-expressions"/>.
+ </para>
+ </sect4>
+ </sect3>
<sect3 id="strict-and-lax-modes">
<title>Strict and Lax Modes</title>
@@ -17430,40 +17493,61 @@ $.track.segments[*].HR < 70
For example, when querying the GPS data listed above, you can
abstract from the fact that it stores an array of segments
when using the lax mode:
-<programlisting>
-lax $.track.segments.location
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments.location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+</screen>
</para>
<para>
- In the strict mode, the specified path must exactly match the structure of
+ In strict mode, the specified path must exactly match the structure of
the queried JSON document to return an SQL/JSON item, so using this
- path expression will cause an error. To get the same result as in
- the lax mode, you have to explicitly unwrap the
+ path expression will cause an error:
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments.location');</userinput>
+ERROR: jsonpath member accessor can only be applied to an object
+</screen>
+ To get the same result as in the lax mode, you have to explicitly unwrap the
<literal>segments</literal> array:
-<programlisting>
-strict $.track.segments[*].location
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[*].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+</screen>
</para>
<para>
The <literal>.**</literal> accessor can lead to surprising results
when using the lax mode. For instance, the following query selects every
<literal>HR</literal> value twice:
-<programlisting>
-lax $.**.HR
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'lax $.**.HR');</userinput>
+ jsonb_path_query
+------------------
+ 73
+ 135
+ 73
+ 135
+</screen>
This happens because the <literal>.**</literal> accessor selects both
the <literal>segments</literal> array and each of its elements, while
the <literal>.HR</literal> accessor automatically unwraps arrays when
using the lax mode. To avoid surprising results, we recommend using
the <literal>.**</literal> accessor only in the strict mode. The
following query selects each <literal>HR</literal> value just once:
-<programlisting>
-strict $.**.HR
-</programlisting>
+<screen>
+<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput>
+ jsonb_path_query
+------------------
+ 73
+ 135
+</screen>
</para>
-
</sect3>
<sect3 id="functions-sqljson-path-operators">
--
2.42.0
On 2023-10-16 01:04 +0200, David E. Wheeler write:
On Oct 14, 2023, at 19:51, Erik Wienhold <ewie@ewie.name> wrote:
Thanks for putting this together. See my review at the end.
Appreciate the speedy review!
You're welcome.
Follow-ups I’d like to make:
1. Expand the modes section to show how the types of results can vary
depending on the mode, thanks to the flattening. Examples:david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
jsonb_path_query
------------------
3
4
5
(3 rows)david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)');
jsonb_path_query
------------------
[1, 2, 3, 4, 5]2. Improve the descriptions and examples for @?/jsonb_path_exists()
and @@/jsonb_path_match().+1
I planned to submit these changes in a separate patch, based on Tom
Lane’s suggestion[1]. Would it be preferred to add them to this patch?
Your call but I'm not against including it in this patch because it
already touches the modes section.
I pokwds around, and it appears the computeroutput bit is used for
function output. So I followed the precedent in queries.sgml[2] and
omitted the computeroutput tags but added prompt, e.g.,
<screen>
<prompt>=></prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput>
jsonb_path_query
------------------
73
135
</screen>
Okay, Not sure what the preferred style is but I saw <userinput> and
<computeroutput> used together in doc/src/sgml/ref/createuser.sgml.
But it's not applied consistently in the rest of the docs.
+ <para> + Predicate-only path expressions are necessary for implementation of the + <literal>@@</literal> operator (and the + <function>jsonb_path_match</function> function), and should not be used + with the <literal>@?</literal> operator (or + <function>jsonb_path_exists</function> function). + </para> + + <para> + Conversely, non-predicate <type>jsonpath</type> expressions should not be + used with the <literal>@@</literal> operator (or the + <function>jsonb_path_match</function> function). + </para> + </sect4>Both paras should be wrapped in a single <note> so that they stand out
from the rest of the text. Maybe even <warning>, but <note> is already
used on this page for things that I'd consider warnings.Agreed. Would be good if we could teach these functions and operators
to reject path expressions they don’t support.
Right, you mentioned that idea in [1]/messages/by-id/BAF11F2D-5EDD-4DBB-87FA-4F35845029AE@justatheory.com (separate types). Not sure what
the best strategy here is but it's likely to break existing queries.
Maybe deprecating unsupported path expressions in the next major release
and changing that to an error in the major release after that.
This can be checked with `make -C doc/src/sgml check`.
Thanks. That produces a bunch of warnings for postgres.sgml and
legal.sgml (and a failure to load the docbook DTD), but func.sgml is
clean now.
Hmm... I get no warnings on 1f89b73c4e. Did you install all tools as
described in [2]https://www.postgresql.org/docs/current/docguide-toolsets.html? The DTD needs to be installed as well.
[1]: /messages/by-id/BAF11F2D-5EDD-4DBB-87FA-4F35845029AE@justatheory.com
[2]: https://www.postgresql.org/docs/current/docguide-toolsets.html
--
Erik
On Oct 15, 2023, at 23:03, Erik Wienhold <ewie@ewie.name> wrote:
Your call but I'm not against including it in this patch because it
already touches the modes section.
Okay, added, let’s just put all our cards on the table. :-)
Agreed. Would be good if we could teach these functions and operators
to reject path expressions they don’t support.Right, you mentioned that idea in [1] (separate types). Not sure what
the best strategy here is but it's likely to break existing queries.
Maybe deprecating unsupported path expressions in the next major release
and changing that to an error in the major release after that.
Well if the functions have a JsonPathItem struct, they can check its type attribute and reject those with a root type that’s a predicate in @? and reject it if it’s not a predicate in @@. Example of checking type here:
This can be checked with `make -C doc/src/sgml check`.
Thanks. That produces a bunch of warnings for postgres.sgml and
legal.sgml (and a failure to load the docbook DTD), but func.sgml is
clean now.Hmm... I get no warnings on 1f89b73c4e. Did you install all tools as
described in [2]? The DTD needs to be installed as well.
Thanks, got it down to one:
postgres.sgml:112: element sect4: validity error : Element sect4 content does not follow the DTD, expecting (sect4info? , (title , subtitle? , titleabbrev?) , (toc | lot | index | glossary | bibliography)* , (((calloutlist | glosslist | bibliolist | itemizedlist | orderedlist | segmentedlist | simplelist | variablelist | caution | important | note | tip | warning | literallayout | programlisting | programlistingco | screen | screenco | screenshot | synopsis | cmdsynopsis | funcsynopsis | classsynopsis | fieldsynopsis | constructorsynopsis | destructorsynopsis | methodsynopsis | formalpara | para | simpara | address | blockquote | graphic | graphicco | mediaobject | mediaobjectco | informalequation | informalexample | informalfigure | informaltable | equation | example | figure | table | msgset | procedure | sidebar | qandaset | task | anchor | bridgehead | remark | highlights | abstract | authorblurb | epigraph | indexterm | beginpage)+ , (refentry* | sect5* | simplesect*)) | refentry+ | sect5+ | simplesect+) , (toc | lot | index | glossary | bibliography)*), got (para para )
&func;
David
Attachments:
v3-0001-Improve-boolean-predicate-JSON-Path-docs.patchapplication/applefile; name=v3-0001-Improve-boolean-predicate-JSON-Path-docs.patchDownload