From 039fd010d631660b1933bcc049ed9d757d297589 Mon Sep 17 00:00:00 2001 From: "David E. Wheeler" Date: Sat, 14 Oct 2023 16:42:58 -0400 Subject: [PATCH v7] Improve boolean predicate JSON Path docs Following up from a suggestion from Tom Lane[1] to improve the documentation of boolean predicate JSON path expressions, please find enclosed a draft patch to do so. It does three things: 1. Converts all of the example path queries to use `jsonb_path_query()` and show the results, to make it clearer what the behaviors are. 2. Replaces the list of deviations from the standards with a new subsection, with each deviation in its own sub-subsection. The regex section is unchanged, but I've greatly expanded the boolean expression JSON path section with examples comparing standard filter expressions and nonstandard boolean predicates. I've also added an exhortation not use boolean expressions with @? or standard path expressions with @@. 3. While converting the modes section to use `jsonb_path_query()` and show the results, I also added an example of strict mode returning an error and a section demonstrating the difference in query results when using lax vs. strict mode. 4. Removes the notes about "only the first value" in the `@@` and `jsonb_path_match()` docs, and noted that they support only predicate JSON path expressions. Also noted that `json_path_exists()` supports only SQL standard path expressions --- in other words, *not* predicate path queries. --- doc/src/sgml/func.sgml | 322 +++++++++++++++++++++++++++++------------ 1 file changed, 227 insertions(+), 95 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7c3e940afe..78d4e89754 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15838,7 +15838,10 @@ table2-mapping boolean - Does JSON path return any item for the specified JSON value? + Does JSON path return any item for the specified JSON value? Use only + SQL-standard JSON path expressions, not + predicate check + expressions. '{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)' @@ -15852,10 +15855,11 @@ table2-mapping boolean - Returns the result of a JSON path predicate check for the - specified JSON value. Only the first item of the result is taken into - account. If the result is not Boolean, then NULL - is returned. + Returns the result of a JSON path predicate check for the specified JSON + value. If the result is not Boolean, then NULL is + returned. Use only with + predicate check + expressions. '{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2' @@ -16926,7 +16930,9 @@ array w/o UK? | t Checks whether the JSON path returns any item for the specified JSON - value. + value. Use only SQL-standard JSON path expressions, not + predicate check + expressions. If the vars argument is specified, it must be a JSON object, and its fields provide named values to be substituted into the jsonpath expression. @@ -16949,12 +16955,13 @@ array w/o UK? | t boolean - Returns the result of a JSON path predicate check for the specified - JSON value. Only the first item of the result is taken into account. - If the result is not Boolean, then NULL is returned. - The optional vars - and silent arguments act the same as - for jsonb_path_exists. + Returns the result of a JSON path predicate check for the specified JSON + value. If the result is not Boolean, then NULL is + returned. Use only with + predicate check + expressions. The optional vars and + silent arguments act the same as for + jsonb_path_exists. jsonb_path_match('{"a":[1,2,3,4,5]}', 'exists($.a[*] ? (@ >= $min && @ <= $max))', '{"min":2, "max":4}') @@ -16972,8 +16979,12 @@ array w/o UK? | t Returns all JSON items returned by the JSON path for the specified - JSON value. - The optional vars + JSON value. For SQL-standard JSON path expressions it returns the JSON + values selected from target. For + predicate check + Path expressions it returns the result of the predicate check: + true, false, or + null. The optional vars and silent arguments act the same as for jsonb_path_exists. @@ -17203,9 +17214,12 @@ array w/o UK? | t For example, suppose you have some JSON data from a GPS tracker that you - would like to parse, such as: + would like to parse, such as this JSON, set up as a + psql + \set variable for use as :'json' + in the examples below: -{ + \set json '{ "track": { "segments": [ { @@ -17220,7 +17234,7 @@ array w/o UK? | t } ] } -} +}' @@ -17228,9 +17242,12 @@ array w/o UK? | t To retrieve the available track segments, you need to use the .key accessor operator to descend through surrounding JSON objects: - -$.track.segments - + +=> select jsonb_path_query(:'json', '$.track.segments'); + jsonb_path_query +------------------------------------------------------------------------------------------------------------------------------------------------------------------- + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}] + @@ -17238,18 +17255,25 @@ $.track.segments [*] operator. For example, the following path will return the location coordinates for all the available track segments: - -$.track.segments[*].location - + +=> select jsonb_path_query(:'json', '$.track.segments[*].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] + To return the coordinates of the first segment only, you can specify the corresponding subscript in the [] accessor operator. Recall that JSON array indexes are 0-relative: - -$.track.segments[0].location - + +=> select jsonb_path_query(:'json', '$.track.segments[0].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + @@ -17258,9 +17282,12 @@ $.track.segments[0].location listed in . Each method name must be preceded by a dot. For example, you can get the size of an array: - -$.track.segments.size() - + +=> select jsonb_path_query(:'json', '$.track.segments.size()'); + jsonb_path_query +------------------ + 2 + More examples of using jsonpath operators and methods within path expressions appear below in . @@ -17301,9 +17328,12 @@ $.track.segments.size() For example, suppose you would like to retrieve all heart rate values higher than 130. You can achieve this using the following expression: - -$.track.segments[*].HR ? (@ > 130) - + +=> select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ > 130)'); + jsonb_path_query +------------------ + 135 + @@ -17311,65 +17341,108 @@ $.track.segments[*].HR ? (@ > 130) filter out irrelevant segments before returning the start times, so the filter expression is applied to the previous step, and the path used in the condition is different: - -$.track.segments[*] ? (@.HR > 130)."start time" - + +=> select jsonb_path_query(:'json', '$.track.segments[*] ? (@.HR > 130)."start time"'); + jsonb_path_query +----------------------- + "2018-10-14 10:39:21" + You can use several filter expressions in sequence, if required. For example, the following expression selects start times of all segments that contain locations with relevant coordinates and high heart rate values: - -$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time" - + +=> select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] < 13.4) ? (@.HR > 130)."start time"'); + jsonb_path_query +----------------------- + "2018-10-14 10:39:21" + Using filter expressions at different nesting levels is also allowed. The following example first filters all segments by location, and then returns high heart rate values for these segments, if available: - -$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130) - + +=> select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] < 13.4).HR ? (@ > 130)'); + jsonb_path_query +------------------ + 135 + You can also nest filter expressions within each other: - -$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size() - + +=> select jsonb_path_query(:'json', '$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'); + jsonb_path_query +------------------ + 2 + This expression returns the size of the track if it contains any segments with high heart rate values, or an empty sequence otherwise. - - PostgreSQL's implementation of the SQL/JSON path - language has the following deviations from the SQL/JSON standard: - + + Deviations from the SQL Standard + + PostgreSQL's implementation of the SQL/JSON path + language has the following deviations from the SQL/JSON standard. + - - + + Boolean Predicate Check Expressions - A path expression can be a Boolean predicate, although the SQL/JSON - standard allows predicates only in filters. This is necessary for - implementation of the @@ operator. For example, - the following jsonpath expression is valid in - PostgreSQL: - -$.track.segments[*].HR < 70 - - - + As an extension to the SQL standard, a PostgreSQL + path expression can be a Boolean predicate, whereas the SQL standard allows + predicates only in filters. Where SQL standard path expressions return the + relevant contents of the queried JSON value, predicate check expressions + return the three-valued result of the predicate: true, + false, or unknown. Compare this + filter jsonpath expression: + +=> select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR > 130)'); + jsonb_path_query +--------------------------------------------------------------------------------- + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"} + + To a predicate expression, which returns true + +=> select jsonb_path_query(:'json', '$.track.segments[*].HR > 130'); + jsonb_path_query +------------------ + true + + - - - There are minor differences in the interpretation of regular - expression patterns used in like_regex filters, as - described in . - - - + + + Predicate-only path expressions are necessary for implementation of the + @@ operator (and the + jsonb_path_match function), and should not be used + with the @? operator (or + jsonb_path_exists function). + + + + Conversely, non-predicate jsonpath expressions should not be + used with the @@ operator (or the + jsonb_path_match function). + + + + + + Regular Expression Interpretation + + There are minor differences in the interpretation of regular + expression patterns used in like_regex filters, as + described in . + + + Strict and Lax Modes @@ -17430,40 +17503,99 @@ $.track.segments[*].HR < 70 For example, when querying the GPS data listed above, you can abstract from the fact that it stores an array of segments when using the lax mode: - -lax $.track.segments.location - + +=> select jsonb_path_query(:'json', 'lax $.track.segments.location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] + In the strict mode, the specified path must exactly match the structure of - the queried JSON document to return an SQL/JSON item, so using this - path expression will cause an error. To get the same result as in - the lax mode, you have to explicitly unwrap the + the queried JSON document to return an SQL/JSON item, so using this path + expression will cause an error: + +=> select jsonb_path_query(:'json', 'strict $.track.segments.location'); +ERROR: jsonpath member accessor can only be applied to an object + + To get the same result as in the lax mode, you have to explicitly unwrap the segments array: - -strict $.track.segments[*].location - + +=> select jsonb_path_query(:'json', 'strict $.track.segments[*].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] + - - The .** accessor can lead to surprising results - when using the lax mode. For instance, the following query selects every - HR value twice: - -lax $.**.HR - - This happens because the .** accessor selects both - the segments array and each of its elements, while - the .HR accessor automatically unwraps arrays when - using the lax mode. To avoid surprising results, we recommend using - the .** accessor only in the strict mode. The - following query selects each HR value just once: - -strict $.**.HR - - + + Suprising Results from Unwrapping + + The unwrapping behavior of lax mode can lead to surprising results. For + instance, the following query using the .** accessor + selects every HR value twice: + + => select jsonb_path_query(:'json', 'lax $.**.HR'); + jsonb_path_query + ------------------ + 73 + 135 + 73 + 135 + + This happens because the .** accessor selects both + the segments array and each of its elements, while + the .HR accessor automatically unwraps arrays when + using the lax mode. To avoid surprising results, we recommend using + the .** accessor only in the strict mode. The + following query selects each HR value just once: + + => select jsonb_path_query(:'json', 'strict $.**.HR'); + jsonb_path_query + ------------------ + 73 + 135 + + + + + The unwrapping of arrays can also lead to unexpected results. Consider this + example, which selects all the location arrays: + +=> select jsonb_path_query(:'json', 'lax $.track.segments[*].location'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +(2 rows) + + As expected it returns the full arrays. But applying a filter expression + causes the arrays to be unwrapped to evaluate each item, returning only the + items that match the expression: + +=> select jsonb_path_query(:'json', 'lax $.track.segments[*].location ?(@[*] > 15)'); + jsonb_path_query +------------------ + 47.763 + 47.706 +(2 rows) + + This despite the fact that the full arrays are selected by the path + expression. Use strict mode to restore selecting the arrays: + +=> select jsonb_path_query(:'json', 'strict $.track.segments[*].location ?(@[*] > 15)'); + jsonb_path_query +------------------- + [47.763, 13.4034] + [47.706, 13.2635] +(2 rows) + + + @@ -18039,7 +18171,7 @@ strict $.**.HR Tests whether a path expression matches at least one SQL/JSON item. Returns unknown if the path expression would result in an error; the second example uses this to avoid a no-such-key error - in strict mode. + in the strict mode. jsonb_path_query('{"x": [1, 2], "y": [2, 4]}', 'strict $.* ? (exists (@ ? (@[*] > 2)))') -- 2.42.0