Q: Escapes in jsonpath Idents
Hackers,
The jsonpath doc[1]https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH has an excellent description of the format of strings, but for unquoted path keys, it simply says:
Member accessor that returns an object member with the specified key. If the key name matches some named variable starting with $ or does not meet the JavaScript rules for an identifier, it must be enclosed in double quotes to make it a string literal.
I went looking for the JavaScript rules for an identifier and found this in the MDN docs[2]https://developer.mozilla.org/en-US/docs/Glossary/Identifier:
In JavaScript, identifiers can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit. An identifier differs from a string in that a string is data, while an identifier is part of the code. In JavaScript, there is no way to convert identifiers to strings, but sometimes it is possible to parse strings into identifiers.
However, the Postgres parsing of jsonpath keys appears to follow the same rules as strings, allowing backslash escapes:
david=# select '$.fo\u00f8 == $x'::jsonpath;
jsonpath -------------------
($."foø" == $"x")
This would seem to contradict the documentation. Is this behavior required by the SQL standard? Do the docs need updating? Or should the code actually follow the JSON identifier behavior?
Thanks,
David
PS: Those excellent docs on strings mentions support for \v, but the grammar in the right nav of https://www.json.org/json-en.html does not. Another bonus feature?
[1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
[2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier
On Mar 16, 2024, at 14:39, David E. Wheeler <david@justatheory.com> wrote:
I went looking for the JavaScript rules for an identifier and found this in the MDN docs[2]:
In JavaScript, identifiers can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit. An identifier differs from a string in that a string is data, while an identifier is part of the code. In JavaScript, there is no way to convert identifiers to strings, but sometimes it is possible to parse strings into identifiers.
Coda: Dollar signs don’t work at all outside double-quoted string identifiers:
david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
^
david=# select '$.f$oo'::jsonpath;
ERROR: syntax error at or near "$oo" of jsonpath input
LINE 1: select '$.f$oo'::jsonpath;
^
david=# select '$."$foo"'::jsonpath;
jsonpath
----------
$."$foo"
This, too, contradicts the MDM definition an identifier (and some quick browser tests).
Best,
David
Hi David,
On 2024-03-16 19:39 +0100, David E. Wheeler wrote:
The jsonpath doc[1] has an excellent description of the format of
strings, but for unquoted path keys, it simply says:Member accessor that returns an object member with the specified
key. If the key name matches some named variable starting with $ or
does not meet the JavaScript rules for an identifier, it must be
enclosed in double quotes to make it a string literal.I went looking for the JavaScript rules for an identifier and found
this in the MDN docs[2]:In JavaScript, identifiers can contain Unicode letters, $, _, and
digits (0-9), but may not start with a digit. An identifier differs
from a string in that a string is data, while an identifier is part
of the code. In JavaScript, there is no way to convert identifiers
to strings, but sometimes it is possible to parse strings into
identifiers.However, the Postgres parsing of jsonpath keys appears to follow the
same rules as strings, allowing backslash escapes:david=# select '$.fo\u00f8 == $x'::jsonpath;
jsonpath -------------------
($."foø" == $"x")This would seem to contradict the documentation. Is this behavior
required by the SQL standard? Do the docs need updating? Or should the
code actually follow the JSON identifier behavior?
That quoted MDN page does not give the whole picture. ECMAScript and JS
do allow Unicode escape sequences in identifier names:
https://262.ecma-international.org/#sec-identifier-names
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
PS: Those excellent docs on strings mentions support for \v, but the
grammar in the right nav of https://www.json.org/json-en.html does
not. Another bonus feature?
You refer to that sentence: "Other special backslash sequences include
those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various
ASCII control characters, and \uNNNN for a Unicode character identified
by its 4-hex-digit code point."
Mentioning JSON and \v in the same sentence is wrong: JavaScript allows
that escape in strings but JSON doesn't. I think the easiest is to just
replace "JSON" with "JavaScript" in that sentence to make it right. The
paragraph also already says "embedded string literals follow JavaScript/
ECMAScript conventions", so mentioning JSON seems unnecessary to me.
The last sentence also mentions backslash escapes \xNN and \u{N...} as
deviations from JSON when in fact those are valid escape sequences from
ECMA-262: https://262.ecma-international.org/#prod-HexEscapeSequence
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely. The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.
[1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
[2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier
On 2024-03-16 21:33 +0100, David E. Wheeler wrote:
On Mar 16, 2024, at 14:39, David E. Wheeler <david@justatheory.com>
wrote:I went looking for the JavaScript rules for an identifier and found
this in the MDN docs[2]:In JavaScript, identifiers can contain Unicode letters, $, _, and
digits (0-9), but may not start with a digit. An identifier differs
from a string in that a string is data, while an identifier is part
of the code. In JavaScript, there is no way to convert identifiers
to strings, but sometimes it is possible to parse strings into
identifiers.Coda: Dollar signs don’t work at all outside double-quoted string
identifiers:david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
^david=# select '$.f$oo'::jsonpath;
ERROR: syntax error at or near "$oo" of jsonpath input
LINE 1: select '$.f$oo'::jsonpath;
^david=# select '$."$foo"'::jsonpath;
jsonpath
----------
$."$foo"This, too, contradicts the MDM definition an identifier (and some
quick browser tests).
The first case ($.$foo) is in line with the restriction on member
accessors that you quoted first.
The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor. Looks like a bug to me because a variable
doesn't even make sense in that place.
What works though, besides double quoting, is escaping the dollar sign:
regress=# select '$.\u0024foo'::jsonpath;
jsonpath
----------
$."$foo"
(1 row)
And we've come full circle :)
--
Erik
Attachments:
v1-0001-Simplify-docs-on-backslash-escapes-in-jsonpath.patchtext/plain; charset=us-asciiDownload
From a2bade71867aecbea90c7c03f0295cecca0c215d Mon Sep 17 00:00:00 2001
From: Erik Wienhold <ewie@ewie.name>
Date: Sun, 17 Mar 2024 19:28:07 +0100
Subject: [PATCH v1] Simplify docs on backslash escapes in jsonpath
The paragraph describing the JavaScript string literals allowed in
jsonpath expressions is not ideal: it unnecessarily mentions JSON by
erroneously listing \v as allowed by JSON and mentioning the \xNN and
\u{N...} backslash escapes as deviations from JSON when in fact both are
accepted by ECMAScript/JavaScript. Fix this by only referring to
JavaScript and with a bulleted list of backslash escapes to make it more
readable.
---
doc/src/sgml/json.sgml | 47 ++++++++++++++++++++++++++++--------------
1 file changed, 32 insertions(+), 15 deletions(-)
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 1dbb9606e9..1ec4b90abb 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -803,21 +803,38 @@ UPDATE table_name SET jsonb_field[1]['a'] = '1';
In particular, the way to write a double quote within an embedded string
literal is <literal>\"</literal>, and to write a backslash itself, you
must write <literal>\\</literal>. Other special backslash sequences
- include those recognized in JSON strings:
- <literal>\b</literal>,
- <literal>\f</literal>,
- <literal>\n</literal>,
- <literal>\r</literal>,
- <literal>\t</literal>,
- <literal>\v</literal>
- for various ASCII control characters, and
- <literal>\u<replaceable>NNNN</replaceable></literal> for a Unicode
- character identified by its 4-hex-digit code point. The backslash
- syntax also includes two cases not allowed by JSON:
- <literal>\x<replaceable>NN</replaceable></literal> for a character code
- written with only two hex digits, and
- <literal>\u{<replaceable>N...</replaceable>}</literal> for a character
- code written with 1 to 6 hex digits.
+ include those recognized in JavaScript strings:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>\b</literal>,
+ <literal>\f</literal>,
+ <literal>\n</literal>,
+ <literal>\r</literal>,
+ <literal>\t</literal>,
+ <literal>\v</literal>
+ for various ASCII control characters
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>\x<replaceable>NN</replaceable></literal>
+ for a character code written with 2 hex digits
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>\u<replaceable>NNNN</replaceable></literal>
+ for a Unicode code point written with 4 hex digits
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>\u{<replaceable>N...</replaceable>}</literal>
+ for a Unicode code point written with 1 to 6 hex digits
+ </para>
+ </listitem>
+ </itemizedlist>
</para>
<para>
--
2.44.0
On Mar 17, 2024, at 15:12, Erik Wienhold <ewie@ewie.name> wrote:
Hi David,
Hey Erik. Thanks for the detailed reply and patch!
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely. The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.
Ah, it’s JavaScript format, not JSON! This does clarify things quite nicely, thank you. Happy to add my review once it’s in a commit fest.
The first case ($.$foo) is in line with the restriction on member
accessors that you quoted first.
Huh, that’s now how I read it. Here it is again:
Member accessor that returns an object member with the specified
key. If the key name matches some named variable starting with $ or
does not meet the JavaScript rules for an identifier, it must be
enclosed in double quotes to make it a string literal.
Note that in my example `$foo` does not match a variable. I mean it looks like a variable, but none is used here. I guess it’s being conservative because it might be used in one of the functions, like jsonb_path_exists(), to which variables might be passed.
The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor. Looks like a bug to me because a variable
doesn't even make sense in that place.
Right. Maybe the docs should be updated to say that a literal dollar sign isn’t supported in identifiers, unlike in JavaScript, except through escapes like this:
What works though, besides double quoting, is escaping the dollar sign:
regress=# select '$.\u0024foo'::jsonpath;
jsonpath
----------
$."$foo"
(1 row)And we've come full circle :)
🎉
Best,
David
On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
On Mar 17, 2024, at 15:12, Erik Wienhold <ewie@ewie.name> wrote:
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely. The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.Ah, it’s JavaScript format, not JSON! This does clarify things quite
nicely, thank you. Happy to add my review once it’s in a commit fest.
Thanks. https://commitfest.postgresql.org/48/4899/
The first case ($.$foo) is in line with the restriction on member
accessors that you quoted first.Huh, that’s now how I read it. Here it is again:
Member accessor that returns an object member with the specified
key. If the key name matches some named variable starting with $ or
does not meet the JavaScript rules for an identifier, it must be
enclosed in double quotes to make it a string literal.Note that in my example `$foo` does not match a variable. I mean it
looks like a variable, but none is used here. I guess it’s being
conservative because it might be used in one of the functions, like
jsonb_path_exists(), to which variables might be passed.
I had the same reasoning while writing my first reply but scrapped that
part because I found it obvious: That jsonpath is parsed before calling
jsonb_path_exists() and therefore the parser has no context about any
variables, which might not even be hardcoded but may result from a
query.
The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor. Looks like a bug to me because a variable
doesn't even make sense in that place.Right. Maybe the docs should be updated to say that a literal dollar
sign isn’t supported in identifiers, unlike in JavaScript, except
through escapes like this:
Unfortunately, I don't have access to that part of the SQL spec. So I
don't know how the jsonpath grammar is specified.
I had a look into Oracle, MySQL, and SQLite docs to see what they
implement:
* Oracle requires the unquoted field names to match [A-Za-z][A-Za-z0-9]*
(see "object steps"). It also supports variables.
https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-path-expressions.html
* MySQL refers to ECMAScript identifiers but does not say anything about
variables: https://dev.mysql.com/doc/refman/8.3/en/json.html#json-path-syntax
* SQLite skimps on details and does not document a grammar:
https://sqlite.org/json1.html#path_arguments
But it looks as if it strives for compatibility with MySQL and our dear
Postgres: https://sqlite.org/src/doc/json-in-core/doc/json-enhancements.md
Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
insights but haven't found any yet.
--
Erik
On Mar 17, 2024, at 20:09, Erik Wienhold <ewie@ewie.name> wrote:
On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
On Mar 17, 2024, at 15:12, Erik Wienhold <ewie@ewie.name> wrote:
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely. The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.Ah, it’s JavaScript format, not JSON! This does clarify things quite
nicely, thank you. Happy to add my review once it’s in a commit fest.
Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc improvement welcome and much clearer than before.
I had the same reasoning while writing my first reply but scrapped that
part because I found it obvious: That jsonpath is parsed before calling
jsonb_path_exists() and therefore the parser has no context about any
variables, which might not even be hardcoded but may result from a
query.
Right, there’s a chicken/egg problem.
Unfortunately, I don't have access to that part of the SQL spec. So I
don't know how the jsonpath grammar is specified.
Seems quite logical; I think it should be documented, but I’d also be interested to know what the 2016 and 2023 standards say, exactly.
Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
insights but haven't found any yet.
Everybody’s taking shortcuts relative to the standard, AFAICT. For example, jsonpath_scan.l matches unqouted identifiers with these two regular expressions:
<xnq>{other}+
<xnq>\/\*
<xnq,xq,xvq>\\.
Plus the backslash escapes. {other} is defined as:
/* "other" means anything that's not special, blank, or '\' or '"' */
other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/\\\" \t\n\r\f]
Which is waaaay more liberal than the ECMA standard[1]https://262.ecma-international.org/#sec-identifier-names, by my reading, but the MSDN[2]https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers description is quite succinct (thanks for the links!):
In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (_), and dollar signs ($). Identifiers are not allowed to start with numbers. However, JavaScript identifiers are not only limited to ASCII — many Unicode code points are allowed as well. Namely, any character in the ID_Start category can start an identifier, while any character in the ID_Continue category can appear after the first character.
ID_Start[3]https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D and ID_Continue[4]https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D point to the unicode standard codes lister, nether of which reference Emoji. Sure enough, in Safari:
x = {"🎉": true}
< {🎉: true}
x.🎉
< SyntaxError: Invalid character '\ud83c’
But in Postgres jsonpath:
david=# select '$.🎉'::jsonpath;
jsonpath
----------
$."🎉"
If the MSDN references to ID_Start and ID_Continue are correct, then the Postgres path parser is being overly-liberal. Maybe that’s totally fine? Not sure what should be documented and what’s not worth it.
Aside: I’m only digging into these details because I’m busy porting the path parser, so trying to figure out where to be compatible and where not to. So far I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the unicode support in Go to specifically validate against ID_Start and ID_Continue.
Best,
David
[1]: https://262.ecma-international.org/#sec-identifier-names
[2]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
[3]: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D
[4]: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D
On 17.03.24 20:12, Erik Wienhold wrote:
Mentioning JSON and \v in the same sentence is wrong: JavaScript allows
that escape in strings but JSON doesn't. I think the easiest is to just
replace "JSON" with "JavaScript" in that sentence to make it right. The
paragraph also already says "embedded string literals follow JavaScript/
ECMAScript conventions", so mentioning JSON seems unnecessary to me.The last sentence also mentions backslash escapes \xNN and \u{N...} as
deviations from JSON when in fact those are valid escape sequences from
ECMA-262:https://262.ecma-international.org/#prod-HexEscapeSequence
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely. The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.
I have committed this patch, and backpatched it, as a bug fix, because
the existing description was wrong. To keep the patch minimal for
backpatching, I didn't do the conversion to a list. I'm not sure I like
that anyway, because it tends to draw more attention to that part over
the surrounding parts, which didn't seem appropriate in this case. But
anyway, if you have any more non-bug-fix editing in this area, which
would then target PG18, please send more patches.
On 18.03.24 01:09, Erik Wienhold wrote:
The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor. Looks like a bug to me because a variable
doesn't even make sense in that place.Right. Maybe the docs should be updated to say that a literal dollar
sign isn’t supported in identifiers, unlike in JavaScript, except
through escapes like this:Unfortunately, I don't have access to that part of the SQL spec. So I
don't know how the jsonpath grammar is specified.
The SQL spec says that <JSON path identifier> corresponds to Identifier
in ECMAScript.
But it also says,
A <JSON path identifier> is classified as follows.
Case:
a) A <JSON path identifier> that is a <dollar sign> is a <JSON path
context variable>.
b) A <JSON path identifier> that begins with <dollar sign> is a
<JSON path named variable>.
c) Otherwise, a <JSON path identifier> is a <JSON path key name>.
Does this help? I wasn't following all the discussion to see if there
is anything wrong with the implementation.
On Apr 24, 2024, at 05:51, Peter Eisentraut <peter@eisentraut.org> wrote:
A <JSON path identifier> is classified as follows.
Case:
a) A <JSON path identifier> that is a <dollar sign> is a <JSON path
context variable>.b) A <JSON path identifier> that begins with <dollar sign> is a
<JSON path named variable>.c) Otherwise, a <JSON path identifier> is a <JSON path key name>.
Does this help? I wasn't following all the discussion to see if there is anything wrong with the implementation.
Yes, it does, as it ties the special meaning of the dollar sign to the *beginning* of an expression. So it makes sense that this would be an error:
david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
^
But I’m less sure when a dollar sign is used in the *middle* (or end) of a json path identifier:
david=# select '$.xx$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.xx$foo'::jsonpath;
^
Perhaps that should be valid?
Best,
David
On Apr 24, 2024, at 05:46, Peter Eisentraut <peter@eisentraut.org> wrote:
I have committed this patch, and backpatched it, as a bug fix, because the existing description was wrong. To keep the patch minimal for backpatching, I didn't do the conversion to a list. I'm not sure I like that anyway, because it tends to draw more attention to that part over the surrounding parts, which didn't seem appropriate in this case. But anyway, if you have any more non-bug-fix editing in this area, which would then target PG18, please send more patches.
Makes sense, that level of detail gets into the weeks so maybe doesn’t need to be quite so prominent as a list. Thank you!
David
On 2024-04-24 13:52 +0200, David E. Wheeler wrote:
On Apr 24, 2024, at 05:51, Peter Eisentraut <peter@eisentraut.org> wrote:
A <JSON path identifier> is classified as follows.
Case:
a) A <JSON path identifier> that is a <dollar sign> is a <JSON
path context variable>.b) A <JSON path identifier> that begins with <dollar sign> is a
<JSON path named variable>.c) Otherwise, a <JSON path identifier> is a <JSON path key name>.
Does this help? I wasn't following all the discussion to see if
there is anything wrong with the implementation.
Thanks Peter! But what is the definition of the entire path expression?
Perhaps something like:
<JSON path> ::= <JSON path identifier> { "." <JSON path identifier> }
That would imply that "$.$foo" is a valid path that accesses a variable
member (but I guess the path evaluation is also specified somewhere).
Does it say anything about double-quoted accessors? In table 8.25[1]https://www.postgresql.org/docs/current/datatype-json.html#TYPE-JSONPATH-ACCESSORS we
allow member accessor ."$varname" and it says "If the key name matches
some named variable starting with $ or does not meet the JavaScript
rules for an identifier, it must be enclosed in double quotes to make it
a string literal."
What bugs me about this description, after reading it a couple of times,
is that it's not clear what is meant by ."$varname". It could mean two
things: (1) the double-quoting masks $varname in order to not interpret
those characters as a variable or (2) an interpolated string that
resolves $varname and yields a dynamic member accessor.
The current implementation supports (1), i.e., ."$foo" does not refer to
variable foo but the actual property "$foo":
=> select jsonb_path_query('{"$foo":123,"bar":456}', '$."$foo"', '{"foo":"bar"}');
jsonb_path_query
------------------
123
(1 row)
Under case (2) I'd expect that query to return 456 (because $foo
resolves to "bar"). (Similar to how psql would resolve :'foo' to
'bar'.)
Variables already work in array accessors and table 8.25 says that "The
specified index can be an integer, as well as an expression returning a
single numeric value [...]". A variable is such an expression.
=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}');
jsonb_path_query
------------------
3
(1 row)
So I'd expect a similar behavior for member accessors as well when
seeing ."$varname" in the same table.
Yes, it does, as it ties the special meaning of the dollar sign to the
*beginning* of an expression. So it makes sense that this would be an
error:david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
^
But I’m less sure when a dollar sign is used in the *middle* (or end)
of a json path identifier:david=# select '$.xx$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.xx$foo'::jsonpath;
^
Perhaps that should be valid?
Yes, I think so. That would be case C in the spec excerpt provided by
Peter. So it's just a key name that happens to contain (but not start
with) the dollar sign.
[1]: https://www.postgresql.org/docs/current/datatype-json.html#TYPE-JSONPATH-ACCESSORS
--
Erik
On Apr 24, 2024, at 3:22 PM, Erik Wienhold <ewie@ewie.name> wrote:
Thanks Peter! But what is the definition of the entire path expression?
Perhaps something like:<JSON path> ::= <JSON path identifier> { "." <JSON path identifier> }
That would imply that "$.$foo" is a valid path that accesses a variable
member (but I guess the path evaluation is also specified somewhere).
I read it as “if it starts with a dollar sign, it’s a variable and not a path identifier”, and I assume any `.foo` expression is a path identifier.
What bugs me about this description, after reading it a couple of times,
is that it's not clear what is meant by ."$varname". It could mean two
things: (1) the double-quoting masks $varname in order to not interpret
those characters as a variable or (2) an interpolated string that
resolves $varname and yields a dynamic member accessor.
My understanding is that if it’s in double quotes it’s never anything other than a string (whether a string literal or a path identifier string literal). IOW, variables don’t interpolate inside strings.
Under case (2) I'd expect that query to return 456 (because $foo
resolves to "bar"). (Similar to how psql would resolve :'foo' to
'bar'.)
Yes, I suspect this is the correct interpretation, but agree the wording could use some massaging, especially since path identifiers cannot start with a dollar sign anyway. Perhaps:
"If the key name starts with $ or does not meet the JavaScript rules for an identifier, it must be enclosed in double quotes to make it a string literal."
Variables already work in array accessors and table 8.25 says that "The
specified index can be an integer, as well as an expression returning a
single numeric value [...]". A variable is such an expression.=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}');
jsonb_path_query
------------------
3
(1 row)So I'd expect a similar behavior for member accessors as well when
seeing ."$varname" in the same table.
Oh, interesting point! Now I wonder if the standard has this inconsistency (and is aware of it).
Yes, I think so. That would be case C in the spec excerpt provided by
Peter. So it's just a key name that happens to contain (but not start
with) the dollar sign.
Exactly. It also matches the doc you quote above. Something would have to change in src/backend/utils/adt/jsonpath_scan.l to fix that, but that file makes my eyes water, so I’m not gonna take a stab at it. :-)
D