❓ JSON Path Dot Precedence
Hello Hackers,
A question about the behavior of the JSON Path parser. The docs[1]https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH have this to say about numbers:
Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON in some minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON.
In other words, this is valid:
david=# select '2.'::jsonpath;
jsonpath
----------
2
But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should that be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followed by the path segment “p10”? Here’s the parser’s answer:
david=# select '0x2.p10'::jsonpath;
jsonpath
-----------
(2)."p10"
So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimal expression. Is that intentional/correct?
Discovered while writing my Go lexer and throwing all of Go’s floating point literal examples[2]https://tip.golang.org/ref/spec#Floating-point_literals at it and comparing to the Postgres path parser. Curiously, this is only an issue for 0x/0o/0b numeric expressions; a decimal expression does not behave in the same way:
david=# select '2.p10'::jsonpath;
ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input
LINE 1: select '2.p10'::jsonpath;
Which maybe seems a bit inconsistent.
Thoughts on what the “correct” behavior should be?
Best,
David
[1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
[2]: https://tip.golang.org/ref/spec#Floating-point_literals
On 2024-04-07 18:13 +0200, David E. Wheeler wrote:
A question about the behavior of the JSON Path parser. The docs[1]
have this to say about numbers:Numeric literals in SQL/JSON path expressions follow JavaScript
rules, which are different from both SQL and JSON in some minor
details. For example, SQL/JSON path allows .1 and 1., which are
invalid in JSON.In other words, this is valid:
david=# select '2.'::jsonpath;
jsonpath
----------
2But this feature creates a bit of a conflict with the use of a dot for
path expressions. Consider `0x2.p10`. How should that be parsed? As an
invalid decimal expression ("trailing junk after numeric literal”), or
as a valid integer 2 followed by the path segment “p10”? Here’s the
parser’s answer:david=# select '0x2.p10'::jsonpath;
jsonpath
-----------
(2)."p10"So it would seem that, other things being equal, a path key expression
(`.foo`) is slightly higher precedence than a decimal expression. Is
that intentional/correct?
I guess jsonpath assumes that hex, octal, and binary literals are
integers. So there's no ambiguity about any fractional part that might
follow.
Discovered while writing my Go lexer and throwing all of Go’s floating
point literal examples[2] at it and comparing to the Postgres path
parser. Curiously, this is only an issue for 0x/0o/0b numeric
expressions; a decimal expression does not behave in the same way:david=# select '2.p10'::jsonpath;
ERROR: trailing junk after numeric literal at or near "2.p" of jsonpath input
LINE 1: select '2.p10'::jsonpath;
It scans the decimal "2." and then finds junks "p10".
Works with a full decimal:
test=# select '3.14.p10'::jsonpath;
jsonpath
--------------
(3.14)."p10"
(1 row)
And with extra whitespace to resolve the ambiguity:
test=# select '2 .p10'::jsonpath;
jsonpath
-----------
(2)."p10"
(1 row)
Which maybe seems a bit inconsistent.
Thoughts on what the “correct” behavior should be?
I'd say a member accessor after a number doesn't really make sense
because object keys are strings. One could argue that path "$.2.p10"
should match JSON '{"2":{"p10":42}}', i.e. the numeric accessor is
converted to a string. For example, in nodejs I can do:
var x = {2: {p10: 42}}
x[2].p10
42
But that's JavaScript, not JSON.
Also, is there even a use case for path "0x2.p10"? The path has to
start with "$" or ("@" in case of a filter expression), doesn't it? And
it that case it doesn't parse:
test=# select '$.0x2.p10'::jsonpath;
ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath input
LINE 1: select '$.0x2.p10'::jsonpath;
Even with extra whitespace:
test=# select '$ . 0x2 . p10'::jsonpath;
ERROR: syntax error at or near "0x2" of jsonpath input
LINE 1: select '$ . 0x2 . p10'::jsonpath;
Or should it behave like an array accessor? Similar to:
test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath);
jsonb_path_query
------------------
42
(1 row)
[1]: https://www.postgresql.org/docs/devel/datatype-json.html#DATATYPE-JSONPATH
[2]: https://tip.golang.org/ref/spec#Floating-point_literals
--
Erik
On Apr 7, 2024, at 15:46, Erik Wienhold <ewie@ewie.name> wrote:
I guess jsonpath assumes that hex, octal, and binary literals are
integers. So there's no ambiguity about any fractional part that might
follow.
Yeah, that’s what the comment in the flex file says:
https://github.com/postgres/postgres/blob/b4a71cf/src/backend/utils/adt/jsonpath_scan.l#L102-L105
Also, is there even a use case for path "0x2.p10"? The path has to
start with "$" or ("@" in case of a filter expression), doesn't it? And
it that case it doesn't parse:test=# select '$.0x2.p10'::jsonpath;
ERROR: trailing junk after numeric literal at or near ".0x" of jsonpath input
LINE 1: select '$.0x2.p10'::jsonpath;Even with extra whitespace:
test=# select '$ . 0x2 . p10'::jsonpath;
ERROR: syntax error at or near "0x2" of jsonpath input
LINE 1: select '$ . 0x2 . p10'::jsonpath;Or should it behave like an array accessor? Similar to:
test=# select jsonb_path_query('[0,1,{"p10":42},3]', '$[0x2].p10'::jsonpath);
jsonb_path_query
------------------
42
(1 row)
I too am curious why these parse successfully, but don’t appear to be useful.
Best,
David
On 07.04.24 18:13, David E. Wheeler wrote:
Hello Hackers,
A question about the behavior of the JSON Path parser. The docs[1] have this to say about numbers:
Numeric literals in SQL/JSON path expressions follow JavaScript rules, which are different from both SQL and JSON in some minor details. For example, SQL/JSON path allows .1 and 1., which are invalid in JSON.
In other words, this is valid:
david=# select '2.'::jsonpath;
jsonpath
----------
2But this feature creates a bit of a conflict with the use of a dot for path expressions. Consider `0x2.p10`. How should that be parsed? As an invalid decimal expression ("trailing junk after numeric literal”), or as a valid integer 2 followed by the path segment “p10”? Here’s the parser’s answer:
david=# select '0x2.p10'::jsonpath;
jsonpath
-----------
(2)."p10"So it would seem that, other things being equal, a path key expression (`.foo`) is slightly higher precedence than a decimal expression. Is that intentional/correct?
I think the derivation would be like this:
(I'm not sure what the top-level element would be, so let's start
somewhere in the middle ...)
<JSON unary expression> ::= <JSON accessor expression>
<JSON accessor expression> ::= <JSON path primary> <JSON accessor op>
<JSON path primary> ::= <JSON path literal>
<JSON accessor op> ::= <JSON member accessor>
<JSON member accessor> ::= <period> <JSON path key name>
So the whole thing is
<JSON path literal> <period> <JSON path key name>
The syntax of <JSON path literal> and <JSON path key name> is then
punted to ECMAScript 5.1.
0x2 is a HexIntegerLiteral. (There can be no dots in that.)
p10 is an Identifier.
So I think this is all correct.
On Apr 10, 2024, at 10:29, Peter Eisentraut <peter@eisentraut.org> wrote:
So the whole thing is
<JSON path literal> <period> <JSON path key name>
The syntax of <JSON path literal> and <JSON path key name> is then punted to ECMAScript 5.1.
0x2 is a HexIntegerLiteral. (There can be no dots in that.)
p10 is an Identifier.
So I think this is all correct.
That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select?
Best,
David
Hi, following up on some old threads.
On Apr 10, 2024, at 16:44, David E. Wheeler <david@justatheory.com> wrote:
That makes sense, thanks. It’s just a little odd to me that the resulting path isn’t a query at all. To Erik’s point: what path can `'0x2.p10` even select?
I’m wondering whether the jsonpath parser should be updated to reject cases like this. I think it will always return no results. AFAICT, there’s no way to navigate to an object identifier immediately after a number:
david=# select '0x2.p10'::jsonpath;
jsonpath
-----------
(2)."p10"
(1 row)
david=# select jsonb_path_query(target => '[0, 1, {"p10": true}]', path => '0x2.p10');
jsonb_path_query
------------------
(0 rows)
david=# select jsonb_path_query(target => '{"0x2": {"p10": true}}', path => '0x2.p10');
jsonb_path_query
------------------
(0 rows)
It’s just inherently meaningless. BTW, it’s not limited to hex numbers:
david=# select '(2).p10'::jsonpath;
jsonpath
-----------
(2)."p10"
OTOH, maybe that’s a corner case we can live with.
Best,
David
On Mon, Jul 8, 2024 at 8:27 AM David E. Wheeler <david@justatheory.com>
wrote:
Hi, following up on some old threads.
On Apr 10, 2024, at 16:44, David E. Wheeler <david@justatheory.com>
wrote:
That makes sense, thanks. It’s just a little odd to me that the
resulting path isn’t a query at all. To Erik’s point: what path can
`'0x2.p10` even select?I’m wondering whether the jsonpath parser should be updated to reject
cases like this. I think it will always return no results. AFAICT, there’s
no way to navigate to an object identifier immediately after a number:
If we go down this path wouldn't the correct framing be: do not allow
accessors after scalars ? The same argument applies to false/"john" and
other scalar types since by definition none of them have subcomponents to
be accessed.
That said, the parser has a lax mode which somewhat implies it doesn't
expect the jsonpath type to perform much in the way of validation of the
semantic correctness of the encoded path expression.
I like the idea of a smarter expression-holding type and would even wish to
have had this on day one. Retrofitting is less appealing. We document a
similarity with regular expressions here where we, for better and worse,
have lived without a regexppath data type forever and leave it to the
executor to tell the user their pattern is invalid. Leaning on that
precedence here makes accepting the status quo more reasonable. Though
strict/lax modes and, I think, variables, motivates me to put my vote
toward the "do more validation" group.
Does the standard even have a separate type here or is that our
implementation detail invention?
David J.
On Jul 8, 2024, at 12:05, David G. Johnston <david.g.johnston@gmail.com> wrote:
If we go down this path wouldn't the correct framing be: do not allow accessors after scalars ? The same argument applies to false/"john" and other scalar types since by definition none of them have subcomponents to be accessed.
Yes, excellent point.
That said, the parser has a lax mode which somewhat implies it doesn't expect the jsonpath type to perform much in the way of validation of the semantic correctness of the encoded path expression.
My understanding is that lax mode means it ignores where the JSON doesn’t abide by expectations of the path expression, not that the path parsing is lax.
I like the idea of a smarter expression-holding type and would even wish to have had this on day one. Retrofitting is less appealing. We document a similarity with regular expressions here where we, for better and worse, have lived without a regexppath data type forever and leave it to the executor to tell the user their pattern is invalid. Leaning on that precedence here makes accepting the status quo more reasonable. Though strict/lax modes and, I think, variables, motivates me to put my vote toward the "do more validation" group.
This feels different from a documented difference in behavior as an implementation choice, like path regex vs. Spencer. In this case, the expression is technically meaningless, but there’s never so much as an error thrown.
Does the standard even have a separate type here or is that our implementation detail invention?
Sorry, separate type for what?
Best,
David
On Mon, Jul 8, 2024 at 9:12 AM David E. Wheeler <david@justatheory.com>
wrote:
On Jul 8, 2024, at 12:05, David G. Johnston <david.g.johnston@gmail.com>
wrote:Does the standard even have a separate type here or is that our
implementation detail invention?
Sorry, separate type for what?
We created a data type named: jsonpath. Does the standard actually have
that data type and defined parsing behavior or does it just have functions
where one of the inputs is text whose contents are a path expression?
David J.
On Jul 8, 2024, at 12:17, David G. Johnston <david.g.johnston@gmail.com> wrote:
We created a data type named: jsonpath. Does the standard actually have that data type and defined parsing behavior or does it just have functions where one of the inputs is text whose contents are a path expression?
Ah, got it.
D