BUG #11207: empty path will segfault jsonb #>

Started by Nonameover 11 years ago19 messagesbugs
Jump to latest
#1Noname
justin.vanwinkle@gmail.com

The following bug has been logged on the website:

Bug reference: 11207
Logged by: Justin Van Winkle
Email address: justin.vanwinkle@gmail.com
PostgreSQL version: 9.4beta2
Operating system: linux
Description:

select '{"a": {"b":{"c": "foo"}}}'::jsonb #> '{}';

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: BUG #11207: empty path will segfault jsonb #>

justin.vanwinkle@gmail.com writes:

select '{"a": {"b":{"c": "foo"}}}'::jsonb #> '{}';
[ dumps core ]

Hm. It's not immediately obvious to me what this should be defined to do;
should it throw an error, or return the complete LHS object, or perhaps
return a NULL? But the author of get_jsonb_path_all() evidently didn't
consider the case.

Easy enough to fix once we settle on a definition.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #11207: empty path will segfault jsonb #>

Tom Lane-2 wrote

justin.vanwinkle@

writes:

select '{"a": {"b":{"c": "foo"}}}'::jsonb #> '{}';
[ dumps core ]

Hm. It's not immediately obvious to me what this should be defined to do;
should it throw an error, or return the complete LHS object, or perhaps
return a NULL? But the author of get_jsonb_path_all() evidently didn't
consider the case.

Easy enough to fix once we settle on a definition.

As there is no immediately obvious way to return the entire LHS when using
this operator it doesn't seem that far-fetched to define this as the syntax
that would do so - though then there wouldn't be a way to not return the
entire LHS without providing a known invalid key (maybe '{NULL}' - what
would that do now anyway...)

Not that we would necessarily want to duplicate hstore but "hstore ->
text[]" seems similar enough to emulate if desired.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-11207-empty-path-will-segfault-jsonb-tp5815469p5815490.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: BUG #11207: empty path will segfault jsonb #>

I wrote:

justin.vanwinkle@gmail.com writes:

select '{"a": {"b":{"c": "foo"}}}'::jsonb #> '{}';
[ dumps core ]

Hm. It's not immediately obvious to me what this should be defined to do;
should it throw an error, or return the complete LHS object, or perhaps
return a NULL? But the author of get_jsonb_path_all() evidently didn't
consider the case.

It occurred to me to check what the pre-existing JSON-type code does with
this case, and I find it returns NULL. So there's certainly an argument
to be made for following that precedent and calling it a day. However,
on reflection it seems to me that this behavior is entirely nonsensical,
and the sane thing to do is to return the given json or jsonb input.
Surely #> with a K-element path ought to act the same as K invocations
of the -> operator ... and zero invocations would result in just having
the original input object, no?

So I propose that we fix both operators to return the input object in this
case, and release-note the change in behavior from 9.3.

BTW, the same argument would suggest that if the array contains null
elements, the right thing to do is return NULL, not throw an error
as the current code does. You would get a NULL if you passed a NULL
RHS to ->, because it's strict.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#3)
Re: BUG #11207: empty path will segfault jsonb #>

David G Johnston <david.g.johnston@gmail.com> writes:

Not that we would necessarily want to duplicate hstore but "hstore ->
text[]" seems similar enough to emulate if desired.

Actually, that operator is something completely different: it does N
independent lookups in the given hstore, and produces an array of the N
results. Since hstore has no concept of nesting, there wouldn't be a
reason for it to have something that does what json's #> does.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: BUG #11207: empty path will segfault jsonb #>

I wrote:

Surely #> with a K-element path ought to act the same as K invocations
of the -> operator ... and zero invocations would result in just having
the original input object, no?

Poking at that some more ...

regression=# select '{"a": {"b":{"c": "foo"}}}'::json #> '{a}';
?column?
--------------------
{"b":{"c": "foo"}}
(1 row)

regression=# select '{"a": {"b":{"c": "foo"}}}'::json #> '{a,b}';
?column?
--------------
{"c": "foo"}
(1 row)

regression=# select '{"a": {"b":{"c": "foo"}}}'::json #> '{a,b,c}';
?column?
----------
"foo"
(1 row)

regression=# select '{"a": {"b":{"c": "foo"}}}'::json #> '{a,b,c,d}';
?column?
----------

(1 row)

That comports with successive applications of -> up to the last step,
where you'd get an error instead of NULL:

regression=# select '"foo"'::json -> 'd';
ERROR: cannot extract element from a scalar

Is there a reason for these to behave inconsistently, and if not, which
behavior should we standardize on? Considering that you get NULL not an
error for extracting a nonexistent element from an object, I think there
is some case to be made for saying that returning NULL is the more
convenient behavior. Of course one can also argue for wanting this
operator to throw errors if the JSON structure doesn't match the
operation, but it seems like we've chosen to prefer being lax.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#6)
Re: BUG #11207: empty path will segfault jsonb #>

On Wed, Aug 20, 2014 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is there a reason for these to behave inconsistently, and if not, which
behavior should we standardize on? Considering that you get NULL not an
error for extracting a nonexistent element from an object, I think there
is some case to be made for saying that returning NULL is the more
convenient behavior. Of course one can also argue for wanting this
operator to throw errors if the JSON structure doesn't match the
operation, but it seems like we've chosen to prefer being lax.

I discussed this very issue with Andrew during development (I think
that this happened to occur in private). My view was that since users
will frequently use -> within expression indexes, it's best to have it
return NULL for non-objects, rather than make them worry about the
case where it'll be rejected, which is rather contrary to the spirit
of jsonb (at least as a default behavior). Andrew argued it was
preferable to stick to the historic behavior of json operators. IMV,
we should have both operators return NULL. They should be consistent,
which implies changing the behavior of the existing json variants too,
but I don't think that's a big problem.

Note that the documentation briefly draws attention to this issue, in
a comment in the expression index example.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#7)
Re: BUG #11207: empty path will segfault jsonb #>

Peter Geoghegan <pg@heroku.com> writes:

On Wed, Aug 20, 2014 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is there a reason for these to behave inconsistently, and if not, which
behavior should we standardize on? Considering that you get NULL not an
error for extracting a nonexistent element from an object, I think there
is some case to be made for saying that returning NULL is the more
convenient behavior. Of course one can also argue for wanting this
operator to throw errors if the JSON structure doesn't match the
operation, but it seems like we've chosen to prefer being lax.

I discussed this very issue with Andrew during development (I think
that this happened to occur in private). My view was that since users
will frequently use -> within expression indexes, it's best to have it
return NULL for non-objects, rather than make them worry about the
case where it'll be rejected, which is rather contrary to the spirit
of jsonb (at least as a default behavior).

Hadn't thought of that angle, but it's a really good point. I'd certainly
rather be able to put an index on (jsoncol -> 'foo') without worrying
about what will happen if the column includes things that aren't objects.

Andrew argued it was
preferable to stick to the historic behavior of json operators. IMV,
we should have both operators return NULL. They should be consistent,
which implies changing the behavior of the existing json variants too,
but I don't think that's a big problem.

None of these operators existed before 9.3, so I don't put a lot of stock
in the idea that their corner-case behaviors should be considered
sacrosanct already. But that will become the case pretty soon; if we
don't get it right in 9.4 it will arguably be too late.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: BUG #11207: empty path will segfault jsonb #>

On 08/20/2014 03:06 PM, Tom Lane wrote:

Peter Geoghegan <pg@heroku.com> writes:

On Wed, Aug 20, 2014 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Is there a reason for these to behave inconsistently, and if not, which
behavior should we standardize on? Considering that you get NULL not an
error for extracting a nonexistent element from an object, I think there
is some case to be made for saying that returning NULL is the more
convenient behavior. Of course one can also argue for wanting this
operator to throw errors if the JSON structure doesn't match the
operation, but it seems like we've chosen to prefer being lax.

I discussed this very issue with Andrew during development (I think
that this happened to occur in private). My view was that since users
will frequently use -> within expression indexes, it's best to have it
return NULL for non-objects, rather than make them worry about the
case where it'll be rejected, which is rather contrary to the spirit
of jsonb (at least as a default behavior).

Hadn't thought of that angle, but it's a really good point. I'd certainly
rather be able to put an index on (jsoncol -> 'foo') without worrying
about what will happen if the column includes things that aren't objects.

Andrew argued it was
preferable to stick to the historic behavior of json operators. IMV,
we should have both operators return NULL. They should be consistent,
which implies changing the behavior of the existing json variants too,
but I don't think that's a big problem.

None of these operators existed before 9.3, so I don't put a lot of stock
in the idea that their corner-case behaviors should be considered
sacrosanct already. But that will become the case pretty soon; if we
don't get it right in 9.4 it will arguably be too late.

I'm not terribly dogmatic about it. If the consensus is to change it
then let's do it.

cheers

andrew

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#9)
Re: BUG #11207: empty path will segfault jsonb #>

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/20/2014 03:06 PM, Tom Lane wrote:

Peter Geoghegan <pg@heroku.com> writes:

Andrew argued it was
preferable to stick to the historic behavior of json operators. IMV,
we should have both operators return NULL. They should be consistent,
which implies changing the behavior of the existing json variants too,
but I don't think that's a big problem.

None of these operators existed before 9.3, so I don't put a lot of stock
in the idea that their corner-case behaviors should be considered
sacrosanct already. But that will become the case pretty soon; if we
don't get it right in 9.4 it will arguably be too late.

I'm not terribly dogmatic about it. If the consensus is to change it
then let's do it.

Well, if the preference is to make jsonb conform to the historical
behavior of json, we have work to do anyway. I noted this inconsistency
while drawing up some test cases:

regression=# SELECT '42'::json #> array['f2'];
?column?
----------

(1 row)

regression=# SELECT '42'::jsonb #> array['f2'];
ERROR: cannot extract path from a scalar

In this particular case the json code is self-inconsistent, since ->
throws an error:

regression=# SELECT '42'::json -> 'f2';
ERROR: cannot extract element from a scalar
regression=# SELECT '42'::jsonb -> 'f2';
ERROR: cannot call jsonb_object_field (jsonb -> text) on a scalar

I think returning NULL is the right thing here, really. Aside from being
arguably more convenient for indexing, we will get less push-back if we
make some operators go from throwing errors to returning null than if
we make some other operators go the other way.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: BUG #11207: empty path will segfault jsonb #>

I wrote:

I think returning NULL is the right thing here, really. Aside from being
arguably more convenient for indexing, we will get less push-back if we
make some operators go from throwing errors to returning null than if
we make some other operators go the other way.

Attached is a proposed patch (against HEAD and 9.4 git tip) that has
these effects:

1. #> now returns the input object if the RHS array is empty;

2. Error cases in -> and #> are all removed in favor of returning NULL.

The regression test changes show that we were really pretty inconsistent
about whether to throw error or return NULL for cases where the JSON
tree structure didn't match the extraction request. I think having a
uniform rule is a definite improvement.

regards, tom lane

Attachments:

rationalize-json-extraction-ops.patchtext/x-diff; charset=us-ascii; name=rationalize-json-extraction-ops.patchDownload+738-739
In reply to: Tom Lane (#11)
Re: BUG #11207: empty path will segfault jsonb #>

On Wed, Aug 20, 2014 at 4:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The regression test changes show that we were really pretty inconsistent
about whether to throw error or return NULL for cases where the JSON
tree structure didn't match the extraction request. I think having a
uniform rule is a definite improvement.

That's unfortunate. I'm very much in favor of these changes.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#11)
Re: BUG #11207: empty path will segfault jsonb #>

On 08/20/2014 07:30 PM, Tom Lane wrote:

I wrote:

I think returning NULL is the right thing here, really. Aside from being
arguably more convenient for indexing, we will get less push-back if we
make some operators go from throwing errors to returning null than if
we make some other operators go the other way.

Attached is a proposed patch (against HEAD and 9.4 git tip) that has
these effects:

1. #> now returns the input object if the RHS array is empty;

So will

val #>> '{}'

now return a dequoted bare scalar string? I think that's where the OP
actually came into this.

2. Error cases in -> and #> are all removed in favor of returning NULL.

The regression test changes show that we were really pretty inconsistent
about whether to throw error or return NULL for cases where the JSON
tree structure didn't match the extraction request. I think having a
uniform rule is a definite improvement.

Works for me. Thans for cleaning this up.

cheers

andrew

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#13)
Re: BUG #11207: empty path will segfault jsonb #>

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/20/2014 07:30 PM, Tom Lane wrote:

1. #> now returns the input object if the RHS array is empty;

So will
val #>> '{}'
now return a dequoted bare scalar string? I think that's where the OP
actually came into this.

Hm ... as the patch stands, you get the same thing from either operator:

regression=# select '"foo"'::json #> '{}';
?column?
----------
"foo"
(1 row)

regression=# select '"foo"'::json #>> '{}';
?column?
----------
"foo"
(1 row)

If you think the latter should be dequoted, we can probably make it so.
I'm not entirely convinced that's right though: you could argue that
dequoting is a function of the -> operator and we applied zero such
operators. (I'm not wedded to that argument, just raising it as food
for thought.) Comments?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#15David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#14)
Re: BUG #11207: empty path will segfault jsonb #>

On Wed, Aug 20, 2014 at 10:51 PM, Tom Lane-2 [via PostgreSQL] <
ml-node+s1045698n5815665h39@n5.nabble.com> wrote:

Andrew Dunstan <[hidden email]
<http://user/SendEmail.jtp?type=node&amp;node=5815665&amp;i=0&gt;&gt; writes:

On 08/20/2014 07:30 PM, Tom Lane wrote:

1. #> now returns the input object if the RHS array is empty;

So will
val #>> '{}'
now return a dequoted bare scalar string? I think that's where the OP
actually came into this.

Hm ... as the patch stands, you get the same thing from either operator:

regression=# select '"foo"'::json #> '{}';
?column?
----------
"foo"
(1 row)

regression=# select '"foo"'::json #>> '{}';
?column?
----------
"foo"
(1 row)

If you think the latter should be dequoted, we can probably make it so.
I'm not entirely convinced that's right though: you could argue that
dequoting is a function of the -> operator and we applied zero such
operators. (I'm not wedded to that argument, just raising it as food
for thought.) Comments?

​Examples of both json object/array and json​ scalar results from the
different operators may be worth considering.

It is not obvious that your statement:

"dequoting is a function of the -> operator"

is true if one is just looking at the documentation.

Did you maybe intend to say the "->>" operator (which is stated to return
text, not json)?

"Get JSON object field by key" (->)

vs.

"Get JSON object field as text" (->>)

http://www.postgresql.org/docs/9.4/interactive/functions-json.html

The first one should be more precise, maybe: "Get JSON object value, by
key, as json"; and probably add the "by key" to the ->> operator at the
same time for consistency (though to be honest the "by key" part seems
redundant).

The use of "field" here instead of "value" is also confusing (note I
changed it in my alternative). Do we have the same problem for value/field
as we did for key/name?

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-11207-empty-path-will-segfault-jsonb-tp5815469p5815667.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: BUG #11207: empty path will segfault jsonb #>

On 08/20/2014 10:50 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/20/2014 07:30 PM, Tom Lane wrote:

1. #> now returns the input object if the RHS array is empty;

So will
val #>> '{}'
now return a dequoted bare scalar string? I think that's where the OP
actually came into this.

Hm ... as the patch stands, you get the same thing from either operator:

regression=# select '"foo"'::json #> '{}';
?column?
----------
"foo"
(1 row)

regression=# select '"foo"'::json #>> '{}';
?column?
----------
"foo"
(1 row)

If you think the latter should be dequoted, we can probably make it so.
I'm not entirely convinced that's right though: you could argue that
dequoting is a function of the -> operator and we applied zero such
operators. (I'm not wedded to that argument, just raising it as food
for thought.) Comments?

This seems backwards. -> and #> return legal json. ->> and #>> return
text, which is dequoted if the result is a scalar string:

andrew=# select '{"a":"b"}'::json #> '{a}' as j, '{"a":"b"}'::json
#>> '{a}' as t;
j | t
-----+---
"b" | b
(1 row)

cheers

andrew

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#15)
Re: BUG #11207: empty path will segfault jsonb #>

David G Johnston <david.g.johnston@gmail.com> writes:

On Wed, Aug 20, 2014 at 10:51 PM, Tom Lane-2 [via PostgreSQL] <
ml-node+s1045698n5815665h39@n5.nabble.com> wrote:

If you think the latter should be dequoted, we can probably make it so.
I'm not entirely convinced that's right though: you could argue that
dequoting is a function of the -> operator and we applied zero such
operators. (I'm not wedded to that argument, just raising it as food
for thought.) Comments?

Did you maybe intend to say the "->>" operator (which is stated to return
text, not json)?

Right, sorry, sent that too hastily.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#16)
Re: BUG #11207: empty path will segfault jsonb #>

Andrew Dunstan <andrew@dunslane.net> writes:

On 08/20/2014 10:50 PM, Tom Lane wrote:

If you think the latter should be dequoted, we can probably make it so.
I'm not entirely convinced that's right though: you could argue that
dequoting is a function of the -> operator and we applied zero such
operators. (I'm not wedded to that argument, just raising it as food
for thought.) Comments?

This seems backwards. -> and #> return legal json. ->> and #>> return
text, which is dequoted if the result is a scalar string:

Hm. Okay, if you are thinking of it as a datatype transformation rather
than part of the extraction operation, then it makes sense.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#14)
Re: BUG #11207: empty path will segfault jsonb #>

I wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

So will
val #>> '{}'
now return a dequoted bare scalar string? I think that's where the OP
actually came into this.

If you think the latter should be dequoted, we can probably make it so.

Well, that turned out to be significantly more painful than I expected.
I ended up mostly rewriting get_worker() to make it less of a mess...

regards, tom lane

Attachments:

rationalize-json-extraction-ops-2.patchtext/x-diff; charset=us-ascii; name=rationalize-json-extraction-ops-2.patchDownload+1338-1104