jsonb contains behaviour weirdness

Started by Alexander Korotkovover 11 years ago20 messages
#1Alexander Korotkov
aekorotkov@gmail.com

Hi!

Let's consider some examples.

# select '[1,2]'::jsonb @> '[1,2,2]'::jsonb;
?column?
----------
f
(1 row)

One may think it's because second jsonb array contain two "2". So, contains
takes care about count of equal elements.

# select '[1,1,2]'::jsonb @> '[1,2,2]'::jsonb;
?column?
----------
t
(1 row)

But, it's not. Jsonb contains takes care only about length of array.

# select '[[1,2]]'::jsonb @> '[[1,2,2]]'::jsonb;
?column?
----------
t
(1 row)

Even more weird :)
The reason why jsonb contains behaves so is check in the beginning
of jsonb_contains. It makes fast check of jsonb type and elements count
before calling JsonbDeepContains.

if (JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) ||
JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
PG_RETURN_BOOL(false);

It's likely that "JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl)" should be
checked only for objects, not arrays. Also, should JsonbDeepContains does
same fast check when it deals with nested objects?

------
With best regards,
Alexander Korotkov.

#2Peter Geoghegan
pg@heroku.com
In reply to: Alexander Korotkov (#1)
Re: jsonb contains behaviour weirdness

On Fri, Sep 12, 2014 at 6:40 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

Even more weird :)

Agreed.

The reason why jsonb contains behaves so is check in the beginning of
jsonb_contains. It makes fast check of jsonb type and elements count before
calling JsonbDeepContains.

if (JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) ||
JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
PG_RETURN_BOOL(false);

It's likely that "JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl)" should be
checked only for objects, not arrays. Also, should JsonbDeepContains does
same fast check when it deals with nested objects?

I think this is due to commit 364ddc. That removed the extra step that
had arrays sorted (and then de-duped) ahead of time, which made arrays
behave like objects at the top level. I think that this sort + de-dup
step was mischaracterized as purely a performance thing (possibly by
me). Basically, JsonbDeepContains() is consistent with the previous
behavior at the top level, but not the current (inadvertently altered)
behavior. I think the fix is probably a return to the previous
behavior. I'll take a closer look.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#2)
Re: jsonb contains behaviour weirdness

Peter Geoghegan <pg@heroku.com> writes:

I think this is due to commit 364ddc. That removed the extra step that
had arrays sorted (and then de-duped) ahead of time, which made arrays
behave like objects at the top level. I think that this sort + de-dup
step was mischaracterized as purely a performance thing (possibly by
me). Basically, JsonbDeepContains() is consistent with the previous
behavior at the top level, but not the current (inadvertently altered)
behavior. I think the fix is probably a return to the previous
behavior. I'll take a closer look.

I'm confused. Are you proposing to return to sort + de-dup of JSON
arrays? Surely that is completely broken. Arrays are ordered.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#3)
Re: jsonb contains behaviour weirdness

On Fri, Sep 12, 2014 at 11:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm confused. Are you proposing to return to sort + de-dup of JSON
arrays? Surely that is completely broken. Arrays are ordered.

Sorry, my earlier remarks were premature. In fact, that alteration
only applied to existence, not containment. However, arrays are
ordered for the purposes of equality, but not containment.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Josh Berkus
josh@agliodbs.com
In reply to: Alexander Korotkov (#1)
Re: jsonb contains behaviour weirdness

On 09/12/2014 06:40 AM, Alexander Korotkov wrote:

Hi!

Let's consider some examples.

# select '[1,2]'::jsonb @> '[1,2,2]'::jsonb;
?column?
----------
f
(1 row)

One may think it's because second jsonb array contain two "2". So,
contains takes care about count of equal elements.

JSONB arrays are allowed to have repleating elements. It's keys which
are not allowed to repeat.

# select '[1,1,2]'::jsonb @> '[1,2,2]'::jsonb;
?column?
----------
t
(1 row)

But, it's not. Jsonb contains takes care only about length of array.

OK, now, that's messed up.

# select '[[1,2]]'::jsonb @> '[[1,2,2]]'::jsonb;
?column?
----------
t
(1 row)

Even more weird :)
The reason why jsonb contains behaves so is check in the beginning
of jsonb_contains. It makes fast check of jsonb type and elements count
before calling JsonbDeepContains.

if (JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) ||
JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
PG_RETURN_BOOL(false);

It's likely that "JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl)" should be
checked only for objects, not arrays.

Yeah, agreed.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#4)
Re: jsonb contains behaviour weirdness

Peter Geoghegan <pg@heroku.com> writes:

On Fri, Sep 12, 2014 at 11:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm confused. Are you proposing to return to sort + de-dup of JSON
arrays? Surely that is completely broken. Arrays are ordered.

Sorry, my earlier remarks were premature. In fact, that alteration
only applied to existence, not containment. However, arrays are
ordered for the purposes of equality, but not containment.

My remarks were also premature, because looking back at the referenced
commit, I see what it removed was a sort and de-dup as a preliminary step
in containment comparisons, not as a generic alteration of array contents.
So that was sane enough, though I concur with Heikki's opinion that it
likely failed to be a performance win.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Peter Geoghegan
pg@heroku.com
In reply to: Josh Berkus (#5)
Re: jsonb contains behaviour weirdness

On Fri, Sep 12, 2014 at 11:21 AM, Josh Berkus <josh@agliodbs.com> wrote:

# select '[1,1,2]'::jsonb @> '[1,2,2]'::jsonb;
?column?
----------
t
(1 row)

But, it's not. Jsonb contains takes care only about length of array.

OK, now, that's messed up.

To be clear: I don't think that this example is messed up (in
isolation). I think it's the correct behavior. What I find
objectionable is the inconsistency. I believe that this is Alexander's
concern too. Alexander's first example exhibits broken behavior.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Josh Berkus
josh@agliodbs.com
In reply to: Alexander Korotkov (#1)
Re: jsonb contains behaviour weirdness

On 09/12/2014 11:38 AM, Peter Geoghegan wrote:

To be clear: I don't think that this example is messed up (in
isolation). I think it's the correct behavior. What I find
objectionable is the inconsistency. I believe that this is Alexander's
concern too. Alexander's first example exhibits broken behavior.

Hmmm, oh. Yeah, I see what you mean; PostgreSQL's SQL array behavior is
that @> is true if array A contains all of the elements of array B
regardless of ordering or repetition.

jsonic=# select array[1,2,2] @> array[1,1,2]

;
?column?
----------
t

That's consistent with our docs and past behavior.

However, this better become a FAQ item, because it's not necessarily the
behavior that folks used to JSON but not Postgres will expect.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Alexander Korotkov
aekorotkov@gmail.com
In reply to: Peter Geoghegan (#7)
Re: jsonb contains behaviour weirdness

On Fri, Sep 12, 2014 at 10:38 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Sep 12, 2014 at 11:21 AM, Josh Berkus <josh@agliodbs.com> wrote:

# select '[1,1,2]'::jsonb @> '[1,2,2]'::jsonb;
?column?
----------
t
(1 row)

But, it's not. Jsonb contains takes care only about length of array.

OK, now, that's messed up.

To be clear: I don't think that this example is messed up (in
isolation). I think it's the correct behavior. What I find
objectionable is the inconsistency. I believe that this is Alexander's
concern too. Alexander's first example exhibits broken behavior.

Agree. I just tried to explain how current behaviour could look for user
who sees it for the first time.

------
With best regards,
Alexander Korotkov.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#8)
Re: jsonb contains behaviour weirdness

Josh Berkus <josh@agliodbs.com> writes:

However, this better become a FAQ item, because it's not necessarily the
behavior that folks used to JSON but not Postgres will expect.

No, it's a bug, not a documentation deficiency.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Geoghegan
pg@heroku.com
In reply to: Alexander Korotkov (#1)
1 attachment(s)
Re: jsonb contains behaviour weirdness

On Fri, Sep 12, 2014 at 6:40 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

It's likely that "JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl)" should be
checked only for objects, not arrays. Also, should JsonbDeepContains does
same fast check when it deals with nested objects?

Attached patch implements something similar to what you describe here,
fixing your example.

I haven't added the optimization to JsonbDeepContains(). I think that
if anything, we should remove the optimization entirely, which is what
I've done -- an rhs "is it contained within?" value is hardly ever
going to be an object that has more pairs than the object we're
checking it is contained within. It's almost certainly going to have
far fewer pairs. Apart from only applying to objects, that
optimization just isn't an effective way of eliminating jsonb values
from consideration quickly. I'd rather not bother at all, rather than
having a complicated comment about why the optimization applies to
objects and not arrays.

--
Peter Geoghegan

Attachments:

remove_elems_optimization.patchtext/x-patch; charset=US-ASCII; name=remove_elems_optimization.patchDownload
diff --git a/src/backend/utils/adt/jsonb_op.c b/src/backend/utils/adt/jsonb_op.c
new file mode 100644
index 2d071b2..6fcdbad
*** a/src/backend/utils/adt/jsonb_op.c
--- b/src/backend/utils/adt/jsonb_op.c
*************** jsonb_contains(PG_FUNCTION_ARGS)
*** 117,124 ****
  	JsonbIterator *it1,
  			   *it2;
  
! 	if (JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) ||
! 		JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
  		PG_RETURN_BOOL(false);
  
  	it1 = JsonbIteratorInit(&val->root);
--- 117,123 ----
  	JsonbIterator *it1,
  			   *it2;
  
! 	if (JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
  		PG_RETURN_BOOL(false);
  
  	it1 = JsonbIteratorInit(&val->root);
*************** jsonb_contained(PG_FUNCTION_ARGS)
*** 137,144 ****
  	JsonbIterator *it1,
  			   *it2;
  
! 	if (JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) ||
! 		JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
  		PG_RETURN_BOOL(false);
  
  	it1 = JsonbIteratorInit(&val->root);
--- 136,142 ----
  	JsonbIterator *it1,
  			   *it2;
  
! 	if (JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
  		PG_RETURN_BOOL(false);
  
  	it1 = JsonbIteratorInit(&val->root);
#12Josh Berkus
josh@agliodbs.com
In reply to: Alexander Korotkov (#1)
Re: jsonb contains behaviour weirdness

On 09/12/2014 01:33 PM, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

However, this better become a FAQ item, because it's not necessarily the
behavior that folks used to JSON but not Postgres will expect.

No, it's a bug, not a documentation deficiency.

Hmmm? Are you proposing that we should change how ARRAY @> ARRAY works
for non-JSON data?

Or are you proposing that JSONARRAY @> JSONARRAY should work differently
from ARRAY @> ARRAY?

Or are you agreeing with Peter that it's the first case that's a bug,
and the 2nd two tests are correct, and just being unclear?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#12)
Re: jsonb contains behaviour weirdness

Josh Berkus <josh@agliodbs.com> writes:

On 09/12/2014 01:33 PM, Tom Lane wrote:

No, it's a bug, not a documentation deficiency.

Hmmm? Are you proposing that we should change how ARRAY @> ARRAY works
for non-JSON data?

No.

Or are you proposing that JSONARRAY @> JSONARRAY should work differently
from ARRAY @> ARRAY?

And no. It's a bug that jsonb array containment works differently from
regular array containment. We understand the source of the bug, ie a
mistaken optimization. I don't see why there's much need for discussion
about anything except whether removing the optimization altogether
(as Peter proposed) is the best fix, or whether we want to retain
some weaker form of it.

Personally I'd think that we should retain it for objects; Peter's
main argument against that was that the comment would be too complicated,
but that seems a bit silly from here.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#13)
Re: jsonb contains behaviour weirdness

On Mon, Sep 15, 2014 at 11:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Personally I'd think that we should retain it for objects; Peter's
main argument against that was that the comment would be too complicated,
but that seems a bit silly from here.

I just don't see any point to it. My argument against the complexity
of explaining why the optimization is only used with objects is based
on the costs and the benefits. I think the benefits are very close to
nil.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#14)
Re: jsonb contains behaviour weirdness

On Mon, Sep 15, 2014 at 2:18 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Sep 15, 2014 at 11:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Personally I'd think that we should retain it for objects; Peter's
main argument against that was that the comment would be too complicated,
but that seems a bit silly from here.

I just don't see any point to it. My argument against the complexity
of explaining why the optimization is only used with objects is based
on the costs and the benefits. I think the benefits are very close to
nil.

That seems pessimistic to me; I'm with Tom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#14)
Re: jsonb contains behaviour weirdness

Peter Geoghegan <pg@heroku.com> writes:

On Mon, Sep 15, 2014 at 11:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Personally I'd think that we should retain it for objects; Peter's
main argument against that was that the comment would be too complicated,
but that seems a bit silly from here.

I just don't see any point to it. My argument against the complexity
of explaining why the optimization is only used with objects is based
on the costs and the benefits. I think the benefits are very close to
nil.

It might be that the benefit is very close to nil; that would depend a lot
on workload, so it's hard to be sure. I'd say though that the cost is
also very close to nil, in the sense that we're considering two additional
compare-and-branch instructions in a function that will surely expend
hundreds or thousands of instructions if there's no such short-circuit.

I've certainly been on the side of "that optimization isn't worth its
keep" many times before, but I don't think the case is terribly clear cut
here. Since somebody (possibly you) thought it was worth having to begin
with, I'm inclined to follow that lead.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#16)
Re: jsonb contains behaviour weirdness

On Mon, Sep 15, 2014 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It might be that the benefit is very close to nil; that would depend a lot
on workload, so it's hard to be sure. I'd say though that the cost is
also very close to nil, in the sense that we're considering two additional
compare-and-branch instructions in a function that will surely expend
hundreds or thousands of instructions if there's no such short-circuit.

Fair enough.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Josh Berkus
josh@agliodbs.com
In reply to: Alexander Korotkov (#1)
Re: jsonb contains behaviour weirdness

On 09/15/2014 11:12 AM, Tom Lane wrote:

Or are you proposing that JSONARRAY @> JSONARRAY should work differently
from ARRAY @> ARRAY?

And no. It's a bug that jsonb array containment works differently from
regular array containment. We understand the source of the bug, ie a
mistaken optimization. I don't see why there's much need for discussion
about anything except whether removing the optimization altogether
(as Peter proposed) is the best fix, or whether we want to retain
some weaker form of it.

Right, so I was just saying that after we fix this behavior, the
behavior of JSONARRAY @> JSONARRAY should be commented somewhere because
that comparison may not work the way users who are not long-time
postgres users expect. Heck, I've personally done very little ARRAY @>
ARRAY myself in 12 years of using PostgreSQL arrays; I had to test it to
verify the current behavior.

Not sure exactly where this note should go, mind you.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Peter Geoghegan
pg@heroku.com
In reply to: Peter Geoghegan (#11)
Re: jsonb contains behaviour weirdness

This was never committed...

On Sat, Sep 13, 2014 at 4:31 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Sep 12, 2014 at 6:40 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

It's likely that "JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl)" should be
checked only for objects, not arrays. Also, should JsonbDeepContains does
same fast check when it deals with nested objects?

Attached patch implements something similar to what you describe here,
fixing your example.

I haven't added the optimization to JsonbDeepContains(). I think that
if anything, we should remove the optimization entirely, which is what
I've done -- an rhs "is it contained within?" value is hardly ever
going to be an object that has more pairs than the object we're
checking it is contained within. It's almost certainly going to have
far fewer pairs. Apart from only applying to objects, that
optimization just isn't an effective way of eliminating jsonb values
from consideration quickly. I'd rather not bother at all, rather than
having a complicated comment about why the optimization applies to
objects and not arrays.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#19)
Re: jsonb contains behaviour weirdness

Peter Geoghegan <pg@heroku.com> writes:

This was never committed...

Yeah ... the discussion trailed off and I think we forgot to actually
commit a fix. Will go do so.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers