[PATCH] Fix jsonb comparison for raw scalar pseudo arrays

Started by Chengpeng Yanover 1 year ago11 messageshackers
Jump to latest
#1Chengpeng Yan
chengpeng_yan@Outlook.com

Dear PostgreSQL Hackers,

Problem Description

I encountered an issue with the B-Tree ordering of `jsonb` values. According to the PostgreSQL documentation[1]JSON Types - https://www.postgresql.org/docs/current/datatype-json.html, the ordering should follow this precedence:

`Object > Array > Boolean > Number > String > Null`

However, empty arrays (`[]`) are currently considered smaller than `null`, which violates the documented rules. This occurs due to improper handling of the `rawScalar` flag when comparing arrays in the `compareJsonbContainers()` function in `src/backend/utils/adt/jsonb_util.c`.

Example to Reproduce the Issue

```sql
postgres=# -- Create a test table with a jsonb column
CREATE TABLE jsonb_test (j jsonb PRIMARY KEY);

-- Insert various jsonb values
INSERT INTO jsonb_test VALUES ('null');
INSERT INTO jsonb_test VALUES ('true');
INSERT INTO jsonb_test VALUES ('false');
INSERT INTO jsonb_test VALUES ('0');
INSERT INTO jsonb_test VALUES ('1');
INSERT INTO jsonb_test VALUES ('"string"');
INSERT INTO jsonb_test VALUES ('[]');
INSERT INTO jsonb_test VALUES ('[1, 2, 3]');
INSERT INTO jsonb_test VALUES ('{}');
INSERT INTO jsonb_test VALUES ('{"a": 1}');

-- Query the table to check ordering
SELECT * FROM jsonb_test ORDER BY j;
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
j
-----------
[]
null
"string"
0
1
false
true
[1, 2, 3]
{}
{"a": 1}
(10 rows)
```
The empty array ([]) is incorrectly placed before null.

Analysis

The issue stems from how the rawScalar flag is evaluated in the compareJsonbContainers() function. When comparing arrays, the function does not prioritize the rawScalar flag before comparing the number of elements (nElems), leading to incorrect ordering for arrays treated as “raw scalars.”

Proposed Fix

The proposed fix ensures the rawScalar flag is checked first, and only when both values have the same flag, the number of elements is compared. This guarantees correct ordering of arrays and scalar values. The details are in the attached patch.

Testing

I have added new test cases to validate the B-Tree ordering of various `jsonb` values, including:
1. Scalars (null, true, false, numbers, strings).
2. Arrays (empty, single-element, nested).
3. Objects (empty, single-key, nested).
4. Mixed types (arrays containing objects, scalars, etc.).

The test cases are included in jsonb.sql and the corresponding expected output file.

I have run make check, and all tests pass successfully.

I would appreciate feedback on the proposed solution or suggestions for improvement.

References

[1]: JSON Types - https://www.postgresql.org/docs/current/datatype-json.html

Best regards,
Chengpeng(Jack) Yan

Attachments:

v1-0001-fix-jsonb-compare.patchapplication/octet-stream; name=v1-0001-fix-jsonb-compare.patchDownload+150-5
#2jian he
jian.universality@gmail.com
In reply to: Chengpeng Yan (#1)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On Mon, Nov 18, 2024 at 10:25 PM Yan Chengpeng
<chengpeng_yan@outlook.com> wrote:

I encountered an issue with the B-Tree ordering of `jsonb` values. According to the PostgreSQL documentation[1], the ordering should follow this precedence:

`Object > Array > Boolean > Number > String > Null`

However, empty arrays (`[]`) are currently considered smaller than `null`, which violates the documented rules. This occurs due to improper handling of the `rawScalar` flag when comparing arrays in the `compareJsonbContainers()` function in `src/backend/utils/adt/jsonb_util.c`.

```

The empty array ([]) is incorrectly placed before null.

Analysis

The issue stems from how the rawScalar flag is evaluated in the compareJsonbContainers() function.
When comparing arrays, the function does not prioritize the rawScalar flag before comparing the number of elements (nElems), leading to incorrect ordering for arrays treated as “raw scalars.”

Proposed Fix

The proposed fix ensures the rawScalar flag is checked first, and only when both values have the same flag, the number of elements is compared.
This guarantees correct ordering of arrays and scalar values. The details are in the attached patch.

per https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
Object > Array > Boolean > Number > String > Null

JsonbValue->val.array.rawScalar is false (that is the real array)
should be larger than scalar (Boolean, Number, String, Null).

while sorting, rawScalar flag should have more priority than comparing
the number of elements in an array.
if two jsonb, JsonbValue->val.array.rawScalar values are different,
then we don't need to compare val.array.nElems.

so I think you are right.
but I am confused with your comments change.

src5=# select 'a' < 'A' collate "en_US.utf8";
?column?
----------
t
(1 row)

src5=# select 'a' < 'A' collate "C";
?column?
----------
f
(1 row)

docs says:
""Primitive JSON values are compared using the same comparison rules
as for the underlying PostgreSQL data type.
Strings are compared using the default database collation.
""
To make the regress tests stable, you may need to change the regress
test value ("a", "A")

the only corner case is the empty jsonb array [].
so sql test like:
select jsonb '[]' < jsonb 'null';

should enough?

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Chengpeng Yan (#1)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On 2024-11-18 Mo 9:25 AM, Yan Chengpeng wrote:

Dear PostgreSQL Hackers,

*Problem Description*

I encountered an issue with the B-Tree ordering of `jsonb` values.
According to the PostgreSQL documentation[1], the ordering should
follow this precedence:

`Object > Array > Boolean > Number > String > Null`

However, empty arrays (`[]`) are currently considered smaller than
`null`, which violates the documented rules. This occurs due to
improper handling of the `rawScalar` flag when comparing arrays in the
`compareJsonbContainers()` function in
`src/backend/utils/adt/jsonb_util.c`.

I agree that this is a (10 year old) bug:

-                        if (va.val.array.nElems != vb.val.array.nElems)
+                        else if (va.val.array.nElems != 
vb.val.array.nElems)

But I don't think we can fix it, because there could well be indexes
that would no longer be valid if we change the sort order. Given that, I
think the best we can do is adjust the documentation to mention the anomaly.

So the actual sort order as implemented is, AIUI,

Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array

which is ugly, but fortunately not many apps rely on jsonb sort order.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#3)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On 2024-12-03 Tu 9:11 AM, Andrew Dunstan wrote:

On 2024-11-18 Mo 9:25 AM, Yan Chengpeng wrote:

Dear PostgreSQL Hackers,

*Problem Description*

I encountered an issue with the B-Tree ordering of `jsonb` values.
According to the PostgreSQL documentation[1], the ordering should
follow this precedence:

`Object > Array > Boolean > Number > String > Null`

However, empty arrays (`[]`) are currently considered smaller than
`null`, which violates the documented rules. This occurs due to
improper handling of the `rawScalar` flag when comparing arrays in
the `compareJsonbContainers()` function in
`src/backend/utils/adt/jsonb_util.c`.

I agree that this is a (10 year old) bug:

-                        if (va.val.array.nElems != vb.val.array.nElems)
+                        else if (va.val.array.nElems != 
vb.val.array.nElems)

But I don't think we can fix it, because there could well be indexes
that would no longer be valid if we change the sort order. Given that,
I think the best we can do is adjust the documentation to mention the
anomaly.

So the actual sort order as implemented is, AIUI,

Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array

which is ugly, but fortunately not many apps rely on jsonb sort order.

Nobody else has commented, so I propose to apply this patch documenting
the anomaly.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

Attachments:

jsonb-sort-doc.patchtext/x-patch; charset=UTF-8; name=jsonb-sort-doc.patchDownload+1-0
#5jian he
jian.universality@gmail.com
In reply to: Andrew Dunstan (#4)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On Sun, Dec 8, 2024 at 10:58 PM Andrew Dunstan <andrew@dunslane.net> wrote:

So the actual sort order as implemented is, AIUI,

Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array

which is ugly, but fortunately not many apps rely on jsonb sort order.

Nobody else has commented, so I propose to apply this patch documenting the anomaly.

while at it. we can fix the appearance of jsonb null.

since
select jsonb 'Null';
select jsonb 'NULL';
will fail.

so maybe change
<replaceable>Null</replaceable> in <synopsis> section and
<replaceable>NULL</replaceable>
to
<replaceable>null</replaceable>

#6Chengpeng Yan
chengpeng_yan@Outlook.com
In reply to: jian he (#5)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

If many people are already using this ‘wrong’ behavior. I agree to change the doc. I also think using ‘null’ may be a better choice. Thanks for your comments.

From: jian he <jian.universality@gmail.com>
Date: Monday, December 9, 2024 at 16:56
To: Andrew Dunstan <andrew@dunslane.net>
Cc: Yan Chengpeng <chengpeng_yan@outlook.com>, pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
On Sun, Dec 8, 2024 at 10:58 PM Andrew Dunstan <andrew@dunslane.net> wrote:

So the actual sort order as implemented is, AIUI,

Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array

which is ugly, but fortunately not many apps rely on jsonb sort order.

Nobody else has commented, so I propose to apply this patch documenting the anomaly.

while at it. we can fix the appearance of jsonb null.

since
select jsonb 'Null';
select jsonb 'NULL';
will fail.

so maybe change
<replaceable>Null</replaceable> in <synopsis> section and
<replaceable>NULL</replaceable>
to
<replaceable>null</replaceable>

Attachments:

v1-0001-fix-jsonb-compare.patchapplication/octet-stream; name=v1-0001-fix-jsonb-compare.patchDownload+150-5
#7Chengpeng Yan
chengpeng_yan@Outlook.com
In reply to: Chengpeng Yan (#6)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

Sorry, I uploaded the wrong file. I uploaded a new patch with the modified document. Please take a review. Thanks!

From: Yan Chengpeng <chengpeng_yan@outlook.com>
Date: Monday, December 9, 2024 at 21:22
To: jian he <jian.universality@gmail.com>, Andrew Dunstan <andrew@dunslane.net>
Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
If many people are already using this ‘wrong’ behavior. I agree to change the doc. I also think using ‘null’ may be a better choice. Thanks for your comments.

From: jian he <jian.universality@gmail.com>
Date: Monday, December 9, 2024 at 16:56
To: Andrew Dunstan <andrew@dunslane.net>
Cc: Yan Chengpeng <chengpeng_yan@outlook.com>, pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
On Sun, Dec 8, 2024 at 10:58 PM Andrew Dunstan <andrew@dunslane.net> wrote:

So the actual sort order as implemented is, AIUI,

Object > Non-Empty-Array > Boolean > Number > String > Null > Empty-Array

which is ugly, but fortunately not many apps rely on jsonb sort order.

Nobody else has commented, so I propose to apply this patch documenting the anomaly.

while at it. we can fix the appearance of jsonb null.

since
select jsonb 'Null';
select jsonb 'NULL';
will fail.

so maybe change
<replaceable>Null</replaceable> in <synopsis> section and
<replaceable>NULL</replaceable>
to
<replaceable>null</replaceable>

Attachments:

json-doc-update.patchapplication/octet-stream; name=json-doc-update.patchDownload+1-0
#8jian he
jian.universality@gmail.com
In reply to: Chengpeng Yan (#7)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On Mon, Dec 9, 2024 at 9:27 PM Yan Chengpeng <chengpeng_yan@outlook.com> wrote:

Sorry, I uploaded the wrong file. I uploaded a new patch with the modified document. Please take a review. Thanks!

sorry. maybe i didn't mention it explicitly.
i mean something like:

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 54648c459c..d9b24e413e 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -584,12 +584,13 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE
jdoc @@ '$.tags[*] == "qui"';
     The <literal>btree</literal> ordering for <type>jsonb</type>
datums is seldom
     of great interest, but for completeness it is:
 <synopsis>
-<replaceable>Object</replaceable> > <replaceable>Array</replaceable>

<replaceable>Boolean</replaceable> >

<replaceable>Number</replaceable> > <replaceable>String</replaceable>

<replaceable>Null</replaceable>

+<replaceable>Object</replaceable> > <replaceable>Array</replaceable>

<replaceable>Boolean</replaceable> >

<replaceable>Number</replaceable> > <replaceable>String</replaceable>

<replaceable>null</replaceable>

<replaceable>Object with n pairs</replaceable> > <replaceable>object
with n - 1 pairs</replaceable>

<replaceable>Array with n elements</replaceable> > <replaceable>array
with n - 1 elements</replaceable>
</synopsis>
+ with the exception that (for historical reasons) an empty array
sorts less than <replaceable>null</replaceable>.
Objects with equal numbers of pairs are compared in the order:
<synopsis>
<replaceable>key-1</replaceable>, <replaceable>value-1</replaceable>,
<replaceable>key-2</replaceable> ...

Attachments:

v2_json_doc_update.difftext/x-patch; charset=US-ASCII; name=v2_json_doc_update.diffDownload+2-1
#9Andrew Dunstan
andrew@dunslane.net
In reply to: jian he (#8)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On 2024-12-09 Mo 11:16 AM, jian he wrote:

On Mon, Dec 9, 2024 at 9:27 PM Yan Chengpeng <chengpeng_yan@outlook.com> wrote:

Sorry, I uploaded the wrong file. I uploaded a new patch with the modified document. Please take a review. Thanks!

sorry. maybe i didn't mention it explicitly.
i mean something like:

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 54648c459c..d9b24e413e 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -584,12 +584,13 @@ SELECT jdoc->'guid', jdoc->'name' FROM api WHERE
jdoc @@ '$.tags[*] == "qui"';
The <literal>btree</literal> ordering for <type>jsonb</type>
datums is seldom
of great interest, but for completeness it is:
<synopsis>
-<replaceable>Object</replaceable> > <replaceable>Array</replaceable>

<replaceable>Boolean</replaceable> >

<replaceable>Number</replaceable> > <replaceable>String</replaceable>

<replaceable>Null</replaceable>

+<replaceable>Object</replaceable> > <replaceable>Array</replaceable>

<replaceable>Boolean</replaceable> >

<replaceable>Number</replaceable> > <replaceable>String</replaceable>

<replaceable>null</replaceable>

<replaceable>Object with n pairs</replaceable> > <replaceable>object
with n - 1 pairs</replaceable>

<replaceable>Array with n elements</replaceable> > <replaceable>array
with n - 1 elements</replaceable>
</synopsis>
+ with the exception that (for historical reasons) an empty array
sorts less than <replaceable>null</replaceable>.
Objects with equal numbers of pairs are compared in the order:
<synopsis>
<replaceable>key-1</replaceable>, <replaceable>value-1</replaceable>,
<replaceable>key-2</replaceable> ...

Pushed something along these lines. In master I also added a code
comment so nobody might be tempted to "fix" the anomaly.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#9)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

Andrew Dunstan <andrew@dunslane.net> writes:

Pushed something along these lines. In master I also added a code
comment so nobody might be tempted to "fix" the anomaly.

There is still a commitfest entry [1]https://commitfest.postgresql.org/51/5394/ pointing at this thread.
Should it be closed as committed, or is there more to do?

regards, tom lane

[1]: https://commitfest.postgresql.org/51/5394/

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#10)
Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

On 2025-01-18 Sa 1:17 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Pushed something along these lines. In master I also added a code
comment so nobody might be tempted to "fix" the anomaly.

There is still a commitfest entry [1] pointing at this thread.
Should it be closed as committed, or is there more to do?

regards, tom lane

[1] https://commitfest.postgresql.org/51/5394/

Nope, that was it. I have marked the item as committed, thanks for noticing.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com