BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

Started by PG Bug reporting form, over 1 year ago
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18598
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17beta3
Operating system: Ubuntu 22.04
Description:

The following query:
SELECT JSON_OBJECTAGG(i: (i)::text FORMAT JSON WITH UNIQUE)
FROM generate_series(1, 100000) i;

triggers an asan-detected error:
==973230==ERROR: AddressSanitizer: heap-use-after-free on address
0x7fde473f4428 at pc 0x558af80f20a6 bp 0x7ffe6b8e2df0 sp 0x7ffe6b8e2598
READ of size 7 at 0x7fde473f4428 thread T0
#0 0x558af80f20a5 in __interceptor_strncmp.part.0
(.../usr/local/pgsql/bin/postgres+0x32d40a5)
#1 0x558af9ed5276 in json_unique_hash_match
.../src/backend/utils/adt/json.c:922
#2 0x558afa49c6ce in hash_search_with_hash_value
.../src/backend/utils/hash/dynahash.c:1021
#3 0x558afa49bfbc in hash_search
.../src/backend/utils/hash/dynahash.c:960
#4 0x558af9ed58b4 in json_unique_check_key
.../src/backend/utils/adt/json.c:967
#5 0x558af9ed6a71 in json_object_agg_transfn_worker
.../src/backend/utils/adt/json.c:1116
#6 0x558af9ed6fc5 in json_object_agg_unique_transfn
.../src/backend/utils/adt/json.c:1163
#7 0x558af8e3dcbe in ExecAggPlainTransByVal
.../src/backend/executor/execExprInterp.c:5382
...
0x7fde473f4428 is located 506920 bytes inside of 524352-byte region
[0x7fde47378800,0x7fde473f8840)
freed by thread T0 here:
#0 0x558af8114038 in realloc
(.../usr/local/pgsql/bin/postgres+0x32f6038)
#1 0x558afa52c970 in AllocSetRealloc
.../src/backend/utils/mmgr/aset.c:1226
#2 0x558afa56c0e9 in repalloc .../src/backend/utils/mmgr/mcxt.c:1566
#3 0x558afa66c94a in enlargeStringInfo .../src/common/stringinfo.c:349
#4 0x558afa66be4a in appendBinaryStringInfo
.../src/common/stringinfo.c:238
#5 0x558afa66b612 in appendStringInfoString
.../src/common/stringinfo.c:184
#6 0x558af9ed66b9 in json_object_agg_transfn_worker
.../src/backend/utils/adt/json.c:1102
#7 0x558af9ed6fc5 in json_object_agg_unique_transfn
.../src/backend/utils/adt/json.c:1163
#8 0x558af8e3dcbe in ExecAggPlainTransByVal
.../src/backend/executor/execExprInterp.c:5382
...
previously allocated by thread T0 here:
#0 0x558af8114038 in realloc
(.../usr/local/pgsql/bin/postgres+0x32f6038)
#1 0x558afa52c970 in AllocSetRealloc
.../src/backend/utils/mmgr/aset.c:1226
#2 0x558afa56c0e9 in repalloc .../src/backend/utils/mmgr/mcxt.c:1566
#3 0x558afa66c94a in enlargeStringInfo .../src/common/stringinfo.c:349
#4 0x558afa66be4a in appendBinaryStringInfo
.../src/common/stringinfo.c:238
#5 0x558afa66b612 in appendStringInfoString
.../src/common/stringinfo.c:184
#6 0x558af9ed0559 in datum_to_json_internal
.../src/backend/utils/adt/json.c:279
#7 0x558af9ed6ee3 in json_object_agg_transfn_worker
.../src/backend/utils/adt/json.c:1132
#8 0x558af9ed6fc5 in json_object_agg_unique_transfn
.../src/backend/utils/adt/json.c:1163
#9 0x558af8e3dcbe in ExecAggPlainTransByVal
.../src/backend/executor/execExprInterp.c:5382
...

Reproduced starting from 7081ac46a.

#2 Tomas Vondra
tomas@vondra.me
In reply to: PG Bug reporting form (#1)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

On 9/1/24 21:00, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18598
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17beta3
Operating system: Ubuntu 22.04
Description:

The following query:
SELECT JSON_OBJECTAGG(i: (i)::text FORMAT JSON WITH UNIQUE)
FROM generate_series(1, 100000) i;

triggers an asan-detected error:
==973230==ERROR: AddressSanitizer: heap-use-after-free on address
0x7fde473f4428 at pc 0x558af80f20a6 bp 0x7ffe6b8e2df0 sp 0x7ffe6b8e2598
READ of size 7 at 0x7fde473f4428 thread T0
#0 0x558af80f20a5 in __interceptor_strncmp.part.0
(.../usr/local/pgsql/bin/postgres+0x32d40a5)
#1 0x558af9ed5276 in json_unique_hash_match
...

Reproduced starting from 7081ac46a.

FWIW I can reproduce this using valgrind, with the same stacks reported.

This feels very much like a classical memory context bug - pointing to
memory in a short-lived memory context. I see datum_to_json_internal()
allocates the result in ExprContext, and that's bound to be reset pretty
often. But I'm not familiar enough with the JSON aggregate stuff to
pinpoint what it does wrong.
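
As an illustration only (a minimal sketch with made-up names, not the
actual json.c code), the failing pattern is keeping a pointer into a
StringInfo buffer while continuing to append to it; once an append
forces enlargeStringInfo() to repalloc the buffer, the old pointer is
dangling:

    StringInfoData out;
    const char *key;

    initStringInfo(&out);              /* starts with a 1024-byte buffer */
    appendStringInfoString(&out, "\"k1\" : 1");
    key = out.data;                    /* points into out.data */

    /* Appending past the current capacity makes enlargeStringInfo()
     * repalloc() out.data, which may move it; "key" then points into
     * freed memory. */
    for (int i = 0; i < 200; i++)
        appendStringInfoString(&out, ", \"key\" : \"value\"");

    /* any later strcmp(key, ...) is a use-after-free */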

regards

--
Tomas Vondra

#3 Junwang Zhao
zhjwpku@gmail.com
In reply to: Tomas Vondra (#2)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

On Wed, Sep 4, 2024 at 3:21 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/1/24 21:00, PG Bug reporting form wrote:

The following bug has been logged on the website:

Bug reference: 18598
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 17beta3
Operating system: Ubuntu 22.04
Description:

The following query:
SELECT JSON_OBJECTAGG(i: (i)::text FORMAT JSON WITH UNIQUE)
FROM generate_series(1, 100000) i;

triggers an asan-detected error:
==973230==ERROR: AddressSanitizer: heap-use-after-free on address
0x7fde473f4428 at pc 0x558af80f20a6 bp 0x7ffe6b8e2df0 sp 0x7ffe6b8e2598
READ of size 7 at 0x7fde473f4428 thread T0
#0 0x558af80f20a5 in __interceptor_strncmp.part.0
(.../usr/local/pgsql/bin/postgres+0x32d40a5)
#1 0x558af9ed5276 in json_unique_hash_match
...

Reproduced starting from 7081ac46a.

FWIW I can reproduce this using valgrind, with the same stacks reported.

This feels very much like a classical memory context bug - pointing to
memory in a short-lived memory context. I see datum_to_json_internal()
allocates the result in ExprContext, and that's bound to be reset pretty
often. But I'm not familiar enough with the JSON aggregate stuff to
pinpoint what it does wrong.

regards

--
Tomas Vondra

ISTM that JsonUniqueHashEntry.key points to an address that is later
invalidated by enlargeStringInfo(); we can resolve this by explicitly
pstrdup'ing the key in the same MemoryContext as the JsonAggState, like:

@@ -1009,6 +1009,7 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,
Datum arg;
bool skip;
int key_offset;
+ const char *key;

if (!AggCheckCallContext(fcinfo, &aggcontext))
{
@@ -1111,7 +1112,9 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,

        if (unique_keys)
        {
-               const char *key = &out->data[key_offset];
+               oldcontext = MemoryContextSwitchTo(aggcontext);
+               key = pstrdup(&out->data[key_offset]);
+               MemoryContextSwitchTo(oldcontext);

--
Regards
Junwang Zhao

#4 Tomas Vondra
tomas@vondra.me
In reply to: Junwang Zhao (#3)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

On 9/4/24 11:55, Junwang Zhao wrote:

...

ISTM that JsonUniqueHashEntry.key points to an address that is later
invalidated by enlargeStringInfo(); we can resolve this by explicitly
pstrdup'ing the key in the same MemoryContext as the JsonAggState, like:

Yes, this fixes the issue (at least per valgrind).

@@ -1009,6 +1009,7 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,
Datum arg;
bool skip;
int key_offset;
+ const char *key;

if (!AggCheckCallContext(fcinfo, &aggcontext))
{
@@ -1111,7 +1112,9 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,

if (unique_keys)
{
-               const char *key = &out->data[key_offset];
+               oldcontext = MemoryContextSwitchTo(aggcontext);
+               key = pstrdup(&out->data[key_offset]);
+               MemoryContextSwitchTo(oldcontext);

I think you don't need the new key declaration (there's already a local
one), and you can simply do just

const char *key = MemoryContextStrdup(aggcontext,
&out->data[key_offset]);
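
(For completeness, a sketch of the equivalence, not part of the patch:
both forms copy the string into aggcontext, the first by temporarily
switching CurrentMemoryContext, the second in a single call.)

    /* switch, copy, switch back ... */
    oldcontext = MemoryContextSwitchTo(aggcontext);
    key = pstrdup(&out->data[key_offset]);
    MemoryContextSwitchTo(oldcontext);

    /* ... or equivalently */
    key = MemoryContextStrdup(aggcontext, &out->data[key_offset]);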

I wonder if the other json_unique_check_key() call might have a similar
issue. I've not succeeded in constructing a broken query, but perhaps
you could give it a try too?

Thanks!

--
Tomas Vondra

#5 Junwang Zhao
zhjwpku@gmail.com
In reply to: Tomas Vondra (#4)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

On Wed, Sep 4, 2024 at 7:54 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/4/24 11:55, Junwang Zhao wrote:

...

ISTM that JsonUniqueHashEntry.key points to an address that is later
invalidated by enlargeStringInfo(); we can resolve this by explicitly
pstrdup'ing the key in the same MemoryContext as the JsonAggState, like:

Yes, this fixes the issue (at least per valgrind).

@@ -1009,6 +1009,7 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,
Datum arg;
bool skip;
int key_offset;
+ const char *key;

if (!AggCheckCallContext(fcinfo, &aggcontext))
{
@@ -1111,7 +1112,9 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,

if (unique_keys)
{
-               const char *key = &out->data[key_offset];
+               oldcontext = MemoryContextSwitchTo(aggcontext);
+               key = pstrdup(&out->data[key_offset]);
+               MemoryContextSwitchTo(oldcontext);

I think you don't need the new key declaration (there's already a local
one), and you can simply do just

const char *key = MemoryContextStrdup(aggcontext,
&out->data[key_offset]);

Sure, I will file a patch later.

I wonder if the other json_unique_check_key() call might have a similar
issue. I've not succeeded in constructing a broken query, but perhaps
you could give it a try too?

Sure, I will give it a try, thanks for the comment.

Thanks!

--
Tomas Vondra

--
Regards
Junwang Zhao

#6 Junwang Zhao
zhjwpku@gmail.com
In reply to: Tomas Vondra (#4)
1 attachment(s)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

CC'd hackers list.

On Wed, Sep 4, 2024 at 7:54 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/4/24 11:55, Junwang Zhao wrote:

...

ISTM that JsonUniqueHashEntry.key points to an address that is later
invalidated by enlargeStringInfo(); we can resolve this by explicitly
pstrdup'ing the key in the same MemoryContext as the JsonAggState, like:

Yes, this fixes the issue (at least per valgrind).

@@ -1009,6 +1009,7 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,
Datum arg;
bool skip;
int key_offset;
+ const char *key;

if (!AggCheckCallContext(fcinfo, &aggcontext))
{
@@ -1111,7 +1112,9 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,

if (unique_keys)
{
-               const char *key = &out->data[key_offset];
+               oldcontext = MemoryContextSwitchTo(aggcontext);
+               key = pstrdup(&out->data[key_offset]);
+               MemoryContextSwitchTo(oldcontext);

I think you don't need the new key declaration (there's already a local
one), and you can simply do just

const char *key = MemoryContextStrdup(aggcontext,
&out->data[key_offset]);

I wonder if the other json_unique_check_key() call might have a similar
issue. I've not succeeded in constructing a broken query, but perhaps
you could give it a try too?

I found two other places that call json_unique_check_key.

One is *json_build_object_worker*, and the usage is the same as in
*json_object_agg_transfn_worker*; I fixed it the same way, PSA.

The following SQL should trigger the problem. I haven't tried ASan,
only traced the *repalloc* in gdb; I will try ASan later once I have
set up my ASan build.

SELECT JSON_OBJECT(1: 1, '2': NULL, '3': 1, repeat('x', 1000): 1, 2:
repeat('a', 100) WITH UNIQUE);

The other place is json_unique_object_field_start; it is set as a
callback of JsonSemAction, and in the parser's state machine the
fname used by the callback is handled correctly; see [1] & [2].

[1]: https://github.com/postgres/postgres/blob/master/src/common/jsonapi.c#L1069-L1087
[2]: https://github.com/postgres/postgres/blob/master/src/common/jsonapi.c#L785-L823

Thanks!

--
Tomas Vondra

--
Regards
Junwang Zhao

Attachments:

v1-0001-fix-use-after-free-bug.patch (application/octet-stream)
From cd0f04ed0e2d9b5d689c6843752c2b2a0c2fd437 Mon Sep 17 00:00:00 2001
From: Junwang Zhao <zhjwpku@gmail.com>
Date: Thu, 5 Sep 2024 03:32:05 +0000
Subject: [PATCH v1] fix use after free bug

json_unique_check_key stores a key pointer that can be
invalidated by enlargeStringInfo; use MemoryContextStrdup
to resolve this problem.

Signed-off-by: Junwang Zhao <zhjwpku@gmail.com>
---
 src/backend/utils/adt/json.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 4eeeeaf0a6..10fe5b9950 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1111,7 +1111,7 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,
 
 	if (unique_keys)
 	{
-		const char *key = &out->data[key_offset];
+		const char *key = MemoryContextStrdup(aggcontext, &out->data[key_offset]);
 
 		if (!json_unique_check_key(&state->unique_check.check, key, 0))
 			ereport(ERROR,
@@ -1275,7 +1275,8 @@ json_build_object_worker(int nargs, const Datum *args, const bool *nulls, const
 		if (unique_keys)
 		{
 			/* check key uniqueness after key appending */
-			const char *key = &out->data[key_offset];
+			const char *key = MemoryContextStrdup(unique_check.mcxt,
+												  &out->data[key_offset]);
 
 			if (!json_unique_check_key(&unique_check.check, key, 0))
 				ereport(ERROR,
-- 
2.39.2

#8 Tomas Vondra
tomas@vondra.me
In reply to: Junwang Zhao (#6)
1 attachment(s)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

On 9/5/24 06:06, Junwang Zhao wrote:

...

I found two other places that call json_unique_check_key.

One is *json_build_object_worker*, and the usage is the same as in
*json_object_agg_transfn_worker*; I fixed it the same way, PSA.

The following SQL should trigger the problem. I haven't tried ASan,
only traced the *repalloc* in gdb; I will try ASan later once I have
set up my ASan build.

SELECT JSON_OBJECT(1: 1, '2': NULL, '3': 1, repeat('x', 1000): 1, 2:
repeat('a', 100) WITH UNIQUE);

The other place is json_unique_object_field_start; it is set as a
callback of JsonSemAction, and in the parser's state machine the
fname used by the callback is handled correctly; see [1] & [2].

[1]: https://github.com/postgres/postgres/blob/master/src/common/jsonapi.c#L1069-L1087
[2]: https://github.com/postgres/postgres/blob/master/src/common/jsonapi.c#L785-L823

Thanks for the fix and the idea of a query triggering the issue in the
other place. I verified that it indeed triggers a valgrind report, and
the fix makes it go away.

Attached is a patch with proper commit message, explaining the issue,
and various other details. Can you please proofread it and check that I
got all the details right?

The patch also adds two regression tests, triggering the issue (without
the rest of the patch applied). It took me a while to realize the
existing tests pass simply because the objects are tiny and don't
require enlarging the buffer, and thus the repalloc.

The only question that bothers me a little bit is the possibility of a
memory leak - could it happen that we keep the copied key much longer
than needed? Or does aggcontext have the right life span? AFAICS
that's where we allocate the aggregate state, so it seems fine.

Also, how far back do we need to backpatch this? ISTM PG15 does not have
this issue, and it was introduced with the SQL/JSON stuff in PG16. Is
that correct?

regards

--
Tomas Vondra

Attachments:

0001-Fix-unique-key-checks-in-JSON-object-constructors.patch (text/x-patch; charset=UTF-8)
From 45837d645a276912b400b5d99d80ac290e0ba8ad Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Tue, 10 Sep 2024 17:15:10 +0200
Subject: [PATCH] Fix unique key checks in JSON object constructors

When building a JSON object, the code builds a hash table of keys, to
allow checking if the keys are unique. The uniqueness check and adding
the new key happens in json_unique_check_key(), but this assumes the
pointer to the key remains valid.

Unfortunately, two places passed pointers to keys in a StringInfo
buffer, while also appending additional data (more key/value pairs) to
the buffer. With enough data the buffer would be resized using
enlargeStringInfo(), which does repalloc, invalidating the earlier key
pointers.

Due to this the uniqueness check may fail with both false negatives and
false positives, producing JSON objects with duplicate keys or failing
to produce a perfectly valid JSON object.

This affects four functions, introduced in PG16 with the new SQL/JSON:

- json_object_agg_unique / jsonb_object_agg_unique
- json_object / jsonb_objectagg

The existing regression tests did not detect this issue, simply because
the initial buffer size is 1024 and the objects were small enough not to
require the repalloc.

Reported by Alexander Lakhin, based on an AddressSanitizer failure.
Valgrind notices this too, but both require a JSON object large enough
to trigger the repalloc.

Fixed by properly copying the key into the aggregate memory context, and
adding regression tests with enough data to repalloc the buffer.

Investigation and initial fix by Junwang Zhao, with various improvements
and tests by me.

Backpatch to 16, where these functions were introduced.

Reported-by: Alexander Lakhin
Author: Junwang Zhao, Tomas Vondra
Discussion: https://postgr.es/m/18598-3279ed972a2347c7@postgresql.org
Discussion: https://postgr.es/m/CAEG8a3JjH0ReJF2_O7-8LuEbO69BxPhYeXs95_x7+H9AMWF1gw@mail.gmail.com
---
 src/backend/utils/adt/json.c          | 6 ++++--
 src/test/regress/expected/json.out    | 3 +++
 src/test/regress/expected/sqljson.out | 5 +++++
 src/test/regress/sql/json.sql         | 2 ++
 src/test/regress/sql/sqljson.sql      | 5 +++++
 5 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 4eeeeaf0a60..3819afdc433 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1111,7 +1111,8 @@ json_object_agg_transfn_worker(FunctionCallInfo fcinfo,
 
 	if (unique_keys)
 	{
-		const char *key = &out->data[key_offset];
+		const char *key = MemoryContextStrdup(aggcontext,
+											  &out->data[key_offset]);
 
 		if (!json_unique_check_key(&state->unique_check.check, key, 0))
 			ereport(ERROR,
@@ -1275,7 +1276,8 @@ json_build_object_worker(int nargs, const Datum *args, const bool *nulls, const
 		if (unique_keys)
 		{
 			/* check key uniqueness after key appending */
-			const char *key = &out->data[key_offset];
+			const char *key = MemoryContextStrdup(unique_check.mcxt,
+												  &out->data[key_offset]);
 
 			if (!json_unique_check_key(&unique_check.check, key, 0))
 				ereport(ERROR,
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index 7df11c2f385..96c40911cb9 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -2330,6 +2330,9 @@ select json_object('{a,b,"","d e f"}','{1,2,3,"a b c"}');
  {"a" : "1", "b" : "2", "" : "3", "d e f" : "a b c"}
 (1 row)
 
+-- json_object_agg_unique requires unique keys
+select json_object_agg_unique(mod(i,100), i) from generate_series(0, 199) i;
+ERROR:  duplicate JSON object key value: "0"
 -- json_to_record and json_to_recordset
 select * from json_to_record('{"a":1,"b":"foo","c":"bar"}')
     as x(a int, b text, d text);
diff --git a/src/test/regress/expected/sqljson.out b/src/test/regress/expected/sqljson.out
index 4f91e2117ef..9adfbe15f65 100644
--- a/src/test/regress/expected/sqljson.out
+++ b/src/test/regress/expected/sqljson.out
@@ -537,6 +537,8 @@ SELECT JSON_OBJECT('a': '1', 'b': NULL, 'c': 2 ABSENT ON NULL);
  {"a" : "1", "c" : 2}
 (1 row)
 
+SELECT JSON_OBJECT(1: 1, '2': NULL, '3': 1, repeat('x', 1000): 1, 2: repeat('a', 100) WITH UNIQUE);
+ERROR:  duplicate JSON object key value: "2"
 SELECT JSON_OBJECT(1: 1, '1': NULL WITH UNIQUE);
 ERROR:  duplicate JSON object key value: "1"
 SELECT JSON_OBJECT(1: 1, '1': NULL ABSENT ON NULL WITH UNIQUE);
@@ -921,6 +923,9 @@ FROM (VALUES (1, 1), (0, NULL),(4, null), (5, null),(6, null),(2, 2)) foo(k, v);
  {"1": 1, "2": 2}
 (1 row)
 
+SELECT JSON_OBJECTAGG(mod(i,100): (i)::text FORMAT JSON WITH UNIQUE)
+FROM generate_series(0, 199) i;
+ERROR:  duplicate JSON object key value: "0"
 -- Test JSON_OBJECT deparsing
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT JSON_OBJECT('foo' : '1' FORMAT JSON, 'bar' : 'baz' RETURNING json);
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index 5c886cd6b33..8251f4f4006 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -755,6 +755,8 @@ select json_object('{a,b,NULL,"d e f"}','{1,2,3,"a b c"}');
 
 select json_object('{a,b,"","d e f"}','{1,2,3,"a b c"}');
 
+-- json_object_agg_unique requires unique keys
+select json_object_agg_unique(mod(i,100), i) from generate_series(0, 199) i;
 
 -- json_to_record and json_to_recordset
 
diff --git a/src/test/regress/sql/sqljson.sql b/src/test/regress/sql/sqljson.sql
index bb2487e8649..42dbec26d6f 100644
--- a/src/test/regress/sql/sqljson.sql
+++ b/src/test/regress/sql/sqljson.sql
@@ -138,6 +138,8 @@ SELECT JSON_OBJECT('a': '1', 'b': NULL, 'c': 2);
 SELECT JSON_OBJECT('a': '1', 'b': NULL, 'c': 2 NULL ON NULL);
 SELECT JSON_OBJECT('a': '1', 'b': NULL, 'c': 2 ABSENT ON NULL);
 
+SELECT JSON_OBJECT(1: 1, '2': NULL, '3': 1, repeat('x', 1000): 1, 2: repeat('a', 100) WITH UNIQUE);
+
 SELECT JSON_OBJECT(1: 1, '1': NULL WITH UNIQUE);
 SELECT JSON_OBJECT(1: 1, '1': NULL ABSENT ON NULL WITH UNIQUE);
 SELECT JSON_OBJECT(1: 1, '1': NULL NULL ON NULL WITH UNIQUE RETURNING jsonb);
@@ -283,6 +285,9 @@ FROM (VALUES (1, 1), (1, NULL), (2, 2)) foo(k, v);
 SELECT JSON_OBJECTAGG(k: v ABSENT ON NULL WITH UNIQUE KEYS RETURNING jsonb)
 FROM (VALUES (1, 1), (0, NULL),(4, null), (5, null),(6, null),(2, 2)) foo(k, v);
 
+SELECT JSON_OBJECTAGG(mod(i,100): (i)::text FORMAT JSON WITH UNIQUE)
+FROM generate_series(0, 199) i;
+
 -- Test JSON_OBJECT deparsing
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT JSON_OBJECT('foo' : '1' FORMAT JSON, 'bar' : 'baz' RETURNING json);
-- 
2.46.0

#10 Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#8)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

On 9/10/24 21:47, Tomas Vondra wrote:

...

The only question that bothers me a little bit is the possibility of a
memory leak - could it happen that we keep the copied key much longer
than needed? Or does aggcontext have the right life span? AFAICS
that's where we allocate the aggregate state, so it seems fine.

Also, how far back do we need to backpatch this? ISTM PG15 does not have
this issue, and it was introduced with the SQL/JSON stuff in PG16. Is
that correct?

Nah, I spent a bit of time looking for a memory leak, but I don't think
there's one, or at least not a new one. We use the same memory context
as for the hash table / buffer, so that should be fine.

But this made me realize the code in json_build_object_worker() can
simply use pstrdup() to copy the key into CurrentMemoryContext, which is
where the hash table of unique keys is. In fact, using unique_check.mcxt
would not be quite right:

MemoryContext mcxt; /* context for saving skipped keys */

And this has nothing to do with skipped keys.
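
A sketch of what that adjustment might look like in
json_build_object_worker() (illustrative, not a verbatim quote of the
pushed commit):

    if (unique_keys)
    {
        /*
         * Copy the key into CurrentMemoryContext, where the hash table
         * of unique keys lives, so the pointer stays valid across later
         * repallocs of the output buffer.
         */
        const char *key = pstrdup(&out->data[key_offset]);

        if (!json_unique_check_key(&unique_check.check, key, 0))
            ereport(ERROR, ...);    /* duplicate JSON object key value */
    }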

So I adjusted that way and pushed.

Thanks for the report / patch.

--
Tomas Vondra

#11 Junwang Zhao
zhjwpku@gmail.com
In reply to: Tomas Vondra (#10)
Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()

Hi Tomas,

On Wed, Sep 11, 2024 at 8:08 PM Tomas Vondra <tomas@vondra.me> wrote:

On 9/10/24 21:47, Tomas Vondra wrote:

...

The only question that bothers me a little bit is the possibility of a
memory leak - could it happen that we keep the copied key much longer
than needed? Or does aggcontext have the right life span? AFAICS
that's where we allocate the aggregate state, so it seems fine.

Also, how far back do we need to backpatch this? ISTM PG15 does not have
this issue, and it was introduced with the SQL/JSON stuff in PG16. Is
that correct?

Nah, I spent a bit of time looking for a memory leak, but I don't think
there's one, or at least not a new one. We use the same memory context
as for the hash table / buffer, so that should be fine.

But this made me realize the code in json_build_object_worker() can
simply use pstrdup() to copy the key into CurrentMemoryContext, which is
where the hash table of unique keys is. In fact, using unique_check.mcxt
would not be quite right:

MemoryContext mcxt; /* context for saving skipped keys */

And this has nothing to do with skipped keys.

So I adjusted that way and pushed.

I didn't get the time to reply to you quickly, sorry about that.
Thank you for improving the patch, and I appreciate the time you
spent working on this.

Thanks for the report / patch.

--
Tomas Vondra

--
Regards
Junwang Zhao