Making jsonb_agg() faster

Started by Tom Lane9 months ago27 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

There have been some complaints recently about how jsonb_agg()
is a lot slower than json_agg() [1]/messages/by-id/CAHyXU0xQGXFBZ10GtqTkXL3_b8FbB79qP+XS2XCfxp+6WuH1Cg@mail.gmail.com. That's annoying considering
that the whole selling point of JSONB is to have faster processing
than the original JSON type, so I poked into that. What I found
is that jsonb_agg() and its variants are just really inefficiently
implemented. Basically, for each aggregate input value, they will:

1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.

The motivation for this seems to have been to make sure that any
memory leakage during phase 1 does not happen in the long-lived
aggcontext. But it's hard not to call it a Rube Goldberg contraption.

The attached patch series gets rid of phases 2 and 3 by refactoring
pushJsonbValue() and related functions so that the JsonbValue tree
they construct can be constructed in a context that's not
CurrentMemoryContext. With that and some run-of-the-mill optimization
work, I'm getting 2.5X speedup for jsonb_agg on a text column (as
measured by the attached test script) and a bit over 2X on an int8
column. It's still a little slower than json_agg, but no longer
slower by integer multiples.

0001 is a somewhat invasive refactoring of the API for
pushJsonbValue and friends. It doesn't in itself have any
measurable speed consequences as far as I can tell, but I think
it makes the code nicer in any case. (I really do not like the
existing coding setup where sometimes it's important to capture
the result of pushJsonbValue and sometimes it's not; that
seems awfully confusing and bug-prone.) The real point though
is to have a way of commanding pushJsonbValue to build the
JsonbValue tree somewhere other than CurrentMemoryContext.

Having laid the groundwork with 0001, 0002 simply amounts to
telling pushJsonbValue to put its handiwork in the aggcontext
and then ripping out phases 2 and 3 of the aggregate transfns.

0003 is some simple micro-optimization of the datatype conversion
code in datum_to_jsonb_internal.

Thoughts?

regards, tom lane

[1]: /messages/by-id/CAHyXU0xQGXFBZ10GtqTkXL3_b8FbB79qP+XS2XCfxp+6WuH1Cg@mail.gmail.com

Attachments:

v1-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.p; name*1=atchDownload+515-412
v1-0002-Remove-fundamentally-redundant-processing-in-json.patchtext/x-diff; charset=us-ascii; name*0=v1-0002-Remove-fundamentally-redundant-processing-in-json.p; name*1=atchDownload+45-259
v1-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patchtext/x-diff; charset=us-ascii; name*0=v1-0003-Micro-optimize-datatype-conversions-in-datum_to_j.p; name*1=atchDownload+81-35
jsonagg_speedtest.sqltext/plain; charset=us-ascii; name=jsonagg_speedtest.sqlDownload
#2Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#1)
Re: Making jsonb_agg() faster

On Tue, Jul 22, 2025 at 10:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

There have been some complaints recently about how jsonb_agg()
is a lot slower than json_agg() [1]. That's annoying considering
that the whole selling point of JSONB is to have faster processing
than the original JSON type, so I poked into that. What I found
is that jsonb_agg() and its variants are just really inefficiently
implemented. Basically, for each aggregate input value, they will:

1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.

The motivation for this seems to have been to make sure that any
memory leakage during phase 1 does not happen in the long-lived
aggcontext. But it's hard not to call it a Rube Goldberg contraption.

The attached patch series gets rid of phases 2 and 3 by refactoring
pushJsonbValue() and related functions so that the JsonbValue tree
they construct can be constructed in a context that's not
CurrentMemoryContext. With that and some run-of-the-mill optimization
work, I'm getting 2.5X speedup for jsonb_agg on a text column (as
measured by the attached test script) and a bit over 2X on an int8
column. It's still a little slower than json_agg, but no longer
slower by integer multiples.

0001 is a somewhat invasive refactoring of the API for
pushJsonbValue and friends. It doesn't in itself have any
measurable speed consequences as far as I can tell, but I think
it makes the code nicer in any case. (I really do not like the
existing coding setup where sometimes it's important to capture
the result of pushJsonbValue and sometimes it's not; that
seems awfully confusing and bug-prone.) The real point though
is to have a way of commanding pushJsonbValue to build the
JsonbValue tree somewhere other than CurrentMemoryContext.

Having laid the groundwork with 0001, 0002 simply amounts to
telling pushJsonbValue to put its handiwork in the aggcontext
and then ripping out phases 2 and 3 of the aggregate transfns.

0003 is some simple micro-optimization of the datatype conversion
code in datum_to_jsonb_internal.

Thoughts?

Really excited about this -- I'll do some testing. Performance of
serialization (generating json output from non json data) is the main
reason I still recommend json...jsonb is better in most other cases, but
serialization performance is extremely important.

The other reason json type is needed is being able to serialize WYSIWYG:

superuser@postgres=# select to_json(q), to_jsonb(q) from (select 1 as b, 2
as a, 3 as b) q;
to_json | to_jsonb
---------------------+------------------
{"b":1,"a":2,"b":3} | {"a": 2, "b": 3}

jsonb output can be obnoxious for aesthetic reasons here as well as
uncommon technical ones where key order is important to the receiving
processor. Point being, if there was some way to do that in the jsonb
interface with good performance, the need for json type (or at least an
independent implementation) would completely evaporate.

The only hypothetical use case for legacy json is precise storage, but that
can be worked around with TEXT.

merlin

#3Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#2)
Re: Making jsonb_agg() faster

On Tue, Jul 22, 2025 at 11:43 AM Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, Jul 22, 2025 at 10:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thoughts?

Really excited about this -- I'll do some testing. Performance of
serialization (generating json output from non json data) is the main
reason I still recommend json...jsonb is better in most other cases, but
serialization performance is extremely important.

Performance improvement is pretty dramatic, down from 300%+ to ~ 40% --
it's pretty reliable 40% on all output bound cases FWICT.

merlin

Show quoted text
#4jian he
jian.universality@gmail.com
In reply to: Tom Lane (#1)
Re: Making jsonb_agg() faster

On Wed, Jul 23, 2025 at 12:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

There have been some complaints recently about how jsonb_agg()
is a lot slower than json_agg() [1]. That's annoying considering
that the whole selling point of JSONB is to have faster processing
than the original JSON type, so I poked into that. What I found
is that jsonb_agg() and its variants are just really inefficiently
implemented. Basically, for each aggregate input value, they will:

1. Build a JsonbValue tree representation of the input value.
2. Flatten the JsonbValue tree into a Jsonb in on-disk format.
3. Iterate through the Jsonb, building a JsonbValue that is part
of the aggregate's state stored in aggcontext, but is otherwise
identical to what phase 1 built.

The motivation for this seems to have been to make sure that any
memory leakage during phase 1 does not happen in the long-lived
aggcontext. But it's hard not to call it a Rube Goldberg contraption.

The attached patch series gets rid of phases 2 and 3 by refactoring
pushJsonbValue() and related functions so that the JsonbValue tree
they construct can be constructed in a context that's not
CurrentMemoryContext. With that and some run-of-the-mill optimization
work, I'm getting 2.5X speedup for jsonb_agg on a text column (as
measured by the attached test script) and a bit over 2X on an int8
column. It's still a little slower than json_agg, but no longer
slower by integer multiples.

0001 is a somewhat invasive refactoring of the API for
pushJsonbValue and friends. It doesn't in itself have any
measurable speed consequences as far as I can tell, but I think
it makes the code nicer in any case. (I really do not like the
existing coding setup where sometimes it's important to capture
the result of pushJsonbValue and sometimes it's not; that
seems awfully confusing and bug-prone.) The real point though
is to have a way of commanding pushJsonbValue to build the
JsonbValue tree somewhere other than CurrentMemoryContext.

in v1-0001
static void
copyScalarSubstructure(JsonbValue *v, MemoryContext outcontext)
{
}
outcontext will always be NULL in 0001.
that means copyScalarSubstructure is never really called.
it also means that JsonbInState->MemoryContext was always as NULL in 00001
maybe we don't need JsonbInState->MemoryContext in 0001
So 0001 refactoring change would be less invasive.

minor issue: copyScalarSubstructure (v->type) order aligned with (enum
jbvType) would be great.

#5jian he
jian.universality@gmail.com
In reply to: jian he (#4)
Re: Making jsonb_agg() faster

On Thu, Aug 14, 2025 at 4:59 PM jian he <jian.universality@gmail.com> wrote:

0001 is a somewhat invasive refactoring of the API for
pushJsonbValue and friends.

in pushJsonbValue:

/*
* pushJsonbValueScalar handles all cases not involving pushing a
* container object as an ELEM or VALUE.
*/
if (!jbval || (seq != WJB_ELEM && seq != WJB_VALUE))
{
pushJsonbValueScalar(pstate, seq, jbval);
return;
}
/* If it's not a jbvBinary value, again it goes to pushJsonbValueScalar */
if (jbval->type != jbvBinary)
{
pushJsonbValueScalar(pstate, seq, jbval);
return;
}

we can combine above these two into
if (!jbval || (seq != WJB_ELEM && seq != WJB_VALUE) ||
IsAJsonbScalar(jbval))
{
pushJsonbValueScalar(pstate, seq, jbval);
return;
}
and put it in the earlier position of pushJsonbValue.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#4)
Re: Making jsonb_agg() faster

jian he <jian.universality@gmail.com> writes:

outcontext will always be NULL in 0001.
that means copyScalarSubstructure is never really called.
it also means that JsonbInState->MemoryContext was always as NULL in 00001
maybe we don't need JsonbInState->MemoryContext in 0001
So 0001 refactoring change would be less invasive.

I think you're missing the point: per the commit message for 0001,

The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext. That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.

If we didn't add the outcontext option in this step, we'd just
have to do so in the next one.

regards, tom lane

#7jian he
jian.universality@gmail.com
In reply to: Tom Lane (#6)
Re: Making jsonb_agg() faster

On Sat, Aug 16, 2025 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think you're missing the point: per the commit message for 0001,

The real reason for doing it is to provide a mechanism whereby
pushJsonbValue() can be told to construct the JsonbValue tree
in a context that is not CurrentMemoryContext. That happens
when a non-null "outcontext" is specified in the JsonbInState.
No callers exercise that option in this patch, but the next
patch in the series will make use of it.

If we didn't add the outcontext option in this step, we'd just
have to do so in the next one.

ok, I didn't check the message. now I get it.

0003 looks very good to me.
I did some minor refactoring solely based on v1-0003, see attachment.
we can refactor datum_to_jsonb_internal,
so that ``switch (tcategory)`` order is more aligned with the order of
enum JsonTypeCategory.
maybe it's not worth the trouble.

about 0002:
jsonb_agg_finalfn
/*
* The final function can be called more than once, so we must not change
* the stored JsonbValue data structure. Fortunately, the WJB_END_ARRAY
* action will only change fields in the JsonbInState struct itself, so we
* can simply invoke pushJsonbValue on a local copy of that.
*/
I don't understand the above comments.
If I add another ``pushJsonbValue(&result, WJB_END_ARRAY, NULL);``
then it will cause segmentation fault, that means we can not call
WJB_END_ARRAY action twice.

in finalize_aggregate:
there is no foreach loop within
```if (OidIsValid(peragg->finalfn_oid))```
Overall, I can not come up with a case where the final function is
called more than once.

Attachments:

v1-0001-minor-change-in-datum_to_jsonb_internal.no-cfbotapplication/octet-stream; name=v1-0001-minor-change-in-datum_to_jsonb_internal.no-cfbotDownload+8-4
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: jian he (#7)
Re: Making jsonb_agg() faster

jian he <jian.universality@gmail.com> writes:

about 0002:
jsonb_agg_finalfn
/*
* The final function can be called more than once, so we must not change
* the stored JsonbValue data structure. Fortunately, the WJB_END_ARRAY
* action will only change fields in the JsonbInState struct itself, so we
* can simply invoke pushJsonbValue on a local copy of that.
*/
I don't understand the above comments.
If I add another ``pushJsonbValue(&result, WJB_END_ARRAY, NULL);``
then it will cause segmentation fault, that means we can not call
WJB_END_ARRAY action twice.

No, but so what? We're not doing that twice here.

in finalize_aggregate:
there is no foreach loop within
```if (OidIsValid(peragg->finalfn_oid))```
Overall, I can not come up with a case where the final function is
called more than once.

If memory serves, the issue arises when the aggregate is used as a
window function. If the frame definition allows, the operation will
proceed like "accumulate a row into the transition state, call the
finalfn to get the aggregate's result at this row, accumulate another
row into the transition state, call the finalfn to get the aggregate's
result at that row, accumulate another row, yadda yadda". So in this
example, the point is not about calling the finalfn more than once per
row, but that it mustn't damage the transition state in case we choose
to accumulate more data.

There are also cases where multiple finalfns can be called on the
same transition state, although that requires aggregates that share a
transfn, and I don't think jsonb_agg shares its transfn with anything
else.

See the discussion of FINALFUNC_MODIFY in the CREATE AGGREGATE man
page for more details.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: Making jsonb_agg() faster

I realized that my patch-of-record in this thread is out of date
in the wake of 2a600a93c, because it had some code catering to the
possibility that timestamp[tz] is pass-by-reference. While it
still compiled and worked, we don't want to commit it like that.

v2-0001 takes care of that, and also adopts your suggestion in [1]/messages/by-id/CACJufxGFzbRDnyG-=WZ3RgkVUwOZexUobGDrA3n5bj8MxrAZvQ@mail.gmail.com
about not using two calls of pushJsonbValueScalar where one would do.
I also did a bit more micro-optimization in appendKey, appendValue,
appendElement to avoid redundant copying, because perf testing showed
that appendElement is still a hot-spot for jsonb_agg. Patches 0002
and 0003 are unchanged.

I did not adopt your changes in [2]/messages/by-id/CACJufxGHLYtvOAB7uA59C9Lz+rDPta5Egiii-B5opsZbw6eDDQ@mail.gmail.com, because they didn't really
seem like improvements to me.

regards, tom lane

[1]: /messages/by-id/CACJufxGFzbRDnyG-=WZ3RgkVUwOZexUobGDrA3n5bj8MxrAZvQ@mail.gmail.com
[2]: /messages/by-id/CACJufxGHLYtvOAB7uA59C9Lz+rDPta5Egiii-B5opsZbw6eDDQ@mail.gmail.com

Attachments:

v2-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.p; name*1=atchDownload+518-426
v2-0002-Remove-fundamentally-redundant-processing-in-json.patchtext/x-diff; charset=us-ascii; name*0=v2-0002-Remove-fundamentally-redundant-processing-in-json.p; name*1=atchDownload+45-259
v2-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patchtext/x-diff; charset=us-ascii; name*0=v2-0003-Micro-optimize-datatype-conversions-in-datum_to_j.p; name*1=atchDownload+81-35
#10Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#9)
Re: Making jsonb_agg() faster

On Aug 23, 2025, at 03:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

v2-0001 takes care of that, and also adopts your suggestion in [1]
about not using two calls of pushJsonbValueScalar where one would do.
I also did a bit more micro-optimization in appendKey, appendValue,
appendElement to avoid redundant copying, because perf testing showed
that appendElement is still a hot-spot for jsonb_agg. Patches 0002
and 0003 are unchanged.

A small comment. In most of places, JsonbInState is initialized as “JsonbInState ps = {0};”, which is the preferred C style. There are some other places still using memset() to zero out JsonbInState variables. Maybe we can change all memset() usages to “{0}”.

I did a search and found out all memset to change:

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 0c964052958..9ca501a7a6e 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -240,10 +240,10 @@ static inline Datum
 jsonb_from_cstring(char *json, int len, bool unique_keys, Node *escontext)
 {
        JsonLexContext lex;
-       JsonbInState state;
+       JsonbInState state = {0};
        JsonSemAction sem;
-       memset(&state, 0, sizeof(state));
+       //memset(&state, 0, sizeof(state));
        memset(&sem, 0, sizeof(sem));
        makeJsonLexContextCstringLen(&lex, json, len, GetDatabaseEncoding(), true);
@@ -1147,9 +1147,9 @@ to_jsonb(PG_FUNCTION_ARGS)
 Datum
 datum_to_jsonb(Datum val, JsonTypeCategory tcategory, Oid outfuncoid)
 {
-       JsonbInState result;
+       JsonbInState result = {0};
-       memset(&result, 0, sizeof(JsonbInState));
+       //memset(&result, 0, sizeof(JsonbInState));

datum_to_jsonb_internal(val, false, &result, tcategory, outfuncoid,
false);
@@ -1162,7 +1162,7 @@ jsonb_build_object_worker(int nargs, const Datum *args, const bool *nulls, const
bool absent_on_null, bool unique_keys)
{
int i;
- JsonbInState result;
+ JsonbInState result = {0};

if (nargs % 2 != 0)
ereport(ERROR,
@@ -1172,7 +1172,7 @@ jsonb_build_object_worker(int nargs, const Datum *args, const bool *nulls, const
errhint("The arguments of %s must consist of alternating keys and values.",
"jsonb_build_object()")));

-       memset(&result, 0, sizeof(JsonbInState));
+       //memset(&result, 0, sizeof(JsonbInState));

pushJsonbValue(&result, WJB_BEGIN_OBJECT, NULL);
result.parseState->unique_keys = unique_keys;
@@ -1232,9 +1232,9 @@ jsonb_build_object(PG_FUNCTION_ARGS)
Datum
jsonb_build_object_noargs(PG_FUNCTION_ARGS)
{
- JsonbInState result;
+ JsonbInState result = {0};

-       memset(&result, 0, sizeof(JsonbInState));
+       //memset(&result, 0, sizeof(JsonbInState));

pushJsonbValue(&result, WJB_BEGIN_OBJECT, NULL);
pushJsonbValue(&result, WJB_END_OBJECT, NULL);
@@ -1293,9 +1293,9 @@ jsonb_build_array(PG_FUNCTION_ARGS)
Datum
jsonb_build_array_noargs(PG_FUNCTION_ARGS)
{
- JsonbInState result;
+ JsonbInState result = {0};

-       memset(&result, 0, sizeof(JsonbInState));
+       //memset(&result, 0, sizeof(JsonbInState));

pushJsonbValue(&result, WJB_BEGIN_ARRAY, NULL);
pushJsonbValue(&result, WJB_END_ARRAY, NULL);
@@ -1321,9 +1321,9 @@ jsonb_object(PG_FUNCTION_ARGS)
int in_count,
count,
i;
- JsonbInState result;
+ JsonbInState result = {0};

-       memset(&result, 0, sizeof(JsonbInState));
+       //memset(&result, 0, sizeof(JsonbInState));

pushJsonbValue(&result, WJB_BEGIN_OBJECT, NULL);

@@ -1425,9 +1425,9 @@ jsonb_object_two_arg(PG_FUNCTION_ARGS)
        int                     key_count,
                                val_count,
                                i;
-       JsonbInState result;
+       JsonbInState result = {0};
-       memset(&result, 0, sizeof(JsonbInState));
+       //memset(&result, 0, sizeof(JsonbInState));

pushJsonbValue(&result, WJB_BEGIN_OBJECT, NULL);

diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index f0d2378a8cc..4305623b979 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -2872,7 +2872,7 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
        {
                JsonBaseObjectInfo baseObject;
                JsonbValue      obj;
-               JsonbInState ps;
+               JsonbInState ps = {0};
                Jsonb      *jsonb;

if (tok != WJB_KEY)
@@ -2886,7 +2886,7 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
tok = JsonbIteratorNext(&it, &val, true);
Assert(tok == WJB_VALUE);

-               memset(&ps, 0, sizeof(ps));
+               //memset(&ps, 0, sizeof(ps));

pushJsonbValue(&ps, WJB_BEGIN_OBJECT, NULL);

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#11Chao Li
li.evan.chao@gmail.com
In reply to: Chao Li (#10)
Re: Making jsonb_agg() faster

On Aug 23, 2025, at 03:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:

v2-0001 takes care of that, and also adopts your suggestion in [1]
about not using two calls of pushJsonbValueScalar where one would do.
I also did a bit more micro-optimization in appendKey, appendValue,
appendElement to avoid redundant copying, because perf testing showed
that appendElement is still a hot-spot for jsonb_agg. Patches 0002
and 0003 are unchanged.

A few more suggestions for pushJsonValue():

+       /* If an object or array is pushed, recursively push its contents */
+       if (jbval->type == jbvObject)
        {
                pushJsonbValue(pstate, WJB_BEGIN_OBJECT, NULL);
                for (i = 0; i < jbval->val.object.nPairs; i++)
@@ -581,32 +607,29 @@ pushJsonbValue(JsonbParseState **pstate, JsonbIteratorToken seq,
                        pushJsonbValue(pstate, WJB_KEY, &jbval->val.object.pairs[i].key);
                        pushJsonbValue(pstate, WJB_VALUE, &jbval->val.object.pairs[i].value);
                }
-
-               return pushJsonbValue(pstate, WJB_END_OBJECT, NULL);
+               pushJsonbValue(pstate, WJB_END_OBJECT, NULL);
+               return;
        }

To push WJB_BEGIN_OBJECT and WJB_END_OBJECT, we can directly call pushJsonValueScalar(), because once entering pushJsonbValue, they will meet the check of (seq != WJB_ELEM && seq != WJB_VALUE). Directly calling pushJsonValueScalar() will saves one level of recursion.

-       if (jbval && (seq == WJB_ELEM || seq == WJB_VALUE) && jbval->type == jbvArray)
+       if (jbval->type == jbvArray)
        {
                pushJsonbValue(pstate, WJB_BEGIN_ARRAY, NULL);
                for (i = 0; i < jbval->val.array.nElems; i++)
                {
                        pushJsonbValue(pstate, WJB_ELEM, &jbval->val.array.elems[i]);
                }
-
-               return pushJsonbValue(pstate, WJB_END_ARRAY, NULL);
+               pushJsonbValue(pstate, WJB_END_ARRAY, NULL);
+               return;
        }

Same thing for pushing WJB_BEGIN_ARRAY and WJB_END_ARRAY.

And for pushJsonbValueScalar():

-                               (*pstate)->size = 4;
+                               ppstate->size = 4;      /* initial guess at array size */

Can we do lazy allocation? Initially assume size = 0, only allocate memory when pushing the first element? This way, we won’t allocate memory for empty objects and arrays.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chao Li (#11)
Re: Making jsonb_agg() faster

Chao Li <li.evan.chao@gmail.com> writes:

A few more suggestions for pushJsonValue():
...
To push WJB_BEGIN_OBJECT and WJB_END_OBJECT, we can directly call pushJsonValueScalar(), because once entering pushJsonbValue, they will meet the check of (seq != WJB_ELEM && seq != WJB_VALUE). Directly calling pushJsonValueScalar() will saves one level of recursion.

I'm not excited about that idea, because I think it'd be quite
confusing for some of the calls in those stanzas to be to
pushJsonbValueScalar while others are to pushJsonbValue.
I don't think the recursive-push paths are particularly hot,
so giving up readability to make them faster doesn't seem like
the right tradeoff.

There's certainly room to argue that the separation between
pushJsonbValue and pushJsonbValueScalar is poorly thought out
and could be done better. But I don't have a concrete idea
about what it could look like instead.

And for pushJsonbValueScalar():

-                               (*pstate)->size = 4;
+                               ppstate->size = 4;      /* initial guess at array size */

Can we do lazy allocation? Initially assume size = 0, only allocate memory when pushing the first element? This way, we won’t allocate memory for empty objects and arrays.

Optimizing for the empty-array or empty-object case surely seems like
the wrong thing; how often will that apply? I actually think that the
initial allocation could stand to be a good bit larger, maybe 64 or
256 or so entries, to reduce the number of repallocs. I did
experiment with that a little bit and could not show any definitive
speedup on the test case I was using ... but 4 entries seems miserly
small.

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: Making jsonb_agg() faster

PFA v3, rebased over 8a27d418f, no substantive changes whatever.

regards, tom lane

Attachments:

v3-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.p; name*1=atchDownload+518-426
v3-0002-Remove-fundamentally-redundant-processing-in-json.patchtext/x-diff; charset=us-ascii; name*0=v3-0002-Remove-fundamentally-redundant-processing-in-json.p; name*1=atchDownload+45-259
v3-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patchtext/x-diff; charset=us-ascii; name*0=v3-0003-Micro-optimize-datatype-conversions-in-datum_to_j.p; name*1=atchDownload+81-35
#14Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#13)
Re: Making jsonb_agg() faster

On Nov 3, 2025, at 04:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:

PFA v3, rebased over 8a27d418f, no substantive changes whatever.

regards, tom lane

I am on vacation this week, I only find a hour in the evening and did an eyeball review without actually tracing and testing this patch set. Overall, the changes are solid and look good to me. Only a few small comments on 0001:

1 - jsonpath_exec.c
```
@@ -2874,8 +2874,7 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
{
JsonBaseObjectInfo baseObject;
JsonbValue obj;
- JsonbParseState *ps;
- JsonbValue *keyval;
+ JsonbInState ps;
Jsonb *jsonb;

if (tok != WJB_KEY)
@@ -2889,7 +2888,8 @@ executeKeyValueMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
tok = JsonbIteratorNext(&it, &val, true);
Assert(tok == WJB_VALUE);

-		ps = NULL;
+		memset(&ps, 0, sizeof(ps));
+
```

In elsewhere of the patch, you all use “JsonbInState ps = {0}” to do the initialization, only this place uses memset(). Can we keep consistent and use {0} here also. I see there is a “continue” and a “break” before the memset(), maybe you want to avoid unnecessary initialization. I guess that is a tiny saving, but if you think that saving is worthwhile, I’m fine with keeping the current code.

2 - jsonb_util.c
```
+				case TIMETZOID:
+					/* pass-by-reference */
+					oldcontext = MemoryContextSwitchTo(outcontext);
+					v->val.datetime.value = datumCopy(v->val.datetime.value,
+													  false, 12);
```

Instead of hard-coding 12, can we use "sizeof(TimeTzADT)” for better readability?

3 - jsonb_plperl.c
```
+	{
+		/* XXX Ugly hack */
+		jsonb_state->result = palloc(sizeof(JsonbValue));
+		memcpy(jsonb_state->result, &out, sizeof(JsonbValue));
+	}
```
And
```
+	else
+	{
+		/* XXX Ugly hack */
+		jsonb_state->result = out;
+	}
```

You left two “ugly hack” comments. Maybe add a short description for why they are hack and what can be improved in the future.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chao Li (#14)
Re: Making jsonb_agg() faster

Chao Li <li.evan.chao@gmail.com> writes:

On Nov 3, 2025, at 04:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PFA v3, rebased over 8a27d418f, no substantive changes whatever.

I am on vacation this week, I only find a hour in the evening and did an eyeball review without actually tracing and testing this patch set.

Thanks for looking at it!

In elsewhere of the patch, you all use “JsonbInState ps = {0}” to do the initialization, only this place uses memset(). Can we keep consistent and use {0} here also. I see there is a “continue” and a “break” before the memset(), maybe you want to avoid unnecessary initialization. I guess that is a tiny saving, but if you think that saving is worthwhile, I’m fine with keeping the current code.

I intentionally made the 0001 patch as mechanical as possible,
and in particular did not second-guess existing decisions about
whether to initialize a variable in the declaration or later.
So I'm not excited about doing it differently in this one place.
There are some other memsets in the same area that could also
be replaced by "= {0}" if we cared to, but that seems like
material for a separate cleanup patch.

Instead of hard-coding 12, can we use "sizeof(TimeTzADT)” for better readability?

No, because that's not the same :-(. sizeof() would produce 16,
on most machines anyway, thanks to alignment. But 12 is the
type's size according to pg_type.

You left two “ugly hack” comments. Maybe add a short description for why they are hack and what can be improved in the future.

It's the same ugly hack as before, I just formatted the code more
legibly ... I didn't stop to look at whether there's a better way to
do it. Again, that seems like material for a different patch.

regards, tom lane

#16Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#15)
Re: Making jsonb_agg() faster

On Nov 4, 2025, at 02:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chao Li <li.evan.chao@gmail.com> writes:

On Nov 3, 2025, at 04:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PFA v3, rebased over 8a27d418f, no substantive changes whatever.

I am on vacation this week, I only find a hour in the evening and did an eyeball review without actually tracing and testing this patch set.

Thanks for looking at it!

In elsewhere of the patch, you all use “JsonbInState ps = {0}” to do the initialization, only this place uses memset(). Can we keep consistent and use {0} here also. I see there is a “continue” and a “break” before the memset(), maybe you want to avoid unnecessary initialization. I guess that is a tiny saving, but if you think that saving is worthwhile, I’m fine with keeping the current code.

I intentionally made the 0001 patch as mechanical as possible,
and in particular did not second-guess existing decisions about
whether to initialize a variable in the declaration or later.
So I'm not excited about doing it differently in this one place.
There are some other memsets in the same area that could also
be replaced by "= {0}" if we cared to, but that seems like
material for a separate cleanup patch.

Instead of hard-coding 12, can we use "sizeof(TimeTzADT)” for better readability?

No, because that's not the same :-(. sizeof() would produce 16,
on most machines anyway, thanks to alignment. But 12 is the
type's size according to pg_type.

You left two “ugly hack” comments. Maybe add a short description for why they are hack and what can be improved in the future.

It's the same ugly hack as before, I just formatted the code more
legibly ... I didn't stop to look at whether there's a better way to
do it. Again, that seems like material for a different patch.

regards, tom lane

I got some time today, so I ran some performance tests, which show me a significant performance improvement from this patch.

To run the test, I disabled asserts and built with -O2. The tests ran on a MacBook M4.

To prepare for some data:
```
DROP TABLE IF EXISTS jtest;
CREATE TABLE jtest AS SELECT g AS id, (g % 1000) AS grp, g AS ival, (g::text || '_str') AS sval, now() + (g || ' seconds')::interval AS tsval FROM generate_series(1, 5_000_000) g;
VACUUM ANALYZE jtest;
```

Then I ran the following tests against both master and a patched branch:

1. Jsonb_agg

Patched:
```
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5810.921 ms (00:05.811)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5904.465 ms (00:05.904)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 5857.314 ms (00:05.857)
```

Master:
```
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7310.648 ms (00:07.311)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7333.428 ms (00:07.333)
evantest=# SELECT grp, jsonb_agg(jsonb_build_object('id', id, 'ival', ival, 'sval', sval, 'tsval', tsval)) FROM jtest GROUP BY grp;
Time: 7257.666 ms (00:07.258)
```

Roughly 20% improvement.

2. Jsonb_object_agg

Patched:
```
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 767.258 ms
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 760.665 ms
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 772.326 ms
```

Master:
```
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1298.433 ms (00:01.298)
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1286.029 ms (00:01.286)
evantest=# SELECT jsonb_object_agg(id, sval) FROM jtest;
Time: 1283.624 ms (00:01.284)
```

Roughly 40% improvement.

3. To_jsonb

Patched:
```
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2270.958 ms (00:02.271)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2313.483 ms (00:02.313)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2298.716 ms (00:02.299)
```

Master:
```
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2490.771 ms (00:02.491)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2514.326 ms (00:02.514)
evantest=# SELECT to_jsonb(ival), to_jsonb(sval), to_jsonb(tsval) FROM jtest;
Time: 2509.924 ms (00:02.510)
```

Roughly 8% improvement.

So, overall, this patch has done a great performance improvement.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#15)
Re: Making jsonb_agg() faster

I wrote:

Chao Li <li.evan.chao@gmail.com> writes:

You left two “ugly hack” comments. Maybe add a short description for why they are hack and what can be improved in the future.

It's the same ugly hack as before, I just formatted the code more
legibly ... I didn't stop to look at whether there's a better way to
do it. Again, that seems like material for a different patch.

I took another look at that code and decided that it wasn't quite
as bad a hack as I first thought: it's relying on the upcoming
JsonbValueToJsonb call to wrap the scalar value into the right
array structure for a "raw scalar" JSONB. Improving the comments
about what this code is doing is definitely within the charter of
my 0001 patch, so here's a new version that improves those two
comments. Otherwise this is the same as v3.

regards, tom lane

Attachments:

v4-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patchtext/x-diff; charset=us-ascii; name*0=v4-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.p; name*1=atchDownload+531-427
v4-0002-Remove-fundamentally-redundant-processing-in-json.patchtext/x-diff; charset=us-ascii; name*0=v4-0002-Remove-fundamentally-redundant-processing-in-json.p; name*1=atchDownload+45-259
v4-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patchtext/x-diff; charset=us-ascii; name*0=v4-0003-Micro-optimize-datatype-conversions-in-datum_to_j.p; name*1=atchDownload+81-35
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: Making jsonb_agg() faster

v5, rebased over a couple of recent patches. No substantive changes.

I'd kind of like to get this pushed soon, because it keeps getting
sideswiped ... does anyone have further comments?

regards, tom lane

Attachments:

v5-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.patchtext/x-diff; charset=us-ascii; name*0=v5-0001-Revise-APIs-for-pushJsonbValue-and-associated-rou.p; name*1=atchDownload+527-423
v5-0002-Remove-fundamentally-redundant-processing-in-json.patchtext/x-diff; charset=us-ascii; name*0=v5-0002-Remove-fundamentally-redundant-processing-in-json.p; name*1=atchDownload+45-259
v5-0003-Micro-optimize-datatype-conversions-in-datum_to_j.patchtext/x-diff; charset=us-ascii; name*0=v5-0003-Micro-optimize-datatype-conversions-in-datum_to_j.p; name*1=atchDownload+81-35
#19Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#18)
Re: Making jsonb_agg() faster

On Dec 6, 2025, at 07:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

v5, rebased over a couple of recent patches. No substantive changes.

I'd kind of like to get this pushed soon, because it keeps getting
sideswiped ... does anyone have further comments?

Actually I traced v4 again this week. I tried to find out something to comment, but ended up I didn’t find anything significant.

My only nit commit is still about the hard-coded 12:
```
+				case TIMETZOID:
+					/* pass-by-reference */
+					oldcontext = MemoryContextSwitchTo(outcontext);
+					v->val.datetime.value = datumCopy(v->val.datetime.value,
+													  false, 12);
```

I commented this before and you explained. But I still think it may deserve a comment for why 12 is here, otherwise future reader may also get the same confusion as when I first time read this code.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chao Li (#19)
Re: Making jsonb_agg() faster

Chao Li <li.evan.chao@gmail.com> writes:

On Dec 6, 2025, at 07:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd kind of like to get this pushed soon, because it keeps getting
sideswiped ... does anyone have further comments?

My only nit commit is still about the hard-coded 12:
I commented this before and you explained. But I still think it may deserve a comment for why 12 is here, otherwise future reader may also get the same confusion as when I first time read this code.

I've been thinking of that as an independent issue. But what I'm
inclined to do is add a symbol to date.h, say like

diff --git a/src/include/utils/date.h b/src/include/utils/date.h
index 7316ac0ff17..2aca785b65d 100644
--- a/src/include/utils/date.h
+++ b/src/include/utils/date.h
@@ -30,6 +30,14 @@ typedef struct
 	int32		zone;			/* numeric time zone, in seconds */
 } TimeTzADT;
+/*
+ * sizeof(TimeTzADT) will be 16 on most platforms due to alignment padding.
+ * However, timetz's typlen is 12 according to pg_type.  In most places
+ * we can get away with using sizeof(TimeTzADT), but where it's important
+ * to match the declared typlen, use TIMETZ_TYPLEN.
+ */
+#define TIMETZ_TYPLEN		12
+
 /*
  * Infinity and minus infinity must be the max and min values of DateADT.
  */

and then use that. (I poked around in other code using TimeTzADT,
and could not find any other places where we have hard-wired "12",
which seems a bit surprising perhaps. But pretty much all the
references to sizeof(TimeTzADT) are in palloc's, where it's fine.)

regards, tom lane

#21Chao Li
li.evan.chao@gmail.com
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chao Li (#21)
#23Alexander Lakhin
exclusion@gmail.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Lakhin (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#24)
#26Andrey Borodin
amborodin@acm.org
In reply to: Tom Lane (#25)
#27Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#26)