Multiple calls to json_array_elements slow down nonlinearly

Started by Craig Ringeralmost 12 years ago5 messages
#1Craig Ringer
craig@2ndquadrant.com

Hi all

I ran into this performance report over the weekend:

http://stackoverflow.com/q/21507127/398670

and wanted to mention it here.

json_array_elements seems to spend about 97% of its time in
MemoryContextReset(...).

Given dummy data:

test=> create table g as select (select json_agg(random()) json from
generate_series(0, (r1*4)::int)) from (select random() r1 from
generate_series(1,20000)) aux;

Compare these two methods of producing the same result set:

test=> create table q as select json->x foo from g,
generate_series(0,json_array_length(g.json)-1) x;
SELECT 60103
Time: 157.702 ms
test=> create table p as select json_array_elements(json) foo from g;
SELECT 60103
Time: 4254.494 ms

The issue is reproducible and scales non-linearly with row count, which
is a clue. At 100k rows input, the lateral query takes 592ms vs 179959ms
(3 minutes) for json_array_elements.

Whenever I grab a backtrace it looks like:

#0 0x000000000072dd7d in MemoryContextReset (context=0x2a02dc90) at mcxt.c:130
#1 0x000000000072dd90 in MemoryContextResetChildren (context=<optimized out>) at mcxt.c:155
#2 MemoryContextReset (context=0x1651220) at mcxt.c:131
#3 0x00000000005817f9 in ExecScan (node=node@entry=0x164e1a0, accessMtd=accessMtd@entry=0x592040 <SeqNext>, recheckMtd=recheckMtd@entry=0x592030 <SeqRecheck>)
at execScan.c:155

(Sorry for the quote-paste; only way to make @#$ Thunderbird not wrap
mail, I need to switch clients or fix that).

"perf top" on the process shows:

96.92% postgres [.] MemoryContextReset
0.15% [kernel] [k] cpuacct_account_field
0.09% [kernel] [k] update_cfs_rq_blocked_load
0.09% postgres [.] AllocSetAlloc

At a guess, we're looking at a case where a new child context is created
at every call, so every MemoryContextResetChildren call has to deal with
more child contexts. I'm going to take a quick look now, I just wanted
to get this written up before I got sidetracked.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#1)
1 attachment(s)
Re: bugfix patch for json_array_elements

On 02/03/2014 09:09 AM, Craig Ringer wrote:

At a guess, we're looking at a case where a new child context is created
at every call, so every MemoryContextResetChildren call has to deal with
more child contexts.

That would be "yes". After a short run, I see 32849 lines like:

json_array_elements temporary cxt: 8192 total in 1 blocks; 8160 free (0
chunks); 32 used

under the context:

PortalMemory: 8192 total in 1 blocks
PortalHeapMemory: 7168 total in 3 blocks
ExecutorState: 65600 total in 4 blocks
ExprContext: 8192 total in 1 blocks
json_array_elements temporary cxt: 8192 total in 1 blocks;
8160 free (0 chunks); 32 used

The attached patch deletes the context after use, bringing performance
back into line. It should be backpatched to 9.3.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Fix-leaked-memory-context-in-json_array_each-and-jso.patchtext/x-patch; name=0001-Fix-leaked-memory-context-in-json_array_each-and-jso.patchDownload
>From f4ae516d643dee43200053533dbcc8b362e3f92b Mon Sep 17 00:00:00 2001
From: Craig Ringer <craig@2ndquadrant.com>
Date: Mon, 3 Feb 2014 09:47:34 +0800
Subject: [PATCH] Fix leaked memory context in json_array_each and json_each

json_array_each and json_each leaked a memory context with each
invocation, causing profiles involving many invocations in a single
query context to be dominated by MemoryContextReset because of
MemoryContextResetChildren calls made after each tuple is emitted.

See initial bug report:

http://stackoverflow.com/q/21507127/398670

and list discussion:

 http://www.postgresql.org/message-id/52EEEC37.9040305@2ndquadrant.com
---
 src/backend/utils/adt/jsonfuncs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index a19b222..116d7a0 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -975,6 +975,9 @@ each_worker(PG_FUNCTION_ARGS, bool as_text)
 	rsi->setResult = state->tuple_store;
 	rsi->setDesc = state->ret_tdesc;
 
+	MemoryContextDelete(state->tmp_cxt);
+	state->tmp_cxt = NULL;
+
 	PG_RETURN_NULL();
 }
 
@@ -1157,6 +1160,9 @@ elements_worker(PG_FUNCTION_ARGS, bool as_text)
 	rsi->setResult = state->tuple_store;
 	rsi->setDesc = state->ret_tdesc;
 
+	MemoryContextDelete(state->tmp_cxt);
+	state->tmp_cxt = NULL;
+
 	PG_RETURN_NULL();
 }
 
-- 
1.8.3.1

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Craig Ringer (#2)
Re: bugfix patch for json_array_elements

On 02/02/2014 08:54 PM, Craig Ringer wrote:

On 02/03/2014 09:09 AM, Craig Ringer wrote:

At a guess, we're looking at a case where a new child context is created
at every call, so every MemoryContextResetChildren call has to deal with
more child contexts.

That would be "yes". After a short run, I see 32849 lines like:

json_array_elements temporary cxt: 8192 total in 1 blocks; 8160 free (0
chunks); 32 used

under the context:

PortalMemory: 8192 total in 1 blocks
PortalHeapMemory: 7168 total in 3 blocks
ExecutorState: 65600 total in 4 blocks
ExprContext: 8192 total in 1 blocks
json_array_elements temporary cxt: 8192 total in 1 blocks;
8160 free (0 chunks); 32 used

The attached patch deletes the context after use, bringing performance
back into line. It should be backpatched to 9.3.

[...]

+	MemoryContextDelete(state->tmp_cxt);
+	state->tmp_cxt = NULL;
+
PG_RETURN_NULL();

Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the
end of the function since it's a child of the CurrentMemoryContext. But
if I got that wrong I'm happy to fix it. I don't see the point in
setting state->tmp_cxt to NULL since it's about to go out of scope anyway.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: bugfix patch for json_array_elements

Andrew Dunstan <andrew@dunslane.net> writes:

On 02/02/2014 08:54 PM, Craig Ringer wrote:

The attached patch deletes the context after use, bringing performance
back into line. It should be backpatched to 9.3.

Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the
end of the function since it's a child of the CurrentMemoryContext.

The executor does MemoryContextReset, not
MemoryContextResetAndDeleteChildren, on the per-tuple context. That means
that child contexts will be reset, not deleted. I seem to recall some
discussions about changing that, or even redefining MemoryContextReset to
automatically delete child contexts; but it would take a fair amount of
research to be sure such a change was safe.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#4)
Re: bugfix patch for json_array_elements

On 02/03/2014 11:12 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 02/02/2014 08:54 PM, Craig Ringer wrote:

The attached patch deletes the context after use, bringing performance
back into line. It should be backpatched to 9.3.

Hmm. I guess I was assuming that the tmp_cxt would be cleaned up at the
end of the function since it's a child of the CurrentMemoryContext.

The executor does MemoryContextReset, not
MemoryContextResetAndDeleteChildren, on the per-tuple context. That means
that child contexts will be reset, not deleted. I seem to recall some
discussions about changing that, or even redefining MemoryContextReset to
automatically delete child contexts; but it would take a fair amount of
research to be sure such a change was safe.

Good to know.

Is it worth a note in src/backend/utils/mmgr/README so people are warned
against making the same mistake I did? Both of these are set-returning
functions operating in materialize mode - not sure if that makes any
difference.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers