BUG #5608: array_agg() consumes too much memory

Started by Itagaki Takahiroover 15 years ago6 messages
#1Itagaki Takahiro
itagaki.takahiro@gmail.com

The following bug has been logged online:

Bug reference: 5608
Logged by: Itagaki Takahiro
Email address: itagaki.takahiro@gmail.com
PostgreSQL version: 9.0beta4
Operating system: Windows 7 (32bit)
Description: array_agg() consumes too much memory
Details:

I encountered "out of memory" error in large
GROUP BY query with array_agg(). The server log
was filled by the following messages:

accumArrayResult: 8192 total in 1 blocks; 7800 free (0 chunks); 392
used

Should we choose smaller size of initial memory in accumArrayResult()?
It allocates ALLOCSET_DEFAULT_INITSIZE (=8kB) now,
but only 10% of the area was used in my case.

Note that work_mem is not considered in the case;
array_agg() allocates 8kB * (number of groups).

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Itagaki Takahiro (#1)
Re: BUG #5608: array_agg() consumes too much memory

"Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:

I encountered "out of memory" error in large
GROUP BY query with array_agg(). The server log
was filled by the following messages:

accumArrayResult: 8192 total in 1 blocks; 7800 free (0 chunks); 392
used

Should we choose smaller size of initial memory in accumArrayResult()?

That's not really going to help much, considering that the planner's
estimated memory use per hash aggregate is only a few dozen bytes.
We have to get that estimate in sync with reality or the problem will
remain.

Eventually it might be nice to have some sort of way to specify the
estimate to use for any aggregate function --- but for a near-term
fix maybe we should just hard-wire a special case for array_agg in
count_agg_clauses_walker(). I'd be inclined to leave the array_agg
code as-is and teach the planner to assume ALLOCSET_DEFAULT_INITSIZE
per array_agg aggregate.

regards, tom lane

#3Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: BUG #5608: array_agg() consumes too much memory

2010/8/10 Tom Lane <tgl@sss.pgh.pa.us>:

Eventually it might be nice to have some sort of way to specify the
estimate to use for any aggregate function --- but for a near-term
fix maybe we should just hard-wire a special case for array_agg in
count_agg_clauses_walker().

The attached patch is the near-term fix; it adds ALLOCSET_DEFAULT_INITSIZE
bytes to memory assumption.

We might need the same adjustment for string_agg(), that consumes
1024 bytes for the transit condition. array_agg() and string_agg()
are only aggregates that have "internal" for aggtranstype.

--
Itagaki Takahiro

Attachments:

array_agg_memcalc.patchapplication/octet-stream; name=array_agg_memcalc.patchDownload
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index e525ba6..e4a107e 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*************** count_agg_clauses_walker(Node *node, Agg
*** 529,535 ****
  		 * pass-by-reference then we have to add the estimated size of the
  		 * value itself, plus palloc overhead.
  		 */
! 		if (!get_typbyval(aggtranstype))
  		{
  			int32		aggtranstypmod;
  			int32		avgwidth;
--- 529,543 ----
  		 * pass-by-reference then we have to add the estimated size of the
  		 * value itself, plus palloc overhead.
  		 */
! 		if (aggref->aggfnoid == ARRAY_AGG_OID)
! 		{
! 			/*
! 			 * Since array_agg consumes much more memory than other aggregate
! 			 * functions, we made a special case for it.
! 			 */
! 			counts->transitionSpace += ALLOCSET_DEFAULT_INITSIZE;
! 		}
! 		else if (!get_typbyval(aggtranstype))
  		{
  			int32		aggtranstypmod;
  			int32		avgwidth;
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 564917d..0564627 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2334 (  array_agg_fina
*** 1046,1051 ****
--- 1046,1052 ----
  DESCR("array_agg final function");
  DATA(insert OID = 2335 (  array_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 1 0 2277 "2283" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
  DESCR("concatenate aggregate input into an array");
+ #define ARRAY_AGG_OID 2335
  
  DATA(insert OID = 760 (  smgrin			   PGNSP PGUID 12 1 0 0 f f f t f s 1 0 210 "2275" _null_ _null_ _null_ _null_	smgrin _null_ _null_ _null_ ));
  DESCR("I/O");
#4Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Itagaki Takahiro (#3)
Re: BUG #5608: array_agg() consumes too much memory

2010/8/14 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

2010/8/10 Tom Lane <tgl@sss.pgh.pa.us>:

Eventually it might be nice to have some sort of way to specify the
estimate to use for any aggregate function --- but for a near-term
fix maybe we should just hard-wire a special case for array_agg in
count_agg_clauses_walker().

The attached patch is the near-term fix; it adds ALLOCSET_DEFAULT_INITSIZE
bytes to memory assumption.

We might need the same adjustment for string_agg(), that consumes
1024 bytes for the transit condition. array_agg() and string_agg()
are only aggregates that have "internal" for aggtranstype.

So, is it better to generalize as it adds ALLOCSET_DEFAULT_INITSIZE if
the transtype is internal, rather than specifying individual function
OID as the patch stands?

Regards,

--
Hitoshi Harada

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#4)
Re: BUG #5608: array_agg() consumes too much memory

Hitoshi Harada <umi.tanuki@gmail.com> writes:

2010/8/14 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

2010/8/10 Tom Lane <tgl@sss.pgh.pa.us>:

Eventually it might be nice to have some sort of way to specify the
estimate to use for any aggregate function --- but for a near-term
fix maybe we should just hard-wire a special case for array_agg in
count_agg_clauses_walker().

So, is it better to generalize as it adds ALLOCSET_DEFAULT_INITSIZE if
the transtype is internal, rather than specifying individual function
OID as the patch stands?

Seems like a good idea ... it's ugly, but it seems much less likely to
need maintenance.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hitoshi Harada (#4)
Re: BUG #5608: array_agg() consumes too much memory

Hitoshi Harada <umi.tanuki@gmail.com> writes:

2010/8/14 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

The attached patch is the near-term fix; it adds ALLOCSET_DEFAULT_INITSIZE
bytes to memory assumption.

We might need the same adjustment for string_agg(), that consumes
1024 bytes for the transit condition. array_agg() and string_agg()
are only aggregates that have "internal" for aggtranstype.

So, is it better to generalize as it adds ALLOCSET_DEFAULT_INITSIZE if
the transtype is internal, rather than specifying individual function
OID as the patch stands?

I've applied a patch following Hitoshi-san's idea.

BTW, a note about the upthread patch: we don't do manual #define's for
OIDs in pg_proc.h. Use fmgroids.h for that.

regards, tom lane