Memory leak on hashed agg rescan
I noticed a minor leak in the per-query context when ExecReScanAgg()
is called for a hashed aggregate. During rescan, build_hash_table() is
called to create a new empty hash table in the aggcontext. However,
build_hash_table() also constructs the "hash_needed" column list in
the per-query context, so repeated calls of build_hash_table() result
in leaking this memory for the duration of the query.
Attached is a patch that fixes this by only constructing "hash_needed"
if it doesn't already exist. I also bms_free'd the temporary BMS that
is created, although that's pretty harmless.
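The pattern described above can be reproduced outside the executor. What follows is only a sketch under stated assumptions: `DemoArena`, `DemoAggState` and the `demo_*` functions are hypothetical stand-ins, not PostgreSQL's memory-context API. A small bump allocator plays the role of each context, and the `guard` flag switches between rebuilding the column list on every call and building it only when it does not exist yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator standing in for a memory context. */
typedef struct DemoArena {
    int    slots[256];
    size_t used;            /* number of slots handed out so far */
} DemoArena;

/* Hypothetical, heavily reduced stand-in for the aggregate node state. */
typedef struct DemoAggState {
    DemoArena per_query;    /* lives until the query ends */
    DemoArena agg;          /* thrown away on every rescan */
    int      *hash_table;   /* allocated in agg */
    int      *hash_needed;  /* allocated in per_query */
} DemoAggState;

static int *demo_arena_alloc(DemoArena *arena, size_t nslots)
{
    int *chunk;

    assert(arena->used + nslots <= sizeof(arena->slots) / sizeof(arena->slots[0]));
    chunk = arena->slots + arena->used;
    arena->used += nslots;
    return chunk;
}

/*
 * Called once at startup and once per rescan. With guard == 0 the column
 * list is allocated again on every call; with guard != 0 it is allocated
 * only when it does not exist yet.
 */
static void demo_build_hash_table(DemoAggState *state, int guard)
{
    state->agg.used = 0;    /* resetting the short-lived arena frees the old table */
    state->hash_table = demo_arena_alloc(&state->agg, 16);

    if (!guard || state->hash_needed == NULL)
        state->hash_needed = demo_arena_alloc(&state->per_query, 4);
}

/* Returns how many long-lived slots are in use after the given number of rescans. */
size_t demo_per_query_usage(int rescans, int guard)
{
    DemoAggState state = {0};
    int i;

    demo_build_hash_table(&state, guard);       /* initial build */
    for (i = 0; i < rescans; i++)
        demo_build_hash_table(&state, guard);   /* one rebuild per rescan */
    return state.per_query.used;
}
```

In this model the unguarded variant grows the long-lived arena by one column list per rescan, while the guarded variant stays at a single list no matter how often the node is rescanned.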
Neil
Attachments:
hashed_agg_mem_leak-1.patch (application/octet-stream)
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
***************
*** 665,673 **** build_hash_table(AggState *aggstate)
Agg *node = (Agg *) aggstate->ss.ps.plan;
MemoryContext tmpmem = aggstate->tmpcontext->ecxt_per_tuple_memory;
Size entrysize;
- Bitmapset *colnos;
- List *collist;
- int i;
Assert(node->aggstrategy == AGG_HASHED);
Assert(node->numGroups > 0);
--- 665,670 ----
***************
*** 700,705 **** build_hash_table(AggState *aggstate)
--- 697,705 ----
* then convert it to an integer list (cheaper to scan at runtime). The
* list is in decreasing order so that the first entry is the largest;
* lookup_hash_entry depends on this to use slot_getsomeattrs correctly.
+ * Since build_hash_table() is invoked for every re-scan but we allocate
+ * the column list in the per-query context, we need only do this the
+ * first time through.
*
* Note: at present, searching the tlist/qual is not really necessary
* since the parser should disallow any unaggregated references to
***************
*** 707,723 **** build_hash_table(AggState *aggstate)
* support for SQL99 semantics that allow use of "functionally dependent"
* columns that haven't been explicitly grouped by.
*/
! /* Find Vars that will be needed in tlist and qual */
! colnos = find_unaggregated_cols(aggstate);
! /* Add in all the grouping columns */
! for (i = 0; i < node->numCols; i++)
! colnos = bms_add_member(colnos, node->grpColIdx[i]);
! /* Convert to list, using lcons so largest element ends up first */
! collist = NIL;
! while ((i = bms_first_member(colnos)) >= 0)
! collist = lcons_int(i, collist);
! aggstate->hash_needed = collist;
}
/*
--- 707,730 ----
* support for SQL99 semantics that allow use of "functionally dependent"
* columns that haven't been explicitly grouped by.
*/
+ if (!aggstate->hash_needed)
+ {
+ Bitmapset *colnos;
+ List *collist;
+ int i;
! /* Find Vars that will be needed in tlist and qual */
! colnos = find_unaggregated_cols(aggstate);
! /* Add in all the grouping columns */
! for (i = 0; i < node->numCols; i++)
! colnos = bms_add_member(colnos, node->grpColIdx[i]);
! /* Convert to list, using lcons so largest element ends up first */
! collist = NIL;
! while ((i = bms_first_member(colnos)) >= 0)
! collist = lcons_int(i, collist);
! aggstate->hash_needed = collist;
! bms_free(colnos);
! }
}
/*
"Neil Conway" <neil.conway@gmail.com> writes:
> I noticed a minor leak in the per-query context when ExecReScanAgg()
> is called for a hashed aggregate. During rescan, build_hash_table() is
> called to create a new empty hash table in the aggcontext. However,
> build_hash_table() also constructs the "hash_needed" column list in
> the per-query context, so repeated calls of build_hash_table() result
> in leaking this memory for the duration of the query.
> Attached is a patch that fixes this by only constructing "hash_needed"
> if it doesn't already exist. I also bms_free'd the temporary BMS that
> is created, although that's pretty harmless.
It would probably be cleaner to take that logic out of build_hash_table
altogether, and put it in a separate function to be called by
ExecInitAgg.
regards, tom lane
On Thu, Oct 16, 2008 at 5:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would probably be cleaner to take that logic out of build_hash_table
> altogether, and put it in a separate function to be called by
> ExecInitAgg.
Yeah, I considered that -- makes sense. Attached is the patch I
applied to HEAD, REL8_3_STABLE and REL8_2_STABLE.
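The shape of that refactoring, reduced to its control flow, might look roughly like the sketch below. Everything here is a hypothetical stand-in (`SketchAggState`, the `sketch_*` functions, and the fixed column numbers); it only illustrates the idea that the column list is computed once at node initialization while the hash table is rebuilt on every rescan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the aggregate node state; it only counts work. */
typedef struct SketchAggState {
    int        table_builds;   /* times the hash table was (re)built */
    int        column_scans;   /* times the column list was computed */
    const int *hash_needed;    /* computed once, kept across rescans */
} SketchAggState;

/* Separate function for the column list; the values are made up. */
static const int *sketch_find_hash_columns(SketchAggState *state)
{
    static const int columns[] = {3, 1};   /* largest column first */

    state->column_scans++;
    return columns;
}

/* Now responsible only for the per-scan hash table. */
static void sketch_build_hash_table(SketchAggState *state)
{
    state->table_builds++;
}

/* Node initialization: build the table and compute the column list once. */
void sketch_init(SketchAggState *state)
{
    assert(state != NULL);
    state->table_builds = 0;
    state->column_scans = 0;
    sketch_build_hash_table(state);
    state->hash_needed = sketch_find_hash_columns(state);
}

/* Rescan: only the hash table is rebuilt; hash_needed is left alone. */
void sketch_rescan(SketchAggState *state)
{
    assert(state != NULL);
    sketch_build_hash_table(state);
}
```

Compared with guarding the allocation inside the table-building function, this split means the rescan path never touches the long-lived allocation at all.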
Neil
Attachments:
hashed_agg_mem_leak-2.patch (application/octet-stream)
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
***************
*** 665,673 **** build_hash_table(AggState *aggstate)
Agg *node = (Agg *) aggstate->ss.ps.plan;
MemoryContext tmpmem = aggstate->tmpcontext->ecxt_per_tuple_memory;
Size entrysize;
- Bitmapset *colnos;
- List *collist;
- int i;
Assert(node->aggstrategy == AGG_HASHED);
Assert(node->numGroups > 0);
--- 665,670 ----
***************
*** 683,712 **** build_hash_table(AggState *aggstate)
entrysize,
aggstate->aggcontext,
tmpmem);
! /*
! * Create a list of the tuple columns that actually need to be stored in
! * hashtable entries. The incoming tuples from the child plan node will
! * contain grouping columns, other columns referenced in our targetlist
! * and qual, columns used to compute the aggregate functions, and perhaps
! * just junk columns we don't use at all. Only columns of the first two
! * types need to be stored in the hashtable, and getting rid of the others
! * can make the table entries significantly smaller. To avoid messing up
! * Var numbering, we keep the same tuple descriptor for hashtable entries
! * as the incoming tuples have, but set unwanted columns to NULL in the
! * tuples that go into the table.
! *
! * To eliminate duplicates, we build a bitmapset of the needed columns,
! * then convert it to an integer list (cheaper to scan at runtime). The
! * list is in decreasing order so that the first entry is the largest;
! * lookup_hash_entry depends on this to use slot_getsomeattrs correctly.
! *
! * Note: at present, searching the tlist/qual is not really necessary
! * since the parser should disallow any unaggregated references to
! * ungrouped columns. However, the search will be needed when we add
! * support for SQL99 semantics that allow use of "functionally dependent"
! * columns that haven't been explicitly grouped by.
! */
/* Find Vars that will be needed in tlist and qual */
colnos = find_unaggregated_cols(aggstate);
--- 680,719 ----
entrysize,
aggstate->aggcontext,
tmpmem);
+ }
! /*
! * Create a list of the tuple columns that actually need to be stored in
! * hashtable entries. The incoming tuples from the child plan node will
! * contain grouping columns, other columns referenced in our targetlist and
! * qual, columns used to compute the aggregate functions, and perhaps just
! * junk columns we don't use at all. Only columns of the first two types
! * need to be stored in the hashtable, and getting rid of the others can
! * make the table entries significantly smaller. To avoid messing up Var
! * numbering, we keep the same tuple descriptor for hashtable entries as the
! * incoming tuples have, but set unwanted columns to NULL in the tuples that
! * go into the table.
! *
! * To eliminate duplicates, we build a bitmapset of the needed columns, then
! * convert it to an integer list (cheaper to scan at runtime). The list is
! * in decreasing order so that the first entry is the largest;
! * lookup_hash_entry depends on this to use slot_getsomeattrs correctly.
! * Note that the list is preserved over ExecReScanAgg, so we allocate it in
! * the per-query context (unlike the hash table itself).
! *
! * Note: at present, searching the tlist/qual is not really necessary since
! * the parser should disallow any unaggregated references to ungrouped
! * columns. However, the search will be needed when we add support for
! * SQL99 semantics that allow use of "functionally dependent" columns that
! * haven't been explicitly grouped by.
! */
! static List *
! find_hash_columns(AggState *aggstate)
! {
! Agg *node = (Agg *) aggstate->ss.ps.plan;
! Bitmapset *colnos;
! List *collist;
! int i;
/* Find Vars that will be needed in tlist and qual */
colnos = find_unaggregated_cols(aggstate);
***************
*** 717,723 **** build_hash_table(AggState *aggstate)
collist = NIL;
while ((i = bms_first_member(colnos)) >= 0)
collist = lcons_int(i, collist);
! aggstate->hash_needed = collist;
}
/*
--- 724,732 ----
collist = NIL;
while ((i = bms_first_member(colnos)) >= 0)
collist = lcons_int(i, collist);
! bms_free(colnos);
!
! return collist;
}
/*
***************
*** 1325,1330 **** ExecInitAgg(Agg *node, EState *estate, int eflags)
--- 1334,1341 ----
{
build_hash_table(aggstate);
aggstate->table_filled = false;
+ /* Compute the columns we actually need to hash on */
+ aggstate->hash_needed = find_hash_columns(aggstate);
}
else
{
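The comment carried over in the patch says the needed columns are collected in a bitmapset and then converted to an integer list in decreasing order by prepending each member. A minimal standalone sketch of that conversion follows, assuming a 32-bit mask in place of a Bitmapset and a hand-rolled linked list in place of PostgreSQL's List; `DemoCell` and all `demo_*` functions are hypothetical and only mimic what `bms_first_member` and `lcons_int` do in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical singly linked integer list cell. */
typedef struct DemoCell {
    int              value;
    struct DemoCell *next;
} DemoCell;

/* Remove and return the smallest member of the mask, or -1 if it is empty. */
static int demo_first_member(uint32_t *mask)
{
    int i;

    for (i = 0; i < 32; i++)
    {
        if (*mask & (UINT32_C(1) << i))
        {
            *mask &= ~(UINT32_C(1) << i);
            return i;
        }
    }
    return -1;
}

/* Prepend a value to the list and return the new head. */
static DemoCell *demo_lcons_int(int value, DemoCell *list)
{
    DemoCell *cell = malloc(sizeof(DemoCell));

    assert(cell != NULL);
    cell->value = value;
    cell->next = list;
    return cell;
}

/*
 * Members come out of the mask smallest first; prepending each one leaves
 * the largest member at the head of the finished list.
 */
DemoCell *demo_columns_desc(uint32_t mask)
{
    DemoCell *list = NULL;
    int       i;

    while ((i = demo_first_member(&mask)) >= 0)
        list = demo_lcons_int(i, list);
    return list;
}

/* Release every cell of a list built by demo_columns_desc. */
void demo_free_list(DemoCell *list)
{
    while (list != NULL)
    {
        DemoCell *next = list->next;

        free(list);
        list = next;
    }
}
```

Duplicates disappear for free because a set bit can only be present once, and the head of the list is the highest column number, which is the property the comment says lookup_hash_entry relies on.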