tuplesort Generation memory contexts don't play nicely with index builds
Hackers,
I noticed while doing some memory context related work that since we
now use generation.c memory contexts for tuplesorts (40af10b57) that
tuplesort_putindextuplevalues() causes memory "leaks" in the
generation context due to index_form_tuple() being called while we're
switched into the state->tuplecontext.
I use the word "leak" here slightly loosely. It's only a leak due to
how generation.c uses no free lists to allow reuse pfree'd memory.
It looks like the code has been this way ever since 9f03ca915 (Avoid
copying index tuples when building an index.) That commit did add a
big warning at the top of index_form_tuple() that the function must be
careful to not leak any memory.
A quick fix would be just to add a new bool field, e.g, usegencxt to
tuplesort_begin_common() and pass that as false in all functions apart
from tuplesort_begin_heap(). That way we'll always be using an aset.c
context when we call index_form_tuple().
However, part of me thinks that 9f03ca915 is standing in the way of us
doing more in the future to optimize how we store tuples during sorts.
We might, one day, want to consider using a hand-rolled bump
allocator. If we ever do that we'd need to undo the work done by
9f03ca915.
Does anyone have any thoughts on this?
Here's a reproducer from the regression tests:
CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
-- Use uncompressed data stored in toast.
CREATE INDEX no_index_cleanup_idx ON no_index_cleanup(t);
ALTER TABLE no_index_cleanup ALTER COLUMN t SET STORAGE EXTERNAL;
INSERT INTO no_index_cleanup(i, t) VALUES (generate_series(1,30),
repeat('1234567890',269));
-- index cleanup option is ignored if VACUUM FULL
VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup;
David
On Wed, 29 Jun 2022 at 12:59, David Rowley <dgrowleyml@gmail.com> wrote:
I noticed while doing some memory context related work that since we
now use generation.c memory contexts for tuplesorts (40af10b57) that
tuplesort_putindextuplevalues() causes memory "leaks" in the
generation context due to index_form_tuple() being called while we're
switched into the state->tuplecontext.
I've attached a draft patch which changes things so that we don't use
generation contexts for sorts being done for index builds.
David
Attachments:
dont_use_gencxt_for_index_builds.patchtext/plain; charset=US-ASCII; name=dont_use_gencxt_for_index_builds.patchDownload
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 31554fd867..adc82028ba 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -250,6 +250,8 @@ struct Tuplesortstate
bool bounded; /* did caller specify a maximum number of
* tuples to return? */
bool boundUsed; /* true if we made use of a bounded heap */
+ bool genTupleCxt; /* true if we should use a generation.c memory
+ * context for tuple storage */
int bound; /* if bounded, the maximum number of tuples */
bool tuples; /* Can SortTuple.tuple ever be set? */
int64 availMem; /* remaining memory available, in bytes */
@@ -618,7 +620,7 @@ struct Sharedsort
static Tuplesortstate *tuplesort_begin_common(int workMem,
SortCoordinate coordinate,
- int sortopt);
+ int sortopt, bool genTupleCxt);
static void tuplesort_begin_batch(Tuplesortstate *state);
static void puttuple_common(Tuplesortstate *state, SortTuple *tuple);
static bool consider_abort_common(Tuplesortstate *state);
@@ -842,7 +844,8 @@ qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
*/
static Tuplesortstate *
-tuplesort_begin_common(int workMem, SortCoordinate coordinate, int sortopt)
+tuplesort_begin_common(int workMem, SortCoordinate coordinate, int sortopt,
+ bool genTupleCxt)
{
Tuplesortstate *state;
MemoryContext maincontext;
@@ -907,6 +910,8 @@ tuplesort_begin_common(int workMem, SortCoordinate coordinate, int sortopt)
state->memtupsize = INITIAL_MEMTUPSIZE;
state->memtuples = NULL;
+ state->genTupleCxt = genTupleCxt;
+
/*
* After all of the other non-parallel-related state, we setup all of the
* state needed for each batch.
@@ -966,21 +971,16 @@ tuplesort_begin_batch(Tuplesortstate *state)
* eases memory management. Resetting at key points reduces
* fragmentation. Note that the memtuples array of SortTuples is allocated
* in the parent context, not this context, because there is no need to
- * free memtuples early. For bounded sorts, tuples may be pfreed in any
- * order, so we use a regular aset.c context so that it can make use of
- * free'd memory. When the sort is not bounded, we make use of a
- * generation.c context as this keeps allocations more compact with less
- * wastage. Allocations are also slightly more CPU efficient.
+ * free memtuples early.
*/
- if (state->sortopt & TUPLESORT_ALLOWBOUNDED)
+ if (state->genTupleCxt)
+ state->tuplecontext = GenerationContextCreate(state->sortcontext,
+ "Caller tuples",
+ ALLOCSET_DEFAULT_SIZES);
+ else
state->tuplecontext = AllocSetContextCreate(state->sortcontext,
"Caller tuples",
ALLOCSET_DEFAULT_SIZES);
- else
- state->tuplecontext = GenerationContextCreate(state->sortcontext,
- "Caller tuples",
- ALLOCSET_DEFAULT_SIZES);
-
state->status = TSS_INITIAL;
state->bounded = false;
@@ -1033,11 +1033,21 @@ tuplesort_begin_heap(TupleDesc tupDesc,
bool *nullsFirstFlags,
int workMem, SortCoordinate coordinate, int sortopt)
{
- Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
- sortopt);
+ Tuplesortstate *state;
MemoryContext oldcontext;
+ bool genTupleCxt;
int i;
+ /*
+ * For bounded sorts, tuples may be pfreed in any order, so we use a
+ * regular aset.c context so that it can make use of free'd memory. When
+ * the sort is not bounded, we make use of a generation.c context as this
+ * keeps allocations more compact with less wastage. Allocations are also
+ * slightly more CPU efficient.
+ */
+ genTupleCxt = (sortopt & TUPLESORT_ALLOWBOUNDED) == 0;
+ state = tuplesort_begin_common(workMem, coordinate, sortopt, genTupleCxt);
+
oldcontext = MemoryContextSwitchTo(state->maincontext);
AssertArg(nkeys > 0);
@@ -1107,14 +1117,24 @@ tuplesort_begin_cluster(TupleDesc tupDesc,
int workMem,
SortCoordinate coordinate, int sortopt)
{
- Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
- sortopt);
+ Tuplesortstate *state;
BTScanInsert indexScanKey;
MemoryContext oldcontext;
+ bool genTupleCxt;
int i;
Assert(indexRel->rd_rel->relam == BTREE_AM_OID);
+ /*
+ * For bounded sorts, tuples may be pfreed in any order, so we use a
+ * regular aset.c context so that it can make use of free'd memory. When
+ * the sort is not bounded, we make use of a generation.c context as this
+ * keeps allocations more compact with less wastage. Allocations are also
+ * slightly more CPU efficient.
+ */
+ genTupleCxt = (sortopt & TUPLESORT_ALLOWBOUNDED) == 0;
+ state = tuplesort_begin_common(workMem, coordinate, sortopt, genTupleCxt);
+
oldcontext = MemoryContextSwitchTo(state->maincontext);
#ifdef TRACE_SORT
@@ -1214,12 +1234,18 @@ tuplesort_begin_index_btree(Relation heapRel,
SortCoordinate coordinate,
int sortopt)
{
- Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
- sortopt);
+ Tuplesortstate *state;
BTScanInsert indexScanKey;
MemoryContext oldcontext;
int i;
+ /*
+ * We pass false for genTupleCxt due to index_form_tuple being called
+ * in the tuplecontext. We don't want index_form_tuple() allocating
+ * memory in the generation context as generation.c does not reuse
+ * pfree'd memory very well.
+ */
+ state = tuplesort_begin_common(workMem, coordinate, sortopt, false);
oldcontext = MemoryContextSwitchTo(state->maincontext);
#ifdef TRACE_SORT
@@ -1296,10 +1322,17 @@ tuplesort_begin_index_hash(Relation heapRel,
SortCoordinate coordinate,
int sortopt)
{
- Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
- sortopt);
+ Tuplesortstate *state;
MemoryContext oldcontext;
+ /*
+ * We pass false for genTupleCxt due to index_form_tuple being called
+ * in the tuplecontext. We don't want index_form_tuple() allocating
+ * memory in the generation context as generation.c does not reuse
+ * pfree'd memory very well.
+ */
+ state = tuplesort_begin_common(workMem, coordinate, sortopt, false);
+
oldcontext = MemoryContextSwitchTo(state->maincontext);
#ifdef TRACE_SORT
@@ -1341,11 +1374,18 @@ tuplesort_begin_index_gist(Relation heapRel,
SortCoordinate coordinate,
int sortopt)
{
- Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
- sortopt);
+ Tuplesortstate *state;
MemoryContext oldcontext;
int i;
+ /*
+ * We pass false for genTupleCxt due to index_form_tuple being called
+ * in the tuplecontext. We don't want index_form_tuple() allocating
+ * memory in the generation context as generation.c does not reuse
+ * pfree'd memory very well.
+ */
+ state = tuplesort_begin_common(workMem, coordinate, sortopt, false);
+
oldcontext = MemoryContextSwitchTo(state->sortcontext);
#ifdef TRACE_SORT
@@ -1397,11 +1437,21 @@ tuplesort_begin_datum(Oid datumType, Oid sortOperator, Oid sortCollation,
bool nullsFirstFlag, int workMem,
SortCoordinate coordinate, int sortopt)
{
- Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
- sortopt);
+ Tuplesortstate *state;
MemoryContext oldcontext;
int16 typlen;
bool typbyval;
+ bool genTupleCxt;
+
+ /*
+ * For bounded sorts, tuples may be pfreed in any order, so we use a
+ * regular aset.c context so that it can make use of free'd memory. When
+ * the sort is not bounded, we make use of a generation.c context as this
+ * keeps allocations more compact with less wastage. Allocations are also
+ * slightly more CPU efficient.
+ */
+ genTupleCxt = (sortopt & TUPLESORT_ALLOWBOUNDED) == 0;
+ state = tuplesort_begin_common(workMem, coordinate, sortopt, genTupleCxt);
oldcontext = MemoryContextSwitchTo(state->maincontext);
On Wed, 29 Jun 2022 at 12:59, David Rowley <dgrowleyml@gmail.com> wrote:
I noticed while doing some memory context related work that since we
now use generation.c memory contexts for tuplesorts (40af10b57) that
tuplesort_putindextuplevalues() causes memory "leaks" in the
generation context due to index_form_tuple() being called while we're
switched into the state->tuplecontext.
I voiced my dislike for the patch I came up with to fix this issue to
Andres. He suggested that I just add a version of index_form_tuple
that can be given a MemoryContext pointer to allocate the returned
tuple into.
I like that idea much better, so I've attached a patch to fix it that way.
If there are no objections, I plan to push this in the next 24 hours.
David
Attachments:
add_index_form_tuple_context_function.patchtext/plain; charset=US-ASCII; name=add_index_form_tuple_context_function.patchDownload
diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index 3065730bae..c0bad3cd95 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -33,20 +33,39 @@
* ----------------------------------------------------------------
*/
+ /* ----------------
+ * index_form_tuple
+ *
+ * As index_form_tuple_context, but allocates the returned tuple in the
+ * CurrentMemoryContext.
+ * ----------------
+ */
+IndexTuple
+index_form_tuple(TupleDesc tupleDescriptor,
+ Datum *values,
+ bool *isnull)
+{
+ return index_form_tuple_context(tupleDescriptor, values, isnull,
+ CurrentMemoryContext);
+}
+
/* ----------------
- * index_form_tuple
+ * index_form_tuple_context
*
* This shouldn't leak any memory; otherwise, callers such as
* tuplesort_putindextuplevalues() will be very unhappy.
*
* This shouldn't perform external table access provided caller
* does not pass values that are stored EXTERNAL.
+ *
+ * Allocates returned tuple in provided 'context'.
* ----------------
*/
IndexTuple
-index_form_tuple(TupleDesc tupleDescriptor,
- Datum *values,
- bool *isnull)
+index_form_tuple_context(TupleDesc tupleDescriptor,
+ Datum *values,
+ bool *isnull,
+ MemoryContext context)
{
char *tp; /* tuple pointer */
IndexTuple tuple; /* return tuple */
@@ -143,7 +162,7 @@ index_form_tuple(TupleDesc tupleDescriptor,
size = hoff + data_size;
size = MAXALIGN(size); /* be conservative */
- tp = (char *) palloc0(size);
+ tp = (char *) MemoryContextAllocZero(context, size);
tuple = (IndexTuple) tp;
heap_fill_tuple(tupleDescriptor,
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 31554fd867..421afcf47d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1884,12 +1884,13 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
ItemPointer self, Datum *values,
bool *isnull)
{
- MemoryContext oldcontext = MemoryContextSwitchTo(state->tuplecontext);
+ MemoryContext oldcontext;
SortTuple stup;
Datum original;
IndexTuple tuple;
- stup.tuple = index_form_tuple(RelationGetDescr(rel), values, isnull);
+ stup.tuple = index_form_tuple_context(RelationGetDescr(rel), values,
+ isnull, state->tuplecontext);
tuple = ((IndexTuple) stup.tuple);
tuple->t_tid = *self;
USEMEM(state, GetMemoryChunkSpace(stup.tuple));
@@ -1899,7 +1900,7 @@ tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
RelationGetDescr(state->indexRel),
&stup.isnull1);
- MemoryContextSwitchTo(state->sortcontext);
+ oldcontext = MemoryContextSwitchTo(state->sortcontext);
if (!state->sortKeys || !state->sortKeys->abbrev_converter || stup.isnull1)
{
diff --git a/src/include/access/itup.h b/src/include/access/itup.h
index 2c8877e991..7458bc2fda 100644
--- a/src/include/access/itup.h
+++ b/src/include/access/itup.h
@@ -150,6 +150,9 @@ typedef IndexAttributeBitMapData * IndexAttributeBitMap;
/* routines in indextuple.c */
extern IndexTuple index_form_tuple(TupleDesc tupleDescriptor,
Datum *values, bool *isnull);
+extern IndexTuple index_form_tuple_context(TupleDesc tupleDescriptor,
+ Datum *values, bool *isnull,
+ MemoryContext context);
extern Datum nocache_index_getattr(IndexTuple tup, int attnum,
TupleDesc tupleDesc);
extern void index_deform_tuple(IndexTuple tup, TupleDesc tupleDescriptor,
On Wed, 6 Jul 2022 at 13:34, David Rowley <dgrowleyml@gmail.com> wrote:
If there are no objections, I plan to push this in the next 24 hours.
Pushed.
David
On Tue, Jul 5, 2022 at 9:34 PM David Rowley <dgrowleyml@gmail.com> wrote:
I voiced my dislike for the patch I came up with to fix this issue to
Andres. He suggested that I just add a version of index_form_tuple
that can be given a MemoryContext pointer to allocate the returned
tuple into.I like that idea much better, so I've attached a patch to fix it that way.
If there are no objections, I plan to push this in the next 24 hours.
Apologies for not having looked at this thread sooner, but for what
it's worth, I think this is a fine solution.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jul 7, 2022 at 3:16 AM David Rowley <dgrowleyml@gmail.com> wrote:
Pushed.
Hmm, the commit appeared on git.postgresql.org, but apparently not in
my email nor the list archives.
--
John Naylor
EDB: http://www.enterprisedb.com
On Thu, 7 Jul 2022 at 13:41, John Naylor <john.naylor@enterprisedb.com> wrote:
On Thu, Jul 7, 2022 at 3:16 AM David Rowley <dgrowleyml@gmail.com> wrote:
Pushed.
Hmm, the commit appeared on git.postgresql.org, but apparently not in
my email nor the list archives.
Strange. I'd suspect a temporary hiccup in whatever code pushes the
commits onto the mailing list, but I see that my fe3caa143 from
yesterday was also missed.
The only difference in my workflow is that I'm sshing to the machine I
push from via another room rather than sitting right in front of it
like I normally am. I struggle to imagine why that would cause this to
happen.
David
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 7 Jul 2022 at 13:41, John Naylor <john.naylor@enterprisedb.com> wrote:
Hmm, the commit appeared on git.postgresql.org, but apparently not in
my email nor the list archives.
Strange. I'd suspect a temporary hiccup in whatever code pushes the
commits onto the mailing list, but I see that my fe3caa143 from
yesterday was also missed.
Caught in list moderation maybe?
regards, tom lane
On 7/6/22 23:15, Tom Lane wrote:
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 7 Jul 2022 at 13:41, John Naylor <john.naylor@enterprisedb.com> wrote:
Hmm, the commit appeared on git.postgresql.org, but apparently not in
my email nor the list archives.Strange. I'd suspect a temporary hiccup in whatever code pushes the
commits onto the mailing list, but I see that my fe3caa143 from
yesterday was also missed.Caught in list moderation maybe?
Actually, yes they are:
8<-----------------------
Date: 2022-07-06 07:41:02
List: pgsql-committers
Reason: sender is not a confirmed email address
From: drowley(at)postgresql(dot)org
Size: 4890 bytes
Subject: pgsql: Remove size increase in ExprEvalStep caused by hashed saops
Date: 2022-07-06 20:14:25
List: pgsql-committers
Reason: sender is not a confirmed email address
Spam score: -7.1
From: drowley(at)postgresql(dot)org
Size: 4703 bytes
Subject: pgsql: Overload index_form_tuple to allow the memory context to
be supp
8<-----------------------
(I manually did the (at) and (dot) obfuscation)
I don't ordinarily moderate the pgsql-committers list, and don't know
offhand who does, so am a bit hesitant to approve them myself. But
perhaps I should?
I guess another good question is why the email address no longer
confirmed...
--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com