Some memory not freed at the exit of RelationBuildPartitionDesc()

Started by amul sulover 6 years ago8 messages
#1amul sul
sulamul@gmail.com
2 attachment(s)

Hi,

In RelationBuildPartitionDesc(), a memory space that use to gather
partitioning
bound info wasn't free at the end. This might not a problem because this
allocated memory will eventually be recovered when the top-level context is
freed, but the case when a partitioned table having 1000s or more
partitions and
this partitioned relation open & close, and its cached entry invalidated in
loop
then we'll have too may call to RelationBuildPartitionDesc() which will keep
wasting some space with every loop.

For a demonstration purpose, I did the following changes to
heap_drop_with_catalog() and tried to drop a partitioned table having 5000
partitions(attached create script) which hit OOM on a machine in no time:

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index b7bcdd9d0f..6b7bc0d7ae 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1842,6 +1842,8 @@ heap_drop_with_catalog(Oid relid)
        parentOid = get_partition_parent(relid);
        LockRelationOid(parentOid, AccessExclusiveLock);
+       rel = relation_open(parentOid, NoLock);
+       relation_close(rel, NoLock);
        /*
         * If this is not the default partition, dropping it will change the
         * default partition's partition constraint, so we must lock it.

I think we should do all partitioned bound information gathering and
calculation in temporary memory context which can be released at the end of
RelationBuildPartitionDesc(), thoughts/Comments?

I did the same in the attached patch and the aforesaid OOM issue is
disappeared.

Regards,
Amul

Attachments:

temp_MemoryContext_POC.patchapplication/octet-stream; name=temp_MemoryContext_POC.patchDownload
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index b207b765f2..521ef14499 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -69,9 +69,15 @@ RelationBuildPartitionDesc(Relation rel)
 	int			i,
 				nparts;
 	PartitionKey key = RelationGetPartitionKey(rel);
-	MemoryContext oldcxt;
+	MemoryContext oldcxt,
+				  tmp_cxt;
 	int		   *mapping;
 
+	tmp_cxt = AllocSetContextCreate(CurrentMemoryContext,
+									"partition context",
+									ALLOCSET_DEFAULT_SIZES);
+	oldcxt = MemoryContextSwitchTo(tmp_cxt);
+
 	/*
 	 * Get partition oids from pg_inherits.  This uses a single snapshot to
 	 * fetch the list of children, so while more children may be getting added
@@ -205,11 +211,10 @@ RelationBuildPartitionDesc(Relation rel)
 		boundinfo = partition_bounds_create(boundspecs, nparts, key, &mapping);
 
 		/* Now copy all info into relcache's partdesc. */
-		oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
+		MemoryContextSwitchTo(rel->rd_pdcxt);
 		partdesc->boundinfo = partition_bounds_copy(boundinfo, key);
 		partdesc->oids = (Oid *) palloc(nparts * sizeof(Oid));
 		partdesc->is_leaf = (bool *) palloc(nparts * sizeof(bool));
-		MemoryContextSwitchTo(oldcxt);
 
 		/*
 		 * Assign OIDs from the original array into mapped indexes of the
@@ -230,6 +235,8 @@ RelationBuildPartitionDesc(Relation rel)
 				(get_rel_relkind(oids[i]) != RELKIND_PARTITIONED_TABLE);
 		}
 	}
+	MemoryContextSwitchTo(oldcxt);
+	MemoryContextDelete(tmp_cxt);
 
 	rel->rd_partdesc = partdesc;
 }
create_5k_partitions.sqlapplication/octet-stream; name=create_5k_partitions.sqlDownload
#2Amit Langote
amitlangote09@gmail.com
In reply to: amul sul (#1)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

Hi Amul,

On Thu, Aug 8, 2019 at 4:15 PM amul sul <sulamul@gmail.com> wrote:

Hi,

In RelationBuildPartitionDesc(), a memory space that use to gather partitioning
bound info wasn't free at the end. This might not a problem because this
allocated memory will eventually be recovered when the top-level context is
freed, but the case when a partitioned table having 1000s or more partitions and
this partitioned relation open & close, and its cached entry invalidated in loop
then we'll have too may call to RelationBuildPartitionDesc() which will keep
wasting some space with every loop.

For a demonstration purpose, I did the following changes to
heap_drop_with_catalog() and tried to drop a partitioned table having 5000
partitions(attached create script) which hit OOM on a machine in no time:

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index b7bcdd9d0f..6b7bc0d7ae 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1842,6 +1842,8 @@ heap_drop_with_catalog(Oid relid)
parentOid = get_partition_parent(relid);
LockRelationOid(parentOid, AccessExclusiveLock);
+       rel = relation_open(parentOid, NoLock);
+       relation_close(rel, NoLock);
/*
* If this is not the default partition, dropping it will change the
* default partition's partition constraint, so we must lock it.

I think we should do all partitioned bound information gathering and
calculation in temporary memory context which can be released at the end of
RelationBuildPartitionDesc(), thoughts/Comments?

I did the same in the attached patch and the aforesaid OOM issue is disappeared.

Thanks for the patch. This was discussed recently in the "hyrax vs.
RelationBuildPartitionDesc()" thread [1]/messages/by-id/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com and I think Alvaro proposed
an approach that's similar to yours. Not sure why it wasn't pursued
though. Maybe the reason is buried somewhere in that discussion.

Thanks,
Amit

[1]: /messages/by-id/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com

#3amul sul
sulamul@gmail.com
In reply to: Amit Langote (#2)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

On Thu, Aug 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Amul,

On Thu, Aug 8, 2019 at 4:15 PM amul sul <sulamul@gmail.com> wrote:

Hi,

In RelationBuildPartitionDesc(), a memory space that use to gather

partitioning

bound info wasn't free at the end. This might not a problem because this
allocated memory will eventually be recovered when the top-level context

is

freed, but the case when a partitioned table having 1000s or more

partitions and

this partitioned relation open & close, and its cached entry invalidated

in loop

then we'll have too may call to RelationBuildPartitionDesc() which will

keep

wasting some space with every loop.

For a demonstration purpose, I did the following changes to
heap_drop_with_catalog() and tried to drop a partitioned table having

5000

partitions(attached create script) which hit OOM on a machine in no time:

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index b7bcdd9d0f..6b7bc0d7ae 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1842,6 +1842,8 @@ heap_drop_with_catalog(Oid relid)
parentOid = get_partition_parent(relid);
LockRelationOid(parentOid, AccessExclusiveLock);
+       rel = relation_open(parentOid, NoLock);
+       relation_close(rel, NoLock);
/*
* If this is not the default partition, dropping it will change

the

* default partition's partition constraint, so we must lock it.

I think we should do all partitioned bound information gathering and
calculation in temporary memory context which can be released at the end

of

RelationBuildPartitionDesc(), thoughts/Comments?

I did the same in the attached patch and the aforesaid OOM issue is

disappeared.

Thanks for the patch. This was discussed recently in the "hyrax vs.
RelationBuildPartitionDesc()" thread [1] and I think Alvaro proposed
an approach that's similar to yours. Not sure why it wasn't pursued
though. Maybe the reason is buried somewhere in that discussion.

Thanks,
Amit

[1]
/messages/by-id/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com

Oh, quite similar, thanks Amit for pointing that out.

Look like "hyrax vs.RelationBuildPartitionDesc()" is in discussion for the
master
branch only, not sure though, but we need the similar fix for the back
branches as well.

Regards,
Amul

#4Amit Langote
amitlangote09@gmail.com
In reply to: amul sul (#3)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

On Thu, Aug 8, 2019 at 5:33 PM amul sul <sulamul@gmail.com> wrote:

On Thu, Aug 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote:

Thanks for the patch. This was discussed recently in the "hyrax vs.
RelationBuildPartitionDesc()" thread [1] and I think Alvaro proposed
an approach that's similar to yours. Not sure why it wasn't pursued
though. Maybe the reason is buried somewhere in that discussion.

Oh, quite similar, thanks Amit for pointing that out.

Look like "hyrax vs.RelationBuildPartitionDesc()" is in discussion for the master
branch only, not sure though, but we need the similar fix for the back branches as well.

Well, this is not a bug as such, so it's very unlikely that a fix like
this will be back-patched. Also, if this becomes an issue only for
more than over 1000 partitions, then it's not very relevant for PG 10
and PG 11, because we don't recommend using so many partitions with
them. Maybe we can consider fixing PG 12 though.

Thanks,
Amit

#5David Fetter
david@fetter.org
In reply to: Amit Langote (#4)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

On Thu, Aug 08, 2019 at 05:42:21PM +0900, Amit Langote wrote:

On Thu, Aug 8, 2019 at 5:33 PM amul sul <sulamul@gmail.com> wrote:

On Thu, Aug 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com> wrote:

Thanks for the patch. This was discussed recently in the "hyrax vs.
RelationBuildPartitionDesc()" thread [1] and I think Alvaro proposed
an approach that's similar to yours. Not sure why it wasn't pursued
though. Maybe the reason is buried somewhere in that discussion.

Oh, quite similar, thanks Amit for pointing that out.

Look like "hyrax vs.RelationBuildPartitionDesc()" is in discussion for the master
branch only, not sure though, but we need the similar fix for the back branches as well.

Well, this is not a bug as such, so it's very unlikely that a fix like
this will be back-patched. Also, if this becomes an issue only for
more than over 1000 partitions, then it's not very relevant for PG 10
and PG 11, because we don't recommend using so many partitions with
them. Maybe we can consider fixing PG 12 though.

A fix for the thousands-of-partitions case would be very welcome for 12.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#6amul sul
sulamul@gmail.com
In reply to: David Fetter (#5)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

On Sat, Aug 10, 2019 at 12:16 AM David Fetter <david@fetter.org> wrote:

On Thu, Aug 08, 2019 at 05:42:21PM +0900, Amit Langote wrote:

On Thu, Aug 8, 2019 at 5:33 PM amul sul <sulamul@gmail.com> wrote:

On Thu, Aug 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com>

wrote:

Thanks for the patch. This was discussed recently in the "hyrax vs.
RelationBuildPartitionDesc()" thread [1] and I think Alvaro proposed
an approach that's similar to yours. Not sure why it wasn't pursued
though. Maybe the reason is buried somewhere in that discussion.

Oh, quite similar, thanks Amit for pointing that out.

Look like "hyrax vs.RelationBuildPartitionDesc()" is in discussion for

the master

branch only, not sure though, but we need the similar fix for the back

branches as well.

Well, this is not a bug as such, so it's very unlikely that a fix like
this will be back-patched. Also, if this becomes an issue only for
more than over 1000 partitions, then it's not very relevant for PG 10
and PG 11, because we don't recommend using so many partitions with
them. Maybe we can consider fixing PG 12 though.

A fix for the thousands-of-partitions case would be very welcome for 12.

Look like commit # d3f48dfae42 added the required fix but is enabled only
for
the clobber-cache builds :(

Regards,
Amul

#7David Fetter
david@fetter.org
In reply to: amul sul (#6)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

On Tue, Aug 13, 2019 at 03:08:34PM +0530, amul sul wrote:

On Sat, Aug 10, 2019 at 12:16 AM David Fetter <david@fetter.org> wrote:

On Thu, Aug 08, 2019 at 05:42:21PM +0900, Amit Langote wrote:

On Thu, Aug 8, 2019 at 5:33 PM amul sul <sulamul@gmail.com> wrote:

On Thu, Aug 8, 2019 at 1:27 PM Amit Langote <amitlangote09@gmail.com>

wrote:

Thanks for the patch. This was discussed recently in the "hyrax vs.
RelationBuildPartitionDesc()" thread [1] and I think Alvaro proposed
an approach that's similar to yours. Not sure why it wasn't pursued
though. Maybe the reason is buried somewhere in that discussion.

Oh, quite similar, thanks Amit for pointing that out.

Look like "hyrax vs.RelationBuildPartitionDesc()" is in discussion for

the master

branch only, not sure though, but we need the similar fix for the back

branches as well.

Well, this is not a bug as such, so it's very unlikely that a fix like
this will be back-patched. Also, if this becomes an issue only for
more than over 1000 partitions, then it's not very relevant for PG 10
and PG 11, because we don't recommend using so many partitions with
them. Maybe we can consider fixing PG 12 though.

A fix for the thousands-of-partitions case would be very welcome for 12.

Look like commit # d3f48dfae42 added the required fix but is enabled only
for
the clobber-cache builds :(

I've got a real world multi-tenancy case that would really be helped
by this. Can we enable it for all builds, please?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#7)
Re: Some memory not freed at the exit of RelationBuildPartitionDesc()

David Fetter <david@fetter.org> writes:

On Tue, Aug 13, 2019 at 03:08:34PM +0530, amul sul wrote:

Look like commit # d3f48dfae42 added the required fix but is enabled only
for the clobber-cache builds :(

I've got a real world multi-tenancy case that would really be helped
by this. Can we enable it for all builds, please?

This sounds like nonsense to me. As pointed out by the commit message,
that fix was just taking care of bloat observed with CCA on.

regards, tom lane