Dropping a partitioned table takes too long

Started by Amit Langoteover 8 years ago12 messages

Langote_Amit_f8@lab.ntt.co.jp

over 8 years ago

1 attachment(s)

$SUBJECT, if the table has, say, 2000 partitions.

The main reason seems to be that RelationBuildPartitionDesc() will be
called that many times within the same transaction, which perhaps we
cannot do much about right away. But one thing we could do is to reduce
the impact of memory allocations it does. They are currently leaked into
the caller's context, which may not be reset immediately (such as
PortalHeapMemory). Instead of doing it in the caller's context, use a
temporary context that is deleted before returning. Attached is a patch
for that. On my local development VM, `drop table
table_with_2000_partitions` finished in 27 seconds with the patch instead
of more than 20 minutes that it currently takes.

Thoughts?

Adding this to the open items list.

Thanks,
Amit

PS: this was actually mentioned by Ragnar Ouchterlony who reported some
bugs back in declarative partitioning in January [1]/messages/by-id/17d89e08-874b-c1b1-aa46-12d5afb26235@agama.tv

[1]: /messages/by-id/17d89e08-874b-c1b1-aa46-12d5afb26235@agama.tv
/messages/by-id/17d89e08-874b-c1b1-aa46-12d5afb26235@agama.tv

Attachments:

0001-Set-up-a-temp-context-in-RelationBuildPartitionDesc.patchtext/x-diff; name=0001-Set-up-a-temp-context-in-RelationBuildPartitionDesc.patchDownload

From d435dbe04de935ceb092d4a05cfb083f2ede7f19 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 13 Jan 2017 17:47:04 +0900
Subject: [PATCH] Set up a temp context in RelationBuildPartitionDesc

Memory allocations while reading PartitionBoundSpec from the catalog
and converting to the internal representation to be put into relcache
currently leak to whatever context it was when we entered
RelationBuildPartitionDesc().  Instead set up a temporary context as
a workspace for those allocations and delete it before returning.
---
 src/backend/catalog/partition.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e0d2665a91..900b144af5 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -159,7 +159,8 @@ RelationBuildPartitionDesc(Relation rel)
 				nparts;
 	PartitionKey key = RelationGetPartitionKey(rel);
 	PartitionDesc result;
-	MemoryContext oldcxt;
+	MemoryContext oldcxt,
+				  tmpcxt;
 
 	int			ndatums = 0;
 
@@ -178,6 +179,16 @@ RelationBuildPartitionDesc(Relation rel)
 	if (key == NULL)
 		return;
 
+	/*
+	 * Create a temporary context to not leak into the outer potentially long-
+	 * running context
+	 */
+	/* Now build the actual relcache partition descriptor */
+	tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
+								   "RelationBuildPartitionDesc_workspace",
+								   ALLOCSET_DEFAULT_SIZES);
+	oldcxt = MemoryContextSwitchTo(tmpcxt);
+
 	/* Get partition oids from pg_inherits */
 	inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
 
@@ -431,7 +442,7 @@ RelationBuildPartitionDesc(Relation rel)
 	rel->rd_pdcxt = AllocSetContextCreate(CacheMemoryContext,
 										  RelationGetRelationName(rel),
 										  ALLOCSET_DEFAULT_SIZES);
-	oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
+	MemoryContextSwitchTo(rel->rd_pdcxt);
 
 	result = (PartitionDescData *) palloc0(sizeof(PartitionDescData));
 	result->nparts = nparts;
@@ -580,6 +591,7 @@ RelationBuildPartitionDesc(Relation rel)
 	}
 
 	MemoryContextSwitchTo(oldcxt);
+	MemoryContextDelete(tmpcxt);
 	rel->rd_partdesc = result;
 }
 
-- 
2.11.0

高增琦

pgf00a@gmail.com

over 8 years ago

In reply to: Amit Langote (#1)

1 attachment(s)

Re: Dropping a partitioned table takes too long

The attached patch try to replace 'heap_open' with 'LockRelationOid' when
locking parent table.
It improved dropping a table with 7000 partitions.

2017-04-25 15:07 GMT+08:00 Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>:

$SUBJECT, if the table has, say, 2000 partitions.

The main reason seems to be that RelationBuildPartitionDesc() will be
called that many times within the same transaction, which perhaps we
cannot do much about right away. But one thing we could do is to reduce
the impact of memory allocations it does. They are currently leaked into
the caller's context, which may not be reset immediately (such as
PortalHeapMemory). Instead of doing it in the caller's context, use a
temporary context that is deleted before returning. Attached is a patch
for that. On my local development VM, `drop table
table_with_2000_partitions` finished in 27 seconds with the patch instead
of more than 20 minutes that it currently takes.

Thoughts?

Adding this to the open items list.

Thanks,
Amit

PS: this was actually mentioned by Ragnar Ouchterlony who reported some
bugs back in declarative partitioning in January [1]

[1]
/messages/by-id/17d89e08-874b-c1b1-
aa46-12d5afb26235%40agama.tv

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
GaoZengqi
pgf00a@gmail.com
zengqigao@gmail.com

Attachments:

0001-Don-t-building-a-relcache-entry-since-we-don-t-need-.patchapplication/octet-stream; name=0001-Don-t-building-a-relcache-entry-since-we-don-t-need-.patchDownload

From f6ad5abe8895e1d5167804c165a314d6d226daf9 Mon Sep 17 00:00:00 2001
From: gaozq <gaozq@upbase.com.cn>
Date: Tue, 25 Apr 2017 11:07:04 +0000
Subject: [PATCH] Don't building a relcache entry since we don't need it when
 dropping partitioned table.

---
 src/backend/catalog/heap.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ece4df0..ab3d83f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -68,6 +68,7 @@
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
+#include "storage/lmgr.h"
 #include "storage/predicate.h"
 #include "storage/smgr.h"
 #include "utils/acl.h"
@@ -1760,8 +1761,7 @@ heap_drop_with_catalog(Oid relid)
 {
 	Relation	rel;
 	HeapTuple	tuple;
-	Oid			parentOid;
-	Relation	parent = NULL;
+	Oid			parentOid = InvalidOid;
 
 	/*
 	 * To drop a partition safely, we must grab exclusive lock on its parent,
@@ -1776,7 +1776,7 @@ heap_drop_with_catalog(Oid relid)
 	if (((Form_pg_class) GETSTRUCT(tuple))->relispartition)
 	{
 		parentOid = get_partition_parent(relid);
-		parent = heap_open(parentOid, AccessExclusiveLock);
+		LockRelationOid(parentOid, AccessExclusiveLock);
 	}
 
 	ReleaseSysCache(tuple);
@@ -1885,14 +1885,14 @@ heap_drop_with_catalog(Oid relid)
 	 */
 	DeleteRelationTuple(relid);
 
-	if (parent)
+	if (OidIsValid(parentOid))
 	{
 		/*
 		 * Invalidate the parent's relcache so that the partition is no longer
 		 * included in its partition descriptor.
 		 */
-		CacheInvalidateRelcache(parent);
-		heap_close(parent, NoLock);		/* keep the lock */
+		CacheInvalidateRelcacheByRelid(parentOid);
+		/* keep the lock */
 	}
 }
 
-- 
2.7.4

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 8 years ago

In reply to: Amit Langote (#1)

Re: Dropping a partitioned table takes too long

On Tue, Apr 25, 2017 at 12:37 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

$SUBJECT, if the table has, say, 2000 partitions.

The main reason seems to be that RelationBuildPartitionDesc() will be
called that many times within the same transaction, which perhaps we
cannot do much about right away. But one thing we could do is to reduce
the impact of memory allocations it does. They are currently leaked into
the caller's context, which may not be reset immediately (such as
PortalHeapMemory). Instead of doing it in the caller's context, use a
temporary context that is deleted before returning. Attached is a patch
for that. On my local development VM, `drop table
table_with_2000_partitions` finished in 27 seconds with the patch instead
of more than 20 minutes that it currently takes.

Thoughts?

I am not able to undestand why does changing memory context cause so
much difference in execution time?

The way this patch uses the memory context in this patch, it's
possible that in future we will allocate something in temporary
context and then refer it from a longer context. Instead, may be we
should free things specially or change memory context only when
allocating those things.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 8 years ago

In reply to: 高增琦 (#2)

Re: Dropping a partitioned table takes too long

Hi,

On 2017/04/25 20:07, 高增琦 wrote:

2017-04-25 15:07 GMT+08:00 Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>:

$SUBJECT, if the table has, say, 2000 partitions.

The main reason seems to be that RelationBuildPartitionDesc() will be
called that many times within the same transaction, which perhaps we
cannot do much about right away. But one thing we could do is to reduce
the impact of memory allocations it does. They are currently leaked into
the caller's context, which may not be reset immediately (such as
PortalHeapMemory). Instead of doing it in the caller's context, use a
temporary context that is deleted before returning. Attached is a patch
for that. On my local development VM, `drop table
table_with_2000_partitions` finished in 27 seconds with the patch instead
of more than 20 minutes that it currently takes.

The attached patch try to replace 'heap_open' with 'LockRelationOid' when
locking parent table.
It improved dropping a table with 7000 partitions.

Your patch seems to be a much better solution to the problem, thanks.

Regards,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 8 years ago

In reply to: Ashutosh Bapat (#3)

Re: Dropping a partitioned table takes too long

On 2017/04/25 20:55, Ashutosh Bapat wrote:

On Tue, Apr 25, 2017 at 12:37 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

$SUBJECT, if the table has, say, 2000 partitions.

The main reason seems to be that RelationBuildPartitionDesc() will be
called that many times within the same transaction, which perhaps we
cannot do much about right away. But one thing we could do is to reduce
the impact of memory allocations it does. They are currently leaked into
the caller's context, which may not be reset immediately (such as
PortalHeapMemory). Instead of doing it in the caller's context, use a
temporary context that is deleted before returning. Attached is a patch
for that. On my local development VM, `drop table
table_with_2000_partitions` finished in 27 seconds with the patch instead
of more than 20 minutes that it currently takes.

Thoughts?

I am not able to undestand why does changing memory context cause so
much difference in execution time?

The way this patch uses the memory context in this patch, it's
possible that in future we will allocate something in temporary
context and then refer it from a longer context. Instead, may be we
should free things specially or change memory context only when
allocating those things.

Actually, I am withdrawing this patch for time being, because a much
direct and better solution has been offered upthread by GaoZengqi.
Anyway, temporary context added by my patch would not contain any objects
that will be accessed outside of RelationBuildPartitionDesc(). Remember
that the content that we put into the longer-lived rd_partdesc uses the
memory allocated in rd_pdcxt, not the temporary context that I proposed.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Amit Langote (#4)

Re: Dropping a partitioned table takes too long

On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

The attached patch try to replace 'heap_open' with 'LockRelationOid' when
locking parent table.
It improved dropping a table with 7000 partitions.

Your patch seems to be a much better solution to the problem, thanks.

Does anyone wish to object to this patch as untimely?

If not, I'll commit it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

over 8 years ago

In reply to: Robert Haas (#6)

Re: Dropping a partitioned table takes too long

On 4/26/17 12:15, Robert Haas wrote:

On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

The attached patch try to replace 'heap_open' with 'LockRelationOid' when
locking parent table.
It improved dropping a table with 7000 partitions.

Your patch seems to be a much better solution to the problem, thanks.

Does anyone wish to object to this patch as untimely?

If not, I'll commit it.

Seems quite reasonable.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#6)

Re: Dropping a partitioned table takes too long

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Your patch seems to be a much better solution to the problem, thanks.

Does anyone wish to object to this patch as untimely?

If not, I'll commit it.

It's certainly not untimely to address such problems. What I'm wondering
is if we should commit both patches. Avoiding an unnecessary heap_open
is certainly a good thing, but it seems like the memory leak addressed
by the first patch might still be of concern in other scenarios.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#8)

Re: Dropping a partitioned table takes too long

On Wed, Apr 26, 2017 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Your patch seems to be a much better solution to the problem, thanks.

Does anyone wish to object to this patch as untimely?

If not, I'll commit it.

It's certainly not untimely to address such problems. What I'm wondering
is if we should commit both patches. Avoiding an unnecessary heap_open
is certainly a good thing, but it seems like the memory leak addressed
by the first patch might still be of concern in other scenarios.

I will defer to you on that. If you think that patch is a good idea,
please have at it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

高增琦

pgf00a@gmail.com

over 8 years ago

In reply to: Robert Haas (#9)

Re: Dropping a partitioned table takes too long

It seems that in 'load_relcache_init_file()', we forget to initialize
'rd_pdcxt' of relcache.

2017-04-27 0:33 GMT+08:00 Robert Haas <robertmhaas@gmail.com>:

On Wed, Apr 26, 2017 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Your patch seems to be a much better solution to the problem, thanks.

Does anyone wish to object to this patch as untimely?

If not, I'll commit it.

It's certainly not untimely to address such problems. What I'm wondering
is if we should commit both patches. Avoiding an unnecessary heap_open
is certainly a good thing, but it seems like the memory leak addressed
by the first patch might still be of concern in other scenarios.

I will defer to you on that. If you think that patch is a good idea,
please have at it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
GaoZengqi
pgf00a@gmail.com
zengqigao@gmail.com

#11

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Peter Eisentraut (#7)

Re: Dropping a partitioned table takes too long

On Wed, Apr 26, 2017 at 12:21 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 4/26/17 12:15, Robert Haas wrote:

On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

The attached patch try to replace 'heap_open' with 'LockRelationOid' when
locking parent table.
It improved dropping a table with 7000 partitions.

Your patch seems to be a much better solution to the problem, thanks.

Does anyone wish to object to this patch as untimely?

If not, I'll commit it.

Seems quite reasonable.

OK, done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: 高增琦 (#10)

Re: Dropping a partitioned table takes too long

On Fri, Apr 28, 2017 at 6:12 AM, 高增琦 <pgf00a@gmail.com> wrote:

It seems that in 'load_relcache_init_file()', we forget to initialize
'rd_pdcxt' of relcache.

Fixed. Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers