Re: adding partitioned tables to publications

Started by 赵锐 · about 5 years ago · 5 messages
#1 赵锐
875941708@qq.com
1 attachment(s)

The first patch in Amit's series does not only rearrange the code; it also fixes a hidden bug.
To make the fix easy to see, I attach a separate patch.
"RelationIdGetRelation" takes a reference that is tracked in owner->relrefarr; without a matching "RelationClose", owner->relrefarr keeps growing and must repeatedly enlarge and re-hash.
Once owner->relrefarr holds more than 10 million entries, enlarging and re-hashing takes several hours. Worse still, taking each new reference itself starts to take minutes, because hash collisions are resolved by scanning an array in order.
When we publish 10 billion rows under one partitioned table, taking references, enlarging, and re-hashing add up to several days, with CPU pinned at 99%.
After applying my patch, the same 10 billion rows are published in about 10 minutes.

------------------ Original ------------------
From: "Amit Langote" <amitlangote09@gmail.com>
Send time: Friday, Apr 17, 2020 10:58 PM
To: "Peter Eisentraut" <peter.eisentraut@2ndquadrant.com>
Cc: "Petr Jelinek" <petr@2ndquadrant.com>; "Rafia Sabih" <rafia.pghackers@gmail.com>; "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: Re: adding partitioned tables to publications

On Fri, Apr 17, 2020 at 10:23 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 2020-04-09 09:28, Amit Langote wrote:
> > While figuring this out, I thought the nearby code could be rearranged
> > a bit, especially to de-duplicate the code. Also, I think
> > get_rel_sync_entry() may be a better place to set the map, rather than
> > maybe_send_schema(). Thoughts?
>
> because I didn't really have an opinion on that at the time, but if you
> still want it considered or have any open thoughts on this thread,
> please resend or explain.

Sure, thanks for taking care of the bug.

Rebased the code rearrangement patch. Also resending the patch to fix
TAP tests for improving coverage as described in:
/messages/by-id/CA+HiwqFyydvQ5g=qa54UM+Xjm77BdhX-nM4dXQkNOgH=zvDjoA@mail.gmail.com

To summarize:
1. Missing coverage for a couple of related blocks in
apply_handle_tuple_routing()
2. Missing coverage report for the code in pgoutput.c added by 83fd4532

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-RelationClose-after-RelationIdGetRelation.patch (application/octet-stream; charset=gb18030)
From de3a45ad05bc2081b3942c10f701a3a1476e522e Mon Sep 17 00:00:00 2001
From: "Mark Zhao" <875941708@qq.com>
Date: Wed, 30 Dec 2020 13:36:12 +0000
Subject: [PATCH] RelationClose after RelationIdGetRelation

---
 src/backend/replication/pgoutput/pgoutput.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 15379e3..a36dbaf 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -363,6 +363,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
 	MemoryContext old;
 	RelationSyncEntry *relentry;
+	Relation	ancestor = NULL;
 
 	if (!is_publishable_relation(relation))
 		return;
@@ -404,7 +405,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuple if needed. */
 					if (relentry->map)
 						tuple = execute_attr_map_tuple(tuple, relentry->map);
@@ -425,7 +427,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuples if needed. */
 					if (relentry->map)
 					{
@@ -448,7 +451,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuple if needed. */
 					if (relentry->map)
 						oldtuple = execute_attr_map_tuple(oldtuple, relentry->map);
@@ -465,6 +469,9 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			Assert(false);
 	}
 
+	if (ancestor)
+		RelationClose(ancestor);
+
 	/* Cleanup */
 	MemoryContextSwitchTo(old);
 	MemoryContextReset(data->context);
-- 
1.8.3.1

#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: 赵锐 (#1)
2 attachment(s)
Re: adding partitioned tables to publications

On Wed, Dec 30, 2020 at 8:03 PM 赵锐 <875941708@qq.com> wrote:
>
> The first patch in Amit's series does not only rearrange the code; it also fixes a hidden bug.
> To make the fix easy to see, I attach a separate patch.
> "RelationIdGetRelation" takes a reference that is tracked in owner->relrefarr; without a matching "RelationClose", owner->relrefarr keeps growing and must repeatedly enlarge and re-hash.
> Once owner->relrefarr holds more than 10 million entries, enlarging and re-hashing takes several hours. Worse still, taking each new reference itself starts to take minutes, because hash collisions are resolved by scanning an array in order.
> When we publish 10 billion rows under one partitioned table, taking references, enlarging, and re-hashing add up to several days, with CPU pinned at 99%.
> After applying my patch, the same 10 billion rows are published in about 10 minutes.

It is a clear relation descriptor leak. The proposed fix seems correct
to me. The patch wasn't applying to HEAD, so I have prepared separate
patches for HEAD and 13, with minor modifications such as using
RelationIsValid before closing the relation. I have not added any test
because there is already a test in src/test/subscription/t/013_partition.

Kindly let me know your English name so that I can credit you as a
co-author.

--
With Regards,
Amit Kapila.

Attachments:

v1-0001-Fix-relation-descriptor-leak.HEAD.patch (application/octet-stream)
From 42911152c85cc0dda7384259dbee917d285bb298 Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 11 Jan 2021 16:22:19 +0530
Subject: [PATCH v1] Fix relation descriptor leak.

We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.

Author: Amit Langote
Reviewed-by: Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
---
 src/backend/replication/pgoutput/pgoutput.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 81dbed33d5..2f01137b42 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -502,6 +502,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	MemoryContext old;
 	RelationSyncEntry *relentry;
 	TransactionId xid = InvalidTransactionId;
+	Relation	ancestor = NULL;
 
 	if (!is_publishable_relation(relation))
 		return;
@@ -552,7 +553,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuple if needed. */
 					if (relentry->map)
 						tuple = execute_attr_map_tuple(tuple, relentry->map);
@@ -574,7 +576,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuples if needed. */
 					if (relentry->map)
 					{
@@ -598,7 +601,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuple if needed. */
 					if (relentry->map)
 						oldtuple = execute_attr_map_tuple(oldtuple, relentry->map);
@@ -616,6 +620,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			Assert(false);
 	}
 
+	if (RelationIsValid(ancestor))
+	{
+		RelationClose(ancestor);
+		ancestor = NULL;
+	}
+
 	/* Cleanup */
 	MemoryContextSwitchTo(old);
 	MemoryContextReset(data->context);
-- 
2.28.0.windows.1

v1-0001-Fix-relation-descriptor-leak.13.patch (application/octet-stream)
From 48b13e39e371fb4e98fb1da86268509bbf3ab5ba Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Mon, 11 Jan 2021 16:49:47 +0530
Subject: [PATCH v1] Fix relation descriptor leak.

We missed closing the relation descriptor while sending changes via the
root of partitioned relations during logical replication.

Author: Amit Langote
Reviewed-by: Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
---
 src/backend/replication/pgoutput/pgoutput.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 15379e3..c5fbebf 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -363,6 +363,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
 	MemoryContext old;
 	RelationSyncEntry *relentry;
+	Relation	ancestor = NULL;
 
 	if (!is_publishable_relation(relation))
 		return;
@@ -404,7 +405,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuple if needed. */
 					if (relentry->map)
 						tuple = execute_attr_map_tuple(tuple, relentry->map);
@@ -425,7 +427,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuples if needed. */
 					if (relentry->map)
 					{
@@ -448,7 +451,8 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 				if (relentry->publish_as_relid != RelationGetRelid(relation))
 				{
 					Assert(relation->rd_rel->relispartition);
-					relation = RelationIdGetRelation(relentry->publish_as_relid);
+					ancestor = RelationIdGetRelation(relentry->publish_as_relid);
+					relation = ancestor;
 					/* Convert tuple if needed. */
 					if (relentry->map)
 						oldtuple = execute_attr_map_tuple(oldtuple, relentry->map);
@@ -465,6 +469,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 			Assert(false);
 	}
 
+	if (RelationIsValid(ancestor))
+	{
+		RelationClose(ancestor);
+		ancestor = NULL;
+	}
+
 	/* Cleanup */
 	MemoryContextSwitchTo(old);
 	MemoryContextReset(data->context);
-- 
1.8.3.1

#3 Mark Zhao
875941708@qq.com
In reply to: Amit Kapila (#2)

Thanks for your reply. The patch is exactly what I want.
My English name is Mark Zhao, which should be the current email name.

Thanks,
Mark Zhao

------------------ Original ------------------
From: "Amit Kapila" <amit.kapila16@gmail.com>
Send time: Monday, Jan 11, 2021 8:12 PM
To: "赵锐" <875941708@qq.com>
Cc: "Peter Eisentraut" <peter.eisentraut@2ndquadrant.com>; "Amit Langote" <amitlangote09@gmail.com>; "Petr Jelinek" <petr@2ndquadrant.com>; "Rafia Sabih" <rafia.pghackers@gmail.com>; "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: Re: adding partitioned tables to publications

On Wed, Dec 30, 2020 at 8:03 PM 赵锐 <875941708@qq.com> wrote:
>
> The first patch in Amit's series does not only rearrange the code; it also fixes a hidden bug.
> To make the fix easy to see, I attach a separate patch.
> "RelationIdGetRelation" takes a reference that is tracked in owner->relrefarr; without a matching "RelationClose", owner->relrefarr keeps growing and must repeatedly enlarge and re-hash.
> Once owner->relrefarr holds more than 10 million entries, enlarging and re-hashing takes several hours. Worse still, taking each new reference itself starts to take minutes, because hash collisions are resolved by scanning an array in order.
> When we publish 10 billion rows under one partitioned table, taking references, enlarging, and re-hashing add up to several days, with CPU pinned at 99%.
> After applying my patch, the same 10 billion rows are published in about 10 minutes.
>

It is a clear relation descriptor leak. The proposed fix seems correct
to me. The patch wasn't applying to HEAD, so I have prepared separate
patches for HEAD and 13, with minor modifications such as using
RelationIsValid before closing the relation. I have not added any test
because there is already a test in src/test/subscription/t/013_partition.

Kindly let me know your English name so that I can credit you as a
co-author.

--
With Regards,
Amit Kapila.

#4 Amit Kapila
amit.kapila16@gmail.com
In reply to: Mark Zhao (#3)
Re: adding partitioned tables to publications

On Mon, Jan 11, 2021 at 5:44 PM Mark Zhao <875941708@qq.com> wrote:
>
> Thanks for your reply. The patch is exactly what I want.
> My English name is Mark Zhao, which should be the current email name.

Pushed the fix.

--
With Regards,
Amit Kapila.

#5 Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#4)
Re: adding partitioned tables to publications

On Tue, Jan 12, 2021 at 5:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 11, 2021 at 5:44 PM Mark Zhao <875941708@qq.com> wrote:
> >
> > Thanks for your reply. The patch is exactly what I want.
> > My English name is Mark Zhao, which should be the current email name.
>
> Pushed the fix.

Thanks Amit and Mark. I hadn't realized at the time that the relation
descriptor leak was occurring, but good to see it tested and fixed.

--
Amit Langote
EDB: http://www.enterprisedb.com