Replica Identity check of partition table on subscriber

Started by shiy.fnst@fujitsu.com over 3 years ago, 48 messages
#1 shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
3 attachment(s)

Hi hackers,

I saw a problem in logical replication: when the target table on the subscriber
is a partitioned table, we only check whether the Replica Identity of the
partitioned table is consistent with the publisher's, and don't check the
Replica Identity of the partition.

For example:
-- publisher --
create table tbl (a int not null, b int);
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create publication pub for table tbl;

-- subscriber --
-- table tbl (parent table) has RI index, while table child has no RI index.
create table tbl (a int not null, b int) partition by range (a);
create table child partition of tbl default;
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (11,11);
update tbl set a=a+1;

It caused an assertion failure on subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
Replica Identity or Primary Key. We have a check in check_relation_updatable(),
but what it checks is table tbl (the parent table), which passes the check.

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.

When trying to fix it, I saw some other problems with updating the partition
map cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
LogicalRepRelMapEntry (see the struct sketch below).

b) In logicalrep_partition_open(), it didn't check if the entry is valid.

c) When the publisher sends a new relation mapping, only the relation map cache
is updated, and the partition map cache isn't. I think it should also be updated
because it has remote relation information, too.
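
For context, here is an abridged sketch of the two cache entry types involved
(field lists trimmed; not the full definitions). A LogicalRepPartMapEntry
embeds a LogicalRepRelMapEntry by value rather than pointing to one, so a
hash_seq_search() over LogicalRepPartMap returns LogicalRepPartMapEntry
pointers, and casting them to LogicalRepRelMapEntry reads fields at the wrong
offsets.

typedef struct LogicalRepRelMapEntry
{
	LogicalRepRelation remoterel;	/* key is remoterel.remoteid */
	bool		localrelvalid;
	Oid			localreloid;
	Relation	localrel;
	/* ... attrmap, updatable, sync state ... */
} LogicalRepRelMapEntry;

typedef struct LogicalRepPartMapEntry
{
	Oid			partoid;		/* LogicalRepPartMap's key (partition OID) */
	LogicalRepRelMapEntry relmapentry;	/* a copy, not a pointer */
} LogicalRepPartMapEntry;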

Attached are two patches which try to fix them.
0001 patch: fix the above three problems with the partition map cache.
0002 patch: fix the assertion failure by checking the Replica Identity of the
partition if the target table is a partitioned table.

Thanks to Hou Zhijie for helping me with the first patch.

I will add a test for it later if no one objects to this fix.

Regards,
Shi yu

Attachments:

v1-0001-Fix-partition-map-cache-issues.patch
From af85ead5481d844efe3ccd01e8c13ff4ad63cf85 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Wed, 8 Jun 2022 11:10:21 +0800
Subject: [PATCH v1 1/2] Fix partition map cache issues.

1. Fix the bad structure in logicalrep_partmap_invalidate_cb().
2. Check whether the entry is valid in logicalrep_partition_open().
3. Update partition map cache when the publisher send new relation mapping.

Author: Shi yu, Hou Zhijie
---
 src/backend/replication/logical/relation.c | 119 ++++++++++++++-------
 1 file changed, 81 insertions(+), 38 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..9cc94067d5 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -147,6 +147,66 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 		pfree(entry->attrmap);
 }
 
+/*
+ * Update remote relation information in the relation map entry.
+ */
+static void
+logicalrep_update_remoterel(LogicalRepRelMapEntry *entry,
+							LogicalRepRelation *remoterel)
+{
+	int			i;
+
+	entry->remoterel.remoteid = remoterel->remoteid;
+	entry->remoterel.nspname = pstrdup(remoterel->nspname);
+	entry->remoterel.relname = pstrdup(remoterel->relname);
+	entry->remoterel.natts = remoterel->natts;
+	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+	for (i = 0; i < remoterel->natts; i++)
+	{
+		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+	}
+	entry->remoterel.replident = remoterel->replident;
+	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+}
+
+/*
+ * Invalidate the existing entry in the partition map.
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ */
+static void
+logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)
+{
+	MemoryContext oldctx;
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+
+		/* Make cached copy of the data */
+		oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
+		logicalrep_update_remoterel(entry, remoterel);
+		MemoryContextSwitchTo(oldctx);
+	}
+}
+
 /*
  * Add new entry or update existing entry in the relation map cache.
  *
@@ -159,7 +219,6 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 	MemoryContext oldctx;
 	LogicalRepRelMapEntry *entry;
 	bool		found;
-	int			i;
 
 	if (LogicalRepRelMap == NULL)
 		logicalrep_relmap_init();
@@ -177,20 +236,11 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 
 	/* Make cached copy of the data */
 	oldctx = MemoryContextSwitchTo(LogicalRepRelMapContext);
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
-	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
-	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+	logicalrep_update_remoterel(entry, remoterel);
 	MemoryContextSwitchTo(oldctx);
+
+	/* Invalidate the corresponding partition map as well */
+	logicalrep_partmap_invalidate(remoterel);
 }
 
 /*
@@ -451,7 +501,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +514,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +531,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -545,31 +595,24 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
-
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!found)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+
+		part_entry->partoid = partOid;
+
+		/* Remote relation is copied as-is from the root entry. */
+		logicalrep_update_remoterel(entry, remoterel);
+
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
-- 
2.18.4

v1-0002-Check-partition-table-replica-identity-on-subscri.patch
From c73f7262ffef170a0cf02a162bbc934a2e94817a Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Wed, 8 Jun 2022 11:11:44 +0800
Subject: [PATCH v1 2/2] Check partition table replica identity on subscriber

In logical replication, we will check if the target table on subscriber is
updatable. When the target table is a partitioned table, we should check the
target partition, instead of the partitioned table.

Author: Shi yu
---
 src/backend/replication/logical/relation.c | 114 +++++++++++----------
 src/backend/replication/logical/worker.c   |  27 +++--
 2 files changed, 83 insertions(+), 58 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 9cc94067d5..190d6d25aa 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -299,6 +299,64 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check that replica identity matches.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_check_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+		/*
+		 * If no replica identity index and no PK, the published table
+		 * must have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -357,7 +415,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -415,55 +472,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		/* be tidy */
 		bms_free(missingatts);
 
-		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
-		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		/* Check that replica identity matches. */
+		logicalrep_check_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -584,7 +594,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -648,7 +657,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Check that replica identity matches. */
+	logicalrep_check_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9e7b..4eee9c7bb6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -2118,6 +2125,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry;
+	AttrMap	   *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2149,8 +2158,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2158,6 +2170,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete. */
+	if(operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2179,15 +2199,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
-- 
2.18.4

backtrace.txt
#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: shiy.fnst@fujitsu.com (#1)
Re: Replica Identity check of partition table on subscriber

On Wed, Jun 8, 2022 at 2:17 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

I saw a problem in logical replication, when the target table on subscriber is a
partitioned table, it only checks whether the Replica Identity of partitioned
table is consistent with the publisher, and doesn't check Replica Identity of
the partition.

...

It caused an assertion failure on subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
Replica Identity or Primary Key. We have a check in check_relation_updatable(),
but what it checked is table tbl (the parent table) and it passed the check.

I can reproduce the problem. It seems to have existed since commit
f1ac27bf (Add logical replication support to replicate into
partitioned tables), so adding Amit L. and Peter E.

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.

This approach sounds reasonable to me. One minor point:
+/*
+ * Check that replica identity matches.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_check_updatable(LogicalRepRelMapEntry *entry)

Can we name this function logicalrep_rel_mark_updatable, as that is what it is
doing? If so, change the comments as well.

When trying to fix it, I saw some other problems about updating partition map
cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
LogicalRepRelMapEntry.

b) In logicalrep_partition_open(), it didn't check if the entry is valid.

c) When the publisher send new relation mapping, only relation map cache will be
updated, and partition map cache wouldn't. I think it also should be updated
because it has remote relation information, too.

Is there any test case that can show the problem due to your above observations?

--
With Regards,
Amit Kapila.

#3 Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#2)
Re: Replica Identity check of partition table on subscriber

Hi Amit,

On Thu, Jun 9, 2022 at 8:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 8, 2022 at 2:17 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

I saw a problem in logical replication, when the target table on subscriber is a
partitioned table, it only checks whether the Replica Identity of partitioned
table is consistent with the publisher, and doesn't check Replica Identity of
the partition.

...

It caused an assertion failure on subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
Replica Identity or Primary Key. We have a check in check_relation_updatable(),
but what it checked is table tbl (the parent table) and it passed the check.

I can reproduce the problem. This seems to be the problem since commit
f1ac27bf (Add logical replication support to replicate into
partitioned tables), so adding Amit L. and Peter E.

Thanks, I can see the problem.

I have looked at the other problems mentioned with the code too and agree
they look like bugs.

Both patches look to be on the right track to fix the issues, but will
look more closely to see if I've anything to add.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#4 shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#2)
2 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Thu, June 9, 2022 7:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.

This approach sounds reasonable to me. One minor point:
+/*
+ * Check that replica identity matches.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_check_updatable(LogicalRepRelMapEntry *entry)

Can we name this function as logicalrep_rel_mark_updatable as we are
doing that? If so, change the comments as well.

OK. Modified.

When trying to fix it, I saw some other problems with updating the partition
map cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
LogicalRepRelMapEntry.

b) In logicalrep_partition_open(), it didn't check if the entry is valid.

c) When the publisher sends a new relation mapping, only the relation map cache
is updated, and the partition map cache isn't. I think it should also be updated
because it has remote relation information, too.

Is there any test case that can show the problem due to your above
observations?

Please see the following case.

-- publisher
create table tbl (a int primary key, b int);
create publication pub for table tbl;

-- subscriber
create table tbl (a int primary key, b int, c int) partition by range (a);
create table child partition of tbl default;

-- publisher, make cache
insert into tbl values (1,1);
update tbl set a=a+1;
alter table tbl add column c int;
update tbl set c=1 where a=2;

-- subscriber
postgres=# select * from tbl;
a | b | c
---+---+---
2 | 1 |
(1 row)

The update of column c failed on the subscriber.
After applying the first patch, it works fine.

I have added this case to the first patch, and also added a test case for the
second patch.

Attached are the new patches.

Regards,
Shi yu

Attachments:

v2-0001-Fix-partition-map-cache-issues.patch
From 0f066cefc8520af93ece7ea10618d037005af5dc Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Wed, 8 Jun 2022 11:10:21 +0800
Subject: [PATCH v2 1/2] Fix partition map cache issues.

1. Fix the bad structure in logicalrep_partmap_invalidate_cb().
2. Check whether the entry is valid in logicalrep_partition_open().
3. Update partition map cache when the publisher send new relation mapping.

Author: Shi yu, Hou Zhijie
---
 src/backend/replication/logical/relation.c | 120 ++++++++++++++-------
 src/test/subscription/t/013_partition.pl   |  48 ++++++++-
 2 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..85712fb0f4 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -147,6 +147,66 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 		pfree(entry->attrmap);
 }
 
+/*
+ * Update remote relation information in the relation map entry.
+ */
+static void
+logicalrep_update_remoterel(LogicalRepRelMapEntry *entry,
+							LogicalRepRelation *remoterel)
+{
+	int			i;
+
+	entry->remoterel.remoteid = remoterel->remoteid;
+	entry->remoterel.nspname = pstrdup(remoterel->nspname);
+	entry->remoterel.relname = pstrdup(remoterel->relname);
+	entry->remoterel.natts = remoterel->natts;
+	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+	for (i = 0; i < remoterel->natts; i++)
+	{
+		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+	}
+	entry->remoterel.replident = remoterel->replident;
+	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+}
+
+/*
+ * Invalidate the existing entry in the partition map.
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ */
+static void
+logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)
+{
+	MemoryContext oldctx;
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+
+		/* Make cached copy of the data */
+		oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
+		logicalrep_update_remoterel(entry, remoterel);
+		MemoryContextSwitchTo(oldctx);
+	}
+}
+
 /*
  * Add new entry or update existing entry in the relation map cache.
  *
@@ -159,7 +219,6 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 	MemoryContext oldctx;
 	LogicalRepRelMapEntry *entry;
 	bool		found;
-	int			i;
 
 	if (LogicalRepRelMap == NULL)
 		logicalrep_relmap_init();
@@ -177,20 +236,11 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 
 	/* Make cached copy of the data */
 	oldctx = MemoryContextSwitchTo(LogicalRepRelMapContext);
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
-	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
-	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+	logicalrep_update_remoterel(entry, remoterel);
 	MemoryContextSwitchTo(oldctx);
+
+	/* Invalidate the corresponding partition map as well */
+	logicalrep_partmap_invalidate(remoterel);
 }
 
 /*
@@ -451,7 +501,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +514,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +531,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +584,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +594,24 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
-
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!found)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+
+		part_entry->partoid = partOid;
+
+		/* Remote relation is copied as-is from the root entry. */
+		logicalrep_update_remoterel(entry, remoterel);
+
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94f19..b2183e0232 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,49 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test the case that target table in subscriber is a partitioned table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+# Make cache.
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Alter table on publisher.
+$node_publisher->safe_psql('postgres',
+	"ALTER TABLE tab5 ADD COLUMN c INT");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(2|1|1), 'updates of tab5 replicated correctly after altering table on publisher');
 
 done_testing();
-- 
2.18.4

v2-0002-Check-partition-table-replica-identity-on-subscri.patch
From a584fde8dac183e4c83fa6176ff8b81d8dcb6e80 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Fri, 10 Jun 2022 16:29:52 +0800
Subject: [PATCH v2 2/2] Check partition table replica identity on subscriber

In logical replication, we will check if the target table on subscriber is
updatable. When the target table is a partitioned table, we should check the
target partition, instead of the partitioned table.

Author: Shi yu
---
 src/backend/replication/logical/relation.c | 113 +++++++++++----------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  14 +++
 3 files changed, 97 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 85712fb0f4..b614124227 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -299,6 +299,64 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the "updatable" flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+		/*
+		 * If no replica identity index and no PK, the published table
+		 * must have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -357,7 +415,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -415,55 +472,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		/* be tidy */
 		bms_free(missingatts);
 
-		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
-		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		/* Check that replica identity matches. */
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -647,7 +657,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Check that replica identity matches. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9e7b..4eee9c7bb6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -2118,6 +2125,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry;
+	AttrMap	   *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2149,8 +2158,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2158,6 +2170,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete. */
+	if(operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2179,15 +2199,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index b2183e0232..fde0b64a1b 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -845,4 +845,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(2|1|1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(3|1|1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.18.4

#5 Amit Langote
amitlangote09@gmail.com
In reply to: shiy.fnst@fujitsu.com (#1)
Re: Replica Identity check of partition table on subscriber

Hello,

On Wed, Jun 8, 2022 at 5:47 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

Hi hackers,

I saw a problem in logical replication, when the target table on subscriber is a
partitioned table, it only checks whether the Replica Identity of partitioned
table is consistent with the publisher, and doesn't check Replica Identity of
the partition.

For example:
-- publisher --
create table tbl (a int not null, b int);
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create publication pub for table tbl;

-- subscriber --
-- table tbl (parent table) has RI index, while table child has no RI index.
create table tbl (a int not null, b int) partition by range (a);
create table child partition of tbl default;
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (11,11);
update tbl set a=a+1;

It caused an assertion failure on subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
Replica Identity or Primary Key. We have a check in check_relation_updatable(),
but what it checked is table tbl (the parent table) and it passed the check.

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.

That makes sense. A normal user update of a partitioned table will
only perform CheckCmdReplicaIdentity() for leaf partitions and the
logical replication updates should do the same. I may have been
confused at the time to think that ALTER TABLE REPLICA IDENTITY makes
sure that the replica identities of all relations in a partition tree
are forced to be the same at all times, though it seems that the patch
to do so [1] didn't actually get in. I guess adding a test case would
have helped.

When trying to fix it, I saw some other problems about updating partition map
cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
LogicalRepRelMapEntry.

Indeed.

b) In logicalrep_partition_open(), it didn't check if the entry is valid.

Yeah, that's bad. Actually, it seems that localrelvalid stuff for
LogicalRepRelMapEntry came in 3d65b0593c5, but it's likely we missed
in that commit that LogicalRepPartMapEntrys contain copies of, not
pointers to, the relevant parent's entry. This patch fixes that
oversight.

c) When the publisher send new relation mapping, only relation map cache will be
updated, and partition map cache wouldn't. I think it also should be updated
because it has remote relation information, too.

Yes, again a result of forgetting that the partition map entries have
copies of relation map entries.

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

+/*
+ * Invalidate the existing entry in the partition map.

I think logicalrep_partmap_invalidate() may update *multiple* entries,
because the hash table scan may find multiple PartMapEntrys containing
a copy of the RelMapEntry with given remoteid, that is, for multiple
partitions of a given local parent table mapped to that remote
relation. So, please fix the comment as:

Invalidate/Update the entries in the partition map that refer to 'remoterel'

Likewise:

+ /* Invalidate the corresponding partition map as well */

Maybe, this should say:

Also update all entries in the partition map that refer to 'remoterel'.

In 0002:

+logicalrep_check_updatable

+1 to Amit's suggestion to use "mark" instead of "check".

@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+   /*
+    * If it is a partitioned table, we don't check it, we will check its
+    * partition later.
+    */
+   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       return;

Why do this? I mean, logicalrep_check_updatable() doesn't care
whether the relation is partitioned or not -- it does all the work
regardless.

I suggest we don't add this check in check_relation_updatable().

+ /* Check if we can do the update or delete. */

Maybe mention "leaf partition", as:

Check if we can do the update or delete on the leaf partition.

BTW, the following hunk in patch 0002 should really be a part of 0001.

@@ -584,7 +594,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
Oid partOid = RelationGetRelid(partrel);
AttrMap *attrmap = root->attrmap;
bool found;
- int i;
MemoryContext oldctx;

if (LogicalRepPartMap == NULL)

Thanks to Hou Zhijie for helping me in the first patch.

Thank you both for the fixes.

I will add a test for it later

That would be very welcome.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1]: /messages/by-id/201902041630.gpadougzab7v@alvherre.pgsql

#6 Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#5)
Re: Replica Identity check of partition table on subscriber

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am wondering why we need to update the relmap in this new
function logicalrep_partmap_invalidate(). I think it may be better to
do it in logicalrep_partition_open() when actually required;
otherwise, we end up doing a lot of work that may not be of use unless
the corresponding partition is accessed. Also, it seems awkward to me
that we do the same thing in this new function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

One more point about 0001: it doesn't seem to have a test that
validates the logicalrep_partmap_invalidate_cb() functionality. I think
for that we need to ALTER the local table (the table on the subscriber
side). Can we try to write a test for it?

--
With Regards,
Amit Kapila.

#7 houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#6)
3 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Saturday, June 11, 2022 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com>
wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am thinking about why we need to update the relmap in this new function
logicalrep_partmap_invalidate()? I think it may be better to do it in
logicalrep_partition_open() when actually required, otherwise, we end up doing
a lot of work that may not be of use unless the corresponding partition is
accessed. Also, it seems awkward to me that we do the same thing in this new
function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

One more point about the 0001, it doesn't seem to have a test that validates
logicalrep_partmap_invalidate_cb() functionality. I think for that we need to Alter
the local table (table on the subscriber side). Can we try to write a test for it?

Thanks to Amit L. and Amit K. for your comments! I agree with this point.
Here is a new version of the patch set which tries to address all these
comments.

In addition, when reviewing the code, I found some other related problems.

1)
	entry->attrmap = make_attrmap(map->maplen);
	for (attno = 0; attno < entry->attrmap->maplen; attno++)
	{
		AttrNumber	root_attno = map->attnums[attno];

		entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
	}

In this case, it's possible that 'attno' points to a dropped column, in which
case root_attno would be '0'. I think in this case we should just set
entry->attrmap->attnums[attno] to '-1' instead of accessing
attrmap->attnums[]. I included this change in 0001 because the test cases that
can reproduce these problems are related (we need to ALTER the partition on the
subscriber to reproduce it).
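
A minimal sketch of the change described above, for illustration only (the
exact hunk is in the attached 0001 patch and may differ in details):

	entry->attrmap = make_attrmap(map->maplen);
	for (attno = 0; attno < entry->attrmap->maplen; attno++)
	{
		AttrNumber	root_attno = map->attnums[attno];

		/* A dropped column in the partition maps to root_attno == 0. */
		if (root_attno == 0)
		{
			entry->attrmap->attnums[attno] = -1;
			continue;
		}

		entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
	}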

2)
	if (entry->attrmap)
		pfree(entry->attrmap);

I think we should use free_attrmap instead of pfree here to avoid a memory
leak. And we also need to check the attrmap in logicalrep_rel_open() and
logicalrep_partition_open() and free it if needed. I am not sure whether we
should put this in the 0001 patch, so I attached a separate patch for it. We
can merge it later if needed.
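
A small sketch of the proposed replacement, assuming the existing
free_attrmap() helper (which frees the AttrMap's attnums array as well as the
struct itself, whereas a bare pfree() of the struct leaks the array):

	if (entry->attrmap)
		free_attrmap(entry->attrmap);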

Best regards,
Hou zj

Attachments:

v3-0002-Check-partition-table-replica-identity-on-subscriber.patch
From 6346e91f251d6f237c805d4bcd8a82d3226100b4 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Sat, 11 Jun 2022 14:12:39 +0800
Subject: [PATCH] Check partition table replica identity on subscriber

In logical replication, we will check if the target table on subscriber is
updatable by comparing the replica identity of the table on publisher with the
table on subscriber. When the target table is a partitioned table, we should
check the replica identity key of target partition, instead of the partitioned
table.


Author: Shi yu, Hou Zhijie
---
 src/backend/replication/logical/relation.c | 121 +++++++++++++++++------------
 src/backend/replication/logical/worker.c   |  20 +++--
 src/test/subscription/t/013_partition.pl   |  14 ++++
 3 files changed, 98 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5928d13..6901ba5 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -295,6 +295,72 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 }
 
 /*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (entry->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+		/*
+		 * If no replica identity index and no PK, the published table
+		 * must have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
+/*
  * Open the local relation associated with the remote one.
  *
  * Rebuilds the Relcache mapping if it was invalidated by local DDL.
@@ -352,7 +418,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -410,55 +475,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		/* be tidy */
 		bms_free(missingatts);
 
-		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
-		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		/* Check that replica identity matches. */
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -641,7 +659,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Check that replica identity matches. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 81ce2e9..a1d0537 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2124,6 +2124,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry;
+	AttrMap	   *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2155,8 +2157,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2164,6 +2169,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if(operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2185,15 +2198,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index daff675..84b436f 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -868,4 +868,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on subscriber');
 
+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.7.2.windows.1

v3-0003-fix-memory-leak-about-attrmap.patch (application/octet-stream)
From 71ac8f2a7cb04e32dc37b3f7245e819c0b0335c9 Mon Sep 17 00:00:00 2001
From: "Hou Zhijie" <houzj.fnst@cn.fujitsu.com>
Date: Sat, 11 Jun 2022 16:37:51 +0800
Subject: [PATCH] fix memory leak about attrmap

Use free_attrmap instead of pfree to release AttrMap structure.
Check the attrmap again when opening the relation and clean up the
invalid AttrMap before rebuilding it.

---
 src/backend/replication/logical/relation.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 324b526..e6543ab 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -144,7 +144,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
 }
 
 /*
@@ -389,6 +392,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		int			i;
 		Bitmapset  *missingatts;
 
+		/* cleanup the invalid attrmap */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -621,6 +631,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* cleanup the invalid attrmap */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	/* Remote relation is copied as-is from the root entry. */
 	if (!entry->remoterel.remoteid)
 		logicalrep_update_remoterel(entry, remoterel);
-- 
2.7.2.windows.1

v3-0001-Fix-partition-map-cache-issues.patch (application/octet-stream)
From e5fd69c0ec80cde40e3111fe580b8cb43ffabcda Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Wed, 8 Jun 2022 11:10:21 +0800
Subject: [PATCH] Fix partition map cache issues.

1. Fix the bad structure in logicalrep_partmap_invalidate_cb().
2. Check whether the entry is valid in logicalrep_partition_open() and rebuild
   the entry if not.
3. Invalidate partition map cache when the publisher sends a new relation mapping.
4. Fix the column map build for dropped column in partition.

Author: Shi yu, Hou Zhijie
---
 src/backend/replication/logical/relation.c | 120 +++++++++++++++++++----------
 src/backend/replication/logical/worker.c   |   6 ++
 src/include/replication/logicalrelation.h  |   1 +
 src/test/subscription/t/013_partition.pl   |  71 ++++++++++++++++-
 4 files changed, 154 insertions(+), 44 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561..34c2f53 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -148,6 +148,30 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 }
 
 /*
+ * Update remote relation information in the relation map entry.
+ */
+static void
+logicalrep_update_remoterel(LogicalRepRelMapEntry *entry,
+							LogicalRepRelation *remoterel)
+{
+	int			i;
+
+	entry->remoterel.remoteid = remoterel->remoteid;
+	entry->remoterel.nspname = pstrdup(remoterel->nspname);
+	entry->remoterel.relname = pstrdup(remoterel->relname);
+	entry->remoterel.natts = remoterel->natts;
+	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+	for (i = 0; i < remoterel->natts; i++)
+	{
+		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+	}
+	entry->remoterel.replident = remoterel->replident;
+	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+}
+
+/*
  * Add new entry or update existing entry in the relation map cache.
  *
  * Called when new relation mapping is sent by the publisher to update
@@ -159,7 +183,6 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 	MemoryContext oldctx;
 	LogicalRepRelMapEntry *entry;
 	bool		found;
-	int			i;
 
 	if (LogicalRepRelMap == NULL)
 		logicalrep_relmap_init();
@@ -177,19 +200,7 @@ logicalrep_relmap_update(LogicalRepRelation *remoterel)
 
 	/* Make cached copy of the data */
 	oldctx = MemoryContextSwitchTo(LogicalRepRelMapContext);
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
-	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
-	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+	logicalrep_update_remoterel(entry, remoterel);
 	MemoryContextSwitchTo(oldctx);
 }
 
@@ -451,7 +462,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +475,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +492,42 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
+	}
+}
+
+/*
+ * Invalidate the entries in the partition map that refer to remoterel
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to save
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
 	}
 }
 
@@ -534,7 +579,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +589,23 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
-
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!found)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
+
+	/* Remote relation is copied as-is from the root entry. */
+	if (!entry->remoterel.remoteid)
+		logicalrep_update_remoterel(entry, remoterel);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +630,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9..81ce2e9 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1562,6 +1562,12 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/*
+	 * Also invalidate all entries in the partition map that refer to
+	 * remoterel.
+	 */
+	logicalrep_partmap_invalidate(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index 7bf8cd2..71a1fa7 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -38,6 +38,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_invalidate(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94..daff675 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,72 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test the case that target table on subscriber is a partitioned table and
+# check that the changes are replicated correctly after changing the schema of
+# table on publisher and subscriber.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of table on publisher
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(2||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on subscriber');
 
 done_testing();
-- 
2.7.2.windows.1

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#7)
Re: Replica Identity check of partition table on subscriber

On Sat, Jun 11, 2022 at 2:36 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Saturday, June 11, 2022 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com>
wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am thinking about why we need to update the relmap in this new function
logicalrep_partmap_invalidate()? I think it may be better to do it in
logicalrep_partition_open() when actually required, otherwise, we end up doing
a lot of work that may not be of use unless the corresponding partition is
accessed. Also, it seems awkward to me that we do the same thing in this new
function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

One more point about the 0001, it doesn't seem to have a test that validates
logicalrep_partmap_invalidate_cb() functionality. I think for that we need to Alter
the local table (table on the subscriber side). Can we try to write a test for it?

Thanks to Amit L. and Amit K. for your comments! I agree with this point.
Here is a new version of the patch set which tries to address all these comments.

In addition, while reviewing the code, I found some other related problems.

1)
entry->attrmap = make_attrmap(map->maplen);
for (attno = 0; attno < entry->attrmap->maplen; attno++)
{
AttrNumber root_attno = map->attnums[attno];

entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
}

In this case, it's possible that 'attno' points to a dropped column, in which
case root_attno would be '0'. I think in this case we should just set
entry->attrmap->attnums[attno] to '-1' instead of accessing
attrmap->attnums[]. I included this change in 0001 because the testcase that
can reproduce these problems is related (we need to ALTER the partition on
the subscriber to reproduce it).

Hmm, this appears to be a different issue. Can we separate out the
bug-fix code for the subscriber-side issue caused by the DDL on the
subscriber?
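
For reference, a minimal subscriber-side sequence (mirroring the tab5 test in
the patch) that leaves such a dropped column behind in the partition is:

-- subscriber --
alter table tab5 detach partition tab5_1;
alter table tab5_1 drop column b;
alter table tab5_1 add column b int;
alter table tab5 attach partition tab5_1 default;

After this, the partition-to-root attribute map has a '0' entry for the
dropped column, so the old code ends up indexing the root's attrmap with
root_attno - 1 = -1.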

Few other comments:
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to save
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)

/to save/to avoid

Also, I agree with Amit L. that it is confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb() and both have somewhat different
kinds of logic. So, we can name it either
logicalrep_partmap_reset_relmap() or logicalrep_partmap_update(),
unless you have a better suggestion. Accordingly, change the
comment atop this function.

--
With Regards,
Amit Kapila.

#9houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#8)
4 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Monday, June 13, 2022 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Jun 11, 2022 at 2:36 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Saturday, June 11, 2022 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am thinking about why we need to update the relmap in this new
function logicalrep_partmap_invalidate()? I think it may be better
to do it in
logicalrep_partition_open() when actually required, otherwise, we
end up doing a lot of work that may not be of use unless the
corresponding partition is accessed. Also, it seems awkward to me
that we do the same thing in this new function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

One more point about the 0001, it doesn't seem to have a test that validates
logicalrep_partmap_invalidate_cb() functionality. I think for that we need to
Alter the local table (table on the subscriber side). Can we try to write a
test for it?

Thanks to Amit L. and Amit K. for your comments! I agree with this point.
Here is a new version of the patch set which tries to address all these comments.

In addition, while reviewing the code, I found some other related problems.

1)
entry->attrmap = make_attrmap(map->maplen);
for (attno = 0; attno < entry->attrmap->maplen; attno++)
{
AttrNumber root_attno = map->attnums[attno];

entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
}

In this case, it's possible that 'attno' points to a dropped column, in which
case root_attno would be '0'. I think in this case we should just set
entry->attrmap->attnums[attno] to '-1' instead of accessing
attrmap->attnums[]. I included this change in 0001 because the testcase that
can reproduce these problems is related (we need to ALTER the partition on
the subscriber to reproduce it).

Hmm, this appears to be a different issue. Can we separate out the bug-fix
code for the subscriber-side issue caused by the DDL on the subscriber?

Few other comments:
+ * Note that we don't update the remoterel information in the entry
+here,
+ * we will update the information in logicalrep_partition_open to save
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_invalidate(LogicalRepRelation *remoterel)

/to save/to avoid

Also, I agree with Amit L. that it is confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb() and both have somewhat different kinds of
logic. So, we can name it either
logicalrep_partmap_reset_relmap() or logicalrep_partmap_update(), unless you
have a better suggestion. Accordingly, change the comment atop
this function.

Thanks for the comments.

I have separated out the bug fix for the subscriber side, and fixed the typo
and the function name. Attached is the new version of the patch set.

Best regards,
Hou zj

Attachments:

v4-0004-fix-memory-leak-about-attrmap.patch (application/octet-stream)
From 0231208c6ee8e61461a822c1d41b9c4021efed2d Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:42:55 +0800
Subject: [PATCH] fix memory leak about attrmap

Use free_attrmap instead of pfree to release AttrMap structure.
Check the attrmap again when opening the relation and clean up the
invalid AttrMap before rebuilding it.

---
 src/backend/replication/logical/relation.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index f763a38..4e0c644 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -144,7 +144,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
 }
 
 /*
@@ -377,6 +380,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		int			i;
 		Bitmapset  *missingatts;
 
+		/* cleanup the invalid attrmap */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -609,6 +619,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* cleanup the invalid attrmap */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	if (!entry->remoterel.remoteid)
 	{
 		int	i;
-- 
2.7.2.windows.1

v4-0001-Fix-partition-map-cache-invalidation-on-subscriber.patch (application/octet-stream)
From 8d0b799dbde9ef1b4d651c827759c56092ef4221 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:06:05 +0800
Subject: [PATCH] Fix partition map cache invalidation on subscriber

There are several issues in the subscriber code which prevent the partition
map cache from being properly rebuilt.

When invalidating an entry in the callback function
logicalrep_partmap_invalidate_cb(), we cast the hash entry to the wrong type.
Fix it by using the correct entry type.

Besides, we lack a validity check at the beginning of
logicalrep_partition_open(). Add this check and rebuild the entry if it is
no longer valid.

In addition, when building the partition's column map, we missed the check
for dropped columns, which caused a cache lookup error. Fix this by ignoring
dropped columns.

---
 src/backend/replication/logical/relation.c | 62 ++++++++++++++++++------------
 src/test/subscription/t/013_partition.pl   | 57 +++++++++++++++++++++++++--
 2 files changed, 90 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561..c1de920 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int	i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94..85dd7f5 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,58 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test the case that target table on subscriber is a partitioned table and
+# check that the changes are replicated correctly after changing the schema of
+# table on subscriber.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
 done_testing();
-- 
2.7.2.windows.1

v4-0003-Check-partition-table-replica-identity-on-subscriber.patch (application/octet-stream)
From 09257f7f0680157a470014c3b5b7cc64424ec3d0 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:40:58 +0800
Subject: [PATCH] Check partition table replica identity on subscriber

In logical replication, we check whether the target table on the subscriber is
updatable by comparing the replica identity of the table on the publisher with
that of the table on the subscriber. When the target table is a partitioned
table, we should check the replica identity of the target partition instead of
that of the partitioned table.

---
 src/backend/replication/logical/relation.c | 121 +++++++++++++++++------------
 src/backend/replication/logical/worker.c   |  20 +++--
 src/test/subscription/t/013_partition.pl   |  14 ++++
 3 files changed, 98 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 81c73a0..f763a38 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,72 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 }
 
 /*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (entry->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+		/*
+		 * If no replica identity index and no PK, the published table
+		 * must have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
+/*
  * Open the local relation associated with the remote one.
  *
  * Rebuilds the Relcache mapping if it was invalidated by local DDL.
@@ -306,7 +372,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -364,55 +429,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		/* be tidy */
 		bms_free(missingatts);
 
-		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
-		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		/* Check that replica identity matches. */
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -650,7 +668,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Check that replica identity matches. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7c28da3..c57f8d6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2123,6 +2123,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry;
+	AttrMap	   *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2154,8 +2156,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2163,6 +2168,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if(operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2184,15 +2197,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 1f9af6f..97ed834 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -868,4 +868,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.7.2.windows.1

v4-0002-Reset-partition-map-cache-when-receiving-new-relatio.patch (application/octet-stream)
From c629f16a5280a6d508ccf37bd0b91bb1b5798e69 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:39:18 +0800
Subject: [PATCH] Reset partition map cache when receiving new relation mapping
 from publisher

Reset partition map cache when the publisher sends new relation mapping so
that the partition's column map can be rebuilt correctly.

---
 src/backend/replication/logical/relation.c | 34 ++++++++++++++++++++++++++++++
 src/backend/replication/logical/worker.c   |  5 +++++
 src/include/replication/logicalrelation.h  |  1 +
 src/test/subscription/t/013_partition.pl   | 14 ++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index c1de920..decd8b2 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -487,6 +487,40 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 }
 
 /*
+ * Reset the entries in the partition map that refer to remoterel
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+	}
+}
+
+/*
  * Initialize the partition map cache.
  */
 static void
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9..7c28da3 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1562,6 +1562,11 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/*
+	 * Also reset all entries in the partition map that refer to remoterel.
+	 */
+	logicalrep_partmap_reset_relmap(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index 7bf8cd2..78cd7e7 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -38,6 +38,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 85dd7f5..1f9af6f 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -854,4 +854,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
+# Change the column order of table on publisher
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
 done_testing();
-- 
2.7.2.windows.1

#10Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#6)
Re: Replica Identity check of partition table on subscriber

On Sat, Jun 11, 2022 at 10:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am thinking about why we need to update the relmap in this new
function logicalrep_partmap_invalidate()? I think it may be better to
do it in logicalrep_partition_open() when actually required,
otherwise, we end up doing a lot of work that may not be of use unless
the corresponding partition is accessed. Also, it seems awkward to me
that we do the same thing in this new function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

Both logicalrep_rel_open() and logicalrep_partition_open() only ever
touch the local Relation, never the LogicalRepRelation. Updating the
latter is the responsibility of logicalrep_relmap_update(), which is
there to support handling of the RELATION message by
apply_handle_relation(). Given that we make a separate copy of the
parent's LogicalRepRelMapEntry for each partition to put into the
corresponding LogicalRepPartMapEntry, those copies must be updated as
well when a RELATION message targeting the parent's entry arrives. So
it seems fine that the patch is making it logicalrep_relmap_update()'s
responsibility to update the partition copies using the new
logicalrep_partmap_invalidate/update() subroutine.
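
A concrete example, borrowing the tab5 test from the patch:

-- publisher --
alter table tab5 drop column b, add column c int;
alter table tab5 add column b int;
update tab5 set c = 1 where a = 2;

The UPDATE here is preceded by a new RELATION message for tab5, and unless
the cached partition copies are refreshed as well, the tab5_1 entry on the
subscriber keeps using the attribute mapping built for the old column order.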

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#10)
Re: Replica Identity check of partition table on subscriber

On Mon, Jun 13, 2022 at 2:20 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Jun 11, 2022 at 10:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am thinking about why we need to update the relmap in this new
function logicalrep_partmap_invalidate()? I think it may be better to
do it in logicalrep_partition_open() when actually required,
otherwise, we end up doing a lot of work that may not be of use unless
the corresponding partition is accessed. Also, it seems awkward to me
that we do the same thing in this new function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

Both logicalrep_rel_open() and logicalrep_partition_open() only ever
touch the local Relation, never the LogicalRepRelation.

We do make the copy of the remote rel in logicalrep_partition_open() when
the entry is not found. I feel the same should happen when the remote
relation is reset/invalidated by the RELATION message.

Updating the
latter is the responsibility of logicalrep_relmap_update(), which is
there to support handling of the RELATION message by
apply_handle_relation(). Given that we make a separate copy of the
parent's LogicalRepRelMapEntry for each partition to put into the
corresponding LogicalRepPartMapEntry, those copies must be updated as
well when a RELATION message targeting the parent's entry arrives. So
it seems fine that the patch is making it logicalrep_relmap_update()'s
responsibility to update the partition copies using the new
logicalrep_partmap_invalidate/update() subroutine.

I think we can do it that way as well, but do you see any benefit in it?
The way I am suggesting will avoid the effort of updating the remote
rel copy until we try to access that particular partition.
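
To spell out the suggested flow with the tab5 example (the timing notes are
only illustrative):

-- publisher --
alter table tab5 drop column b, add column c int;  -- schema change
update tab5 set c = 1 where a = 2;                 -- new RELATION message, then the change

On the subscriber, the RELATION message would only reset the cached tab5_1
entry; its copy of the remote rel would be rebuilt later, in
logicalrep_partition_open(), when the UPDATE is actually applied to tab5_1.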

--
With Regards,
Amit Kapila.

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#9)
1 attachment(s)
Re: Replica Identity check of partition table on subscriber

On Mon, Jun 13, 2022 at 1:03 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Monday, June 13, 2022 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have separated out the bug fix for the subscriber side, and fixed the typo
and the function name. Attached is the new version of the patch set.

The first patch looks good to me. I have slightly modified one of the
comments and the commit message. I think we need to backpatch this
through 13, where we introduced support for replicating into partitioned
tables (commit f1ac27bf). If you guys are fine, I'll push this once
the work for PG14.4 is done.

--
With Regards,
Amit Kapila.

Attachments:

v5-0001-Fix-cache-look-up-failures-while-applying-changes.patch (application/octet-stream)
From 4e80dc908ea1c517f80bdf8958d05e4510de88e9 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:06:05 +0800
Subject: [PATCH v5] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 57 ++++++++++++++++++--
 2 files changed, 90 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..b12f569702 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int	i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94f19..85dd7f5dd5 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,58 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test the case that target table on subscriber is a partitioned table and
+# check that the changes are replicated correctly after changing the schema of
+# table on subcriber.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
 done_testing();
-- 
2.28.0.windows.1

#13Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#12)
Re: Replica Identity check of partition table on subscriber

On Mon, Jun 13, 2022 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 13, 2022 at 1:03 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Monday, June 13, 2022 1:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I have separated out the bug-fix for the subscriber-side.
And fixed the typo and function name.
Attached the new version patch set.

The first patch looks good to me. I have slightly modified one of the
comments and the commit message. I think we need to backpatch this
through 13 where we introduced support to replicate into partitioned
tables (commit f1ac27bf). If you guys are fine, I'll push this once
the work for PG14.4 is done.

Both the code changes and test cases look good to me. Just a couple
of minor nitpicks with test changes:

+   CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+   ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+   ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});

Not sure if we follow it elsewhere, but should we maybe avoid using
the internally generated index name as in the partition's case above?

+# Test the case that target table on subscriber is a partitioned table and
+# check that the changes are replicated correctly after changing the schema of
+# table on subcriber.

The first sentence makes it sound like the tests that follow are the
first ones in the file where the target table is partitioned, which is
not true, so I think we should drop that part. Also how about being
more specific about the test intent, say:

Test that replication continues to work correctly after altering the
partition of a partitioned target table.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#14Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#11)
Re: Replica Identity check of partition table on subscriber

On Mon, Jun 13, 2022 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jun 13, 2022 at 2:20 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Jun 11, 2022 at 10:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

+logicalrep_partmap_invalidate

I wonder why not call this logicalrep_partmap_update() to go with
logicalrep_relmap_update()? It seems confusing to have
logicalrep_partmap_invalidate() right next to
logicalrep_partmap_invalidate_cb().

I am wondering why we need to update the relmap in this new
function logicalrep_partmap_invalidate(). I think it may be better to
do it in logicalrep_partition_open() when actually required,
otherwise, we end up doing a lot of work that may not be of use unless
the corresponding partition is accessed. Also, it seems awkward to me
that we do the same thing in this new function
logicalrep_partmap_invalidate() and then also in
logicalrep_partition_open() under different conditions.

Both logicalrep_rel_open() and logicalrep_partition_open() only ever
touch the local Relation, never the LogicalRepRelation.

We do make the copy of the remote rel in logicalrep_partition_open() when
the entry is not found. I feel the same should happen when the remote
relation is reset/invalidated by the RELATION message.

Hmm, the problem is that a RELATION message will only invalidate the
LogicalRepRelation portion of the target parent's entry, while any
copies that have been made for partitions that were opened till that
point will continue to have the old LogicalRepRelation information.
As things stand, logicalrep_partition_open() won't know that the
parent entry's LogicalRepRelation may have been modified due to a
RELATION message. It will reconstruct the entry only if the partition
itself was modified locally, that is, if
logicalrep_partmap_invalidate_cb() was called on the partition.
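
To illustrate with a sketch (the table name here is made up, not from the
patches): once a partition's copy has been built, a publisher-side sequence
such as

    ALTER TABLE pub_tab ADD COLUMN c int;  -- schema change on the published table
    UPDATE pub_tab SET c = 0;              -- next change is preceded by a fresh RELATION message

only refreshes the parent's entry, via apply_handle_relation() calling
logicalrep_relmap_update(); the copies held in LogicalRepPartMapEntry for
partitions opened earlier keep describing the old remote relation until they
are rebuilt.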

Updating the
latter is the responsibility of logicalrep_relmap_update(), which is
there to support handling of the RELATION message by
apply_handle_relation(). Given that we make a separate copy of the
parent's LogicalRepRelMapEntry for each partition to put into the
corresponding LogicalRepPartMapEntry, those copies must be updated as
well when a RELATION message targeting the parent's entry arrives. So
it seems fine that the patch is making it
logicalrep_relmap_update()'s responsibility to update the partition
copies using the new logicalrep_partition_invalidate/update()
subroutine.

I think we can do that way as well but do you see any benefit in it?
The way I am suggesting will avoid the effort of updating the remote
rel copy till we try to access that particular partition.

I don't see any benefit as such to doing it the way the patch does,
it's just that that seems to be the only way to go given the way
things are.

This would have been unnecessary, for example, if the relation map
entry had contained a LogicalRepRelation pointer instead of the
struct. The partition entries would point to the same entry as the
parent's if that were the case and there would be no need to modify
the partitions' copies explicitly.

Am I missing something?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#15Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#14)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 14, 2022 at 3:31 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jun 13, 2022 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think we can do that way as well but do you see any benefit in it?
The way I am suggesting will avoid the effort of updating the remote
rel copy till we try to access that particular partition.

I don't see any benefit as such to doing it the way the patch does,
it's just that that seems to be the only way to go given the way
things are.

Oh, I see that v4-0002 has this:

+/*
+ * Reset the entries in the partition map that refer to remoterel
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+   HASH_SEQ_STATUS status;
+   LogicalRepPartMapEntry *part_entry;
+   LogicalRepRelMapEntry *entry;
+
+   if (LogicalRepPartMap == NULL)
+       return;
+
+   hash_seq_init(&status, LogicalRepPartMap);
+   while ((part_entry = (LogicalRepPartMapEntry *)
hash_seq_search(&status)) != NULL)
+   {
+       entry = &part_entry->relmapentry;
+
+       if (entry->remoterel.remoteid != remoterel->remoteid)
+           continue;
+
+       logicalrep_relmap_free_entry(entry);
+
+       memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+   }
+}

The previous versions would also call logicalrep_relmap_update() on
the entry after the memset, which is no longer done, so that is indeed
saving useless work. I also see that both logicalrep_relmap_update()
and the above function basically invalidate the whole
LogicalRepRelMapEntry before setting the new remote relation info so
that the next logicalrep_rel_open() or logicalrep_partition_open()
have to refill the other members too.

Though, I thought maybe you were saying that we shouldn't need this
function for resetting partitions in the first place, which I guess
you weren't.

v4-0002 looks good btw, except for a nitpick about the test comment, similar
to my earlier comment regarding v5-0001:

+# Change the column order of table on publisher

I think it might be better to say something specific to describe the
test intent, like:

Test that replication into a partitioned target table continues to work
correctly when the published table is altered

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#16shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Langote (#13)
6 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Tue, Jun 14, 2022 2:18 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jun 13, 2022 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Mon, Jun 13, 2022 at 1:03 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Monday, June 13, 2022 1:53 PM Amit Kapila

<amit.kapila16@gmail.com> wrote:

I have separated out the bug-fix for the subscriber-side.
And fixed the typo and function name.
Attached the new version patch set.

The first patch looks good to me. I have slightly modified one of the
comments and the commit message. I think we need to backpatch this
through 13 where we introduced support to replicate into partitioned
tables (commit f1ac27bf). If you guys are fine, I'll push this once
the work for PG14.4 is done.

Both the code changes and test cases look good to me. Just a couple
of minor nitpicks with test changes:

Thanks for your comments.

+   CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+   ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+   ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});

Not sure if we follow it elsewhere, but should we maybe avoid using
the internally generated index name as in the partition's case above?

I saw that some existing tests also use internally generated index names (e.g.
replica_identity.sql, ddl.sql and 031_column_list.pl), so maybe it's better to
fix them all in a separate patch. I didn't change this.
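
(As a quick sketch, for anyone who wants to see which names the cascade
actually produced, the catalog view can be queried directly:

    SELECT tablename, indexname FROM pg_indexes
    WHERE tablename IN ('tab5', 'tab5_1') ORDER BY 1, 2;

which should list tab5_a_idx for the parent and the generated tab5_1_a_idx
for the partition.)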

+# Test the case that target table on subscriber is a partitioned table and
+# check that the changes are replicated correctly after changing the schema of
+# table on subcriber.

The first sentence makes it sound like the tests that follow are the
first ones in the file where the target table is partitioned, which is
not true, so I think we should drop that part. Also how about being
more specific about the test intent, say:

Test that replication continues to work correctly after altering the
partition of a partitioned target table.

OK, modified.

Attached is the new version of the patch set, along with the patches for pg14 and pg13.

Regards,
Shi yu

Attachments:

v6-0001-Fix-cache-look-up-failures-while-applying-changes.patch
From 6adcc1b4ee7456940dc23014adf0c703ac4e5b88 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:06:05 +0800
Subject: [PATCH v6 1/4] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 56 +++++++++++++++++--
 2 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..b12f569702 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int	i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94f19..06f9215018 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,57 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
 done_testing();
-- 
2.24.0.windows.2

v6-0002-Reset-partition-map-cache-when-receiving-new-rela.patch
From 122507ca283f2f983f5113707bc067585ca8a536 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:39:18 +0800
Subject: [PATCH v6 2/4] Reset partition map cache when receiving new relation
 mapping from publisher

Reset partition map cache when the publisher sends new relation mapping so
that the partition's column map can be rebuilt correctly.
---
 src/backend/replication/logical/relation.c | 34 ++++++++++++++++++++++
 src/backend/replication/logical/worker.c   |  5 ++++
 src/include/replication/logicalrelation.h  |  1 +
 src/test/subscription/t/013_partition.pl   | 14 +++++++++
 4 files changed, 54 insertions(+)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index b12f569702..b7b740361c 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -486,6 +486,40 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 	}
 }
 
+/*
+ * Reset the entries in the partition map that refer to remoterel
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+	}
+}
+
 /*
  * Initialize the partition map cache.
  */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9e7b..7c28da3714 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1562,6 +1562,11 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/*
+	 * Also reset all entries in the partition map that refer to remoterel.
+	 */
+	logicalrep_partmap_reset_relmap(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index 7bf8cd22bd..78cd7e77f5 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -38,6 +38,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 06f9215018..071a9d7b27 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -853,4 +853,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
+# Change the column order of table on publisher
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
 done_testing();
-- 
2.24.0.windows.2

v6-0003-Check-partition-table-replica-identity-on-subscri.patch
From 954965d2fd554687e9b9b792ded843c719112d00 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:40:58 +0800
Subject: [PATCH v6 3/4] Check partition table replica identity on subscriber

In logical replication, we will check if the target table on subscriber is
updatable by comparing the replica identity of the table on publisher with the
table on subscriber. When the target table is a partitioned table, we should
check the replica identity key of target partition, instead of the partitioned
table.
---
 src/backend/replication/logical/relation.c | 121 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  20 +++-
 src/test/subscription/t/013_partition.pl   |  14 +++
 3 files changed, 98 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index b7b740361c..1a9ed664dd 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,72 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (entry->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+		/*
+		 * If no replica identity index and no PK, the published table
+		 * must have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +373,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -365,55 +430,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		/* be tidy */
 		bms_free(missingatts);
 
-		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
-		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		/* Check that replica identity matches. */
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +669,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Check that replica identity matches. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 7c28da3714..c57f8d678b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2123,6 +2123,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap	   *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2154,8 +2156,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2163,6 +2168,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if(operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2184,15 +2197,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 071a9d7b27..a1be3bc7c4 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -867,4 +867,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.24.0.windows.2

v6-0004-fix-memory-leak-about-attrmap.patch
From e52b5664f4ab5564ac0d2e2c9b238348f4a1e038 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:42:55 +0800
Subject: [PATCH v6 4/4] fix memory leak about attrmap

Use free_attrmap instead of pfree to release AttrMap structure.
Check the attrmap again when opening the relation and clean up the
invalid AttrMap before rebuilding it.
---
 src/backend/replication/logical/relation.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 1a9ed664dd..06be13504a 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -144,7 +144,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
 }
 
 /*
@@ -378,6 +381,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		int			i;
 		Bitmapset  *missingatts;
 
+		/* cleanup the invalid attrmap */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -610,6 +620,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* cleanup the invalid attrmap */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	if (!entry->remoterel.remoteid)
 	{
 		int	i;
-- 
2.24.0.windows.2

v6-pg13-0001-Fix-cache-look-up-failures-while-applying-chang_patch
From 8adb1747c2349e1055ee759cfca865775e209ddf Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Tue, 14 Jun 2022 14:38:48 +0800
Subject: [PATCH v613] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 58 ++++++++++++++++++--
 2 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 901bff9974..24888479a0 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -418,7 +418,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -431,11 +431,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -448,8 +448,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -502,7 +502,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -513,31 +512,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int	i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -562,7 +570,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 7689cbb364..4eef52df77 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 67;
+use Test::More tests => 69;
 
 # setup
 
@@ -786,7 +786,55 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
-- 
2.18.4

v6-pg14-0001-Fix-cache-look-up-failures-while-applying-chang_patch
From 2120c23add0dfb94a4531f2cac46865c5270c7d9 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Tue, 14 Jun 2022 14:01:54 +0800
Subject: [PATCH v614] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 58 ++++++++++++++++++--
 2 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index c37e2a7e29..7b18ca650a 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int	i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 39946c735b..e53bc5b568 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 67;
+use Test::More tests => 69;
 
 # setup
 
@@ -789,7 +789,55 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
-- 
2.18.4

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#15)
2 attachment(s)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 14, 2022 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jun 14, 2022 at 3:31 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jun 13, 2022 at 6:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I think we can do that way as well but do you see any benefit in it?
The way I am suggesting will avoid the effort of updating the remote
rel copy till we try to access that particular partition.

I don't see any benefit as such to doing it the way the patch does,
it's just that that seems to be the only way to go given the way
things are.

Oh, I see that v4-0002 has this:

+/*
+ * Reset the entries in the partition map that refer to remoterel
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+   HASH_SEQ_STATUS status;
+   LogicalRepPartMapEntry *part_entry;
+   LogicalRepRelMapEntry *entry;
+
+   if (LogicalRepPartMap == NULL)
+       return;
+
+   hash_seq_init(&status, LogicalRepPartMap);
+   while ((part_entry = (LogicalRepPartMapEntry *)
hash_seq_search(&status)) != NULL)
+   {
+       entry = &part_entry->relmapentry;
+
+       if (entry->remoterel.remoteid != remoterel->remoteid)
+           continue;
+
+       logicalrep_relmap_free_entry(entry);
+
+       memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+   }
+}

The previous versions would also call logicalrep_relmap_update() on
the entry after the memset, which is no longer done, so that is indeed
saving useless work. I also see that both logicalrep_relmap_update()
and the above function basically invalidate the whole
LogicalRepRelMapEntry before setting the new remote relation info so
that the next logicalrep_rel_open() or logicalrep_partition_open()
have to refill the other members too.

Though, I thought maybe you were saying that we shouldn't need this
function for resetting partitions in the first place, which I guess
you weren't.

Right.

v4-0002 looks good btw, except for a nitpick about the test comment, similar
to my earlier comment regarding v5-0001:

+# Change the column order of table on publisher

I think it might be better to say something specific to describe the
test intent, like:

Test that replication into a partitioned target table continues to work
correctly when the published table is altered

Okay, changed this and slightly modified the comments and the commit message.
I am just attaching the HEAD patches for the first two issues.

--
With Regards,
Amit Kapila.

Attachments:

v7-0001-Fix-cache-look-up-failures-while-applying-changes.patch (application/octet-stream)
From 60f15eeaaa4da70599fca0119947a3ff6695480e Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:06:05 +0800
Subject: [PATCH v7 1/2] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 56 +++++++++++++++++--
 2 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..b12f569702 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int	i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94f19..06f9215018 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,57 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
 done_testing();
-- 
2.28.0.windows.1

v7-0002-Fix-data-inconsistency-between-publisher-and-subs.patch (application/octet-stream)
From 3b0e86839e823d973e9f6106919d124a57a97617 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:39:18 +0800
Subject: [PATCH v7 2/2] Fix data inconsistency between publisher and
 subscriber.

We were not updating the partition map cache in the subscriber even when
the corresponding remote rel is changed. Due to this data was getting
incorrectly replicated after the publisher has changed the table schema.

Fix it by resetting the required entries in the partition map cache after
receiving a new relation mapping from the publisher.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 34 ++++++++++++++++++++++
 src/backend/replication/logical/worker.c   |  3 ++
 src/include/replication/logicalrelation.h  |  1 +
 src/test/subscription/t/013_partition.pl   | 15 ++++++++++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index b12f569702..366a75d500 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -486,6 +486,40 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 	}
 }
 
+/*
+ * Reset the entries in the partition map that refer to remoterel.
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+	}
+}
+
 /*
  * Initialize the partition map cache.
  */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9e7b..607f719fd6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1562,6 +1562,9 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/* Also reset all entries in the partition map that refer to remoterel. */
+	logicalrep_partmap_reset_relmap(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index 7bf8cd22bd..78cd7e77f5 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -38,6 +38,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 06f9215018..69f4009a14 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -853,4 +853,19 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
+# Test that replication into the partitioned target table continues to
+# work correctly when the published table is altered.
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
 done_testing();
-- 
2.28.0.windows.1

#18Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#17)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 14, 2022 at 9:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 14, 2022 at 1:02 PM Amit Langote <amitlangote09@gmail.com> wrote:

+# Change the column order of table on publisher

I think it might be better to say something specific to describe the
test intent, like:

Test that replication into a partitioned target table continues to work
correctly when the published table is altered

Okay, changed this and slightly modified the comments and commit message.
I am just attaching the HEAD patches for the first two issues.

LGTM, thanks.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#19shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#17)
6 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Tue, Jun 14, 2022 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

v4-0002 looks good btw, except for a nitpick about the test comment, similar
to my earlier comment regarding v5-0001:

+# Change the column order of table on publisher

I think it might be better to say something specific to describe the
test intent, like:

Test that replication into a partitioned target table continues to work
correctly when the published table is altered

Okay, changed this and slightly modified the comments and commit message.
I am just attaching the HEAD patches for the first two issues.

Thanks for updating the patch.

Attached is the new patch set, which has been run through pgindent, along with
the patches for pg14 and pg13. (Only the first two patches of the patch set.)

Regards,
Shi yu

Attachments:

v8-pg14-0002-Fix-data-inconsistency-between-publisher-and-su_patch (application/octet-stream)
From 232967797758d2ba44318a3d58182b11d4bd7592 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Wed, 15 Jun 2022 10:48:21 +0800
Subject: [PATCH v814 2/2] Fix data inconsistency between publisher and
 subscriber.

We were not updating the partition map cache in the subscriber even when
the corresponding remote rel is changed. Due to this data was getting
incorrectly replicated after the publisher has changed the table schema.

Fix it by resetting the required entries in the partition map cache after
receiving a new relation mapping from the publisher.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 34 ++++++++++++++++++++++
 src/backend/replication/logical/worker.c   |  3 ++
 src/include/replication/logicalrelation.h  |  1 +
 src/test/subscription/t/013_partition.pl   | 17 ++++++++++-
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5f4689f182..5c7e9d11ac 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -486,6 +486,40 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 	}
 }
 
+/*
+ * Reset the entries in the partition map that refer to remoterel.
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+	}
+}
+
 /*
  * Initialize the partition map cache.
  */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 833b2809d0..bf97fa44ba 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1191,6 +1191,9 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/* Also reset all entries in the partition map that refer to remoterel. */
+	logicalrep_partmap_reset_relmap(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index 3c662d3abc..10f91490b5 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -38,6 +38,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e53bc5b568..568e4d104e 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 69;
+use Test::More tests => 70;
 
 # setup
 
@@ -841,3 +841,18 @@ $node_publisher->wait_for_catchup('sub2');
 $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
+
+# Test that replication into the partitioned target table continues to
+# work correctly when the published table is altered.
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
-- 
2.18.4

v8-pg14-0001-Fix-cache-look-up-failures-while-applying-chang_patch (application/octet-stream)
From 9f60a7127eab8844e9a88a5c976963519bd54b03 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Tue, 14 Jun 2022 14:01:54 +0800
Subject: [PATCH v814 1/2] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 58 ++++++++++++++++++--
 2 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index c37e2a7e29..5f4689f182 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int			i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 39946c735b..e53bc5b568 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 67;
+use Test::More tests => 69;
 
 # setup
 
@@ -789,7 +789,55 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
-- 
2.18.4

v8-pg13-0002-Fix-data-inconsistency-between-publisher-and-su_patch (application/octet-stream)
From 3bfea2ac0c1deafaa4adaf03ffa0673c186a3786 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Wed, 15 Jun 2022 10:48:21 +0800
Subject: [PATCH v813 2/2] Fix data inconsistency between publisher and
 subscriber.

We were not updating the partition map cache in the subscriber even when
the corresponding remote rel is changed. Due to this data was getting
incorrectly replicated after the publisher has changed the table schema.

Fix it by resetting the required entries in the partition map cache after
receiving a new relation mapping from the publisher.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 34 ++++++++++++++++++++++
 src/backend/replication/logical/worker.c   |  3 ++
 src/include/replication/logicalrelation.h  |  1 +
 src/test/subscription/t/013_partition.pl   | 17 ++++++++++-
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 283afa5d9d..026d2c2af4 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -453,6 +453,40 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 	}
 }
 
+/*
+ * Reset the entries in the partition map that refer to remoterel.
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+	}
+}
+
 /*
  * Initialize the partition map cache.
  */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c04abd79e7..8eafd61cdb 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -612,6 +612,9 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/* Also reset all entries in the partition map that refer to remoterel. */
+	logicalrep_partmap_reset_relmap(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index e369b27e7f..847e6f1846 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -33,6 +33,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 4eef52df77..8a1ec55f24 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 69;
+use Test::More tests => 70;
 
 # setup
 
@@ -838,3 +838,18 @@ $node_publisher->wait_for_catchup('sub2');
 $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
+
+# Test that replication into the partitioned target table continues to
+# work correctly when the published table is altered.
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
-- 
2.18.4

v8-pg13-0001-Fix-cache-look-up-failures-while-applying-chang_patch (application/octet-stream)
From f3bb0079d980a2254003474f4888d7161045ce27 Mon Sep 17 00:00:00 2001
From: "shiy.fnst" <shiy.fnst@fujitsu.com>
Date: Tue, 14 Jun 2022 14:38:48 +0800
Subject: [PATCH v813 1/2] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 58 ++++++++++++++++++--
 2 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 901bff9974..283afa5d9d 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -418,7 +418,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -431,11 +431,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -448,8 +448,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -502,7 +502,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -513,31 +512,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int			i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -562,7 +570,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 7689cbb364..4eef52df77 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 67;
+use Test::More tests => 69;
 
 # setup
 
@@ -786,7 +786,55 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
-- 
2.18.4

v8-0002-Fix-data-inconsistency-between-publisher-and-subs.patch (application/octet-stream)
From 29096a74cdf8d6e4aa84711175e53faa5f4a09b0 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:39:18 +0800
Subject: [PATCH v8 2/2] Fix data inconsistency between publisher and
 subscriber.

We were not updating the partition map cache in the subscriber even when
the corresponding remote rel is changed. Due to this data was getting
incorrectly replicated after the publisher has changed the table schema.

Fix it by resetting the required entries in the partition map cache after
receiving a new relation mapping from the publisher.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 34 ++++++++++++++++++++++
 src/backend/replication/logical/worker.c   |  3 ++
 src/include/replication/logicalrelation.h  |  1 +
 src/test/subscription/t/013_partition.pl   | 15 ++++++++++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 9c9ec144d8..34c55c04e3 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -486,6 +486,40 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 	}
 }
 
+/*
+ * Reset the entries in the partition map that refer to remoterel.
+ *
+ * Called when new relation mapping is sent by the publisher to update our
+ * expected view of incoming data from said publisher.
+ *
+ * Note that we don't update the remoterel information in the entry here,
+ * we will update the information in logicalrep_partition_open to avoid
+ * unnecessary work.
+ */
+void
+logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel)
+{
+	HASH_SEQ_STATUS status;
+	LogicalRepPartMapEntry *part_entry;
+	LogicalRepRelMapEntry *entry;
+
+	if (LogicalRepPartMap == NULL)
+		return;
+
+	hash_seq_init(&status, LogicalRepPartMap);
+	while ((part_entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+	{
+		entry = &part_entry->relmapentry;
+
+		if (entry->remoterel.remoteid != remoterel->remoteid)
+			continue;
+
+		logicalrep_relmap_free_entry(entry);
+
+		memset(entry, 0, sizeof(LogicalRepRelMapEntry));
+	}
+}
+
 /*
  * Initialize the partition map cache.
  */
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index fc210a9e7b..607f719fd6 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1562,6 +1562,9 @@ apply_handle_relation(StringInfo s)
 
 	rel = logicalrep_read_rel(s);
 	logicalrep_relmap_update(rel);
+
+	/* Also reset all entries in the partition map that refer to remoterel. */
+	logicalrep_partmap_reset_relmap(rel);
 }
 
 /*
diff --git a/src/include/replication/logicalrelation.h b/src/include/replication/logicalrelation.h
index 7bf8cd22bd..78cd7e77f5 100644
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
@@ -38,6 +38,7 @@ typedef struct LogicalRepRelMapEntry
 } LogicalRepRelMapEntry;
 
 extern void logicalrep_relmap_update(LogicalRepRelation *remoterel);
+extern void logicalrep_partmap_reset_relmap(LogicalRepRelation *remoterel);
 
 extern LogicalRepRelMapEntry *logicalrep_rel_open(LogicalRepRelId remoteid,
 												  LOCKMODE lockmode);
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 06f9215018..69f4009a14 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -853,4 +853,19 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
+# Test that replication into the partitioned target table continues to
+# work correctly when the published table is altered.
+$node_publisher->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DROP COLUMN b, ADD COLUMN c INT;
+	ALTER TABLE tab5 ADD COLUMN b INT;});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET c = 1 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
 done_testing();
-- 
2.18.4

v8-0001-Fix-cache-look-up-failures-while-applying-changes.patch (application/octet-stream)
From 0714b68643f4c2079a14c0b5652d4fdf39a4c654 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:06:05 +0800
Subject: [PATCH v8 1/2] Fix cache look-up failures while applying changes in
 logical replication.

While building a new attrmap which maps partition attribute numbers to
remoterel's, we incorrectly update the map for dropped column attributes.
Later, it caused cache look-up failure when we tried to use the map to
fetch the information about attributes.

This also fixes the partition map cache invalidation which was using the
wrong type cast to fetch the entry. We were using stale partition map
entry after invalidation which leads to the assertion or cache look-up
failure.

Reported-by: Shi Yu
Author: Hou Zhijie, Shi Yu
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 62 +++++++++++++---------
 src/test/subscription/t/013_partition.pl   | 56 +++++++++++++++++--
 2 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 80fb561a9a..9c9ec144d8 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -451,7 +451,7 @@ logicalrep_rel_close(LogicalRepRelMapEntry *rel, LOCKMODE lockmode)
 static void
 logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 {
-	LogicalRepRelMapEntry *entry;
+	LogicalRepPartMapEntry *entry;
 
 	/* Just to be sure. */
 	if (LogicalRepPartMap == NULL)
@@ -464,11 +464,11 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 		hash_seq_init(&status, LogicalRepPartMap);
 
 		/* TODO, use inverse lookup hashtable? */
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
 		{
-			if (entry->localreloid == reloid)
+			if (entry->relmapentry.localreloid == reloid)
 			{
-				entry->localrelvalid = false;
+				entry->relmapentry.localrelvalid = false;
 				hash_seq_term(&status);
 				break;
 			}
@@ -481,8 +481,8 @@ logicalrep_partmap_invalidate_cb(Datum arg, Oid reloid)
 
 		hash_seq_init(&status, LogicalRepPartMap);
 
-		while ((entry = (LogicalRepRelMapEntry *) hash_seq_search(&status)) != NULL)
-			entry->localrelvalid = false;
+		while ((entry = (LogicalRepPartMapEntry *) hash_seq_search(&status)) != NULL)
+			entry->relmapentry.localrelvalid = false;
 	}
 }
 
@@ -534,7 +534,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 	Oid			partOid = RelationGetRelid(partrel);
 	AttrMap    *attrmap = root->attrmap;
 	bool		found;
-	int			i;
 	MemoryContext oldctx;
 
 	if (LogicalRepPartMap == NULL)
@@ -545,31 +544,40 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
-	if (found)
-		return &part_entry->relmapentry;
+	entry = &part_entry->relmapentry;
 
-	memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+	if (found && entry->localrelvalid)
+		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	part_entry->partoid = partOid;
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
 
-	/* Remote relation is copied as-is from the root entry. */
-	entry = &part_entry->relmapentry;
-	entry->remoterel.remoteid = remoterel->remoteid;
-	entry->remoterel.nspname = pstrdup(remoterel->nspname);
-	entry->remoterel.relname = pstrdup(remoterel->relname);
-	entry->remoterel.natts = remoterel->natts;
-	entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
-	entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
-	for (i = 0; i < remoterel->natts; i++)
+	if (!entry->remoterel.remoteid)
 	{
-		entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
-		entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		int			i;
+
+		/* Remote relation is copied as-is from the root entry. */
+		entry = &part_entry->relmapentry;
+		entry->remoterel.remoteid = remoterel->remoteid;
+		entry->remoterel.nspname = pstrdup(remoterel->nspname);
+		entry->remoterel.relname = pstrdup(remoterel->relname);
+		entry->remoterel.natts = remoterel->natts;
+		entry->remoterel.attnames = palloc(remoterel->natts * sizeof(char *));
+		entry->remoterel.atttyps = palloc(remoterel->natts * sizeof(Oid));
+		for (i = 0; i < remoterel->natts; i++)
+		{
+			entry->remoterel.attnames[i] = pstrdup(remoterel->attnames[i]);
+			entry->remoterel.atttyps[i] = remoterel->atttyps[i];
+		}
+		entry->remoterel.replident = remoterel->replident;
+		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
-	entry->remoterel.replident = remoterel->replident;
-	entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 
 	entry->localrel = partrel;
 	entry->localreloid = partOid;
@@ -594,7 +602,11 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		{
 			AttrNumber	root_attno = map->attnums[attno];
 
-			entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
+			/* 0 means it's a dropped attribute.  See comments atop AttrMap. */
+			if (root_attno == 0)
+				entry->attrmap->attnums[attno] = -1;
+			else
+				entry->attrmap->attnums[attno] = attrmap->attnums[root_attno - 1];
 		}
 	}
 	else
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index e7f4a94f19..06f9215018 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -800,9 +800,57 @@ ok( $logfile =~
 	  qr/logical replication did not find row to be deleted in replication target relation "tab2_1"/,
 	'delete target row is missing in tab2_1');
 
-# No need for this until more tests are added.
-# $node_subscriber1->append_conf('postgresql.conf',
-# 	"log_min_messages = warning");
-# $node_subscriber1->reload;
+$node_subscriber1->append_conf('postgresql.conf',
+	"log_min_messages = warning");
+$node_subscriber1->reload;
+
+# Test that replication continues to work correctly after altering the
+# partition of a partitioned target table.
+
+$node_publisher->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int);
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;});
+
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	CREATE TABLE tab5 (a int NOT NULL, b int, c int) PARTITION BY LIST (a);
+	CREATE TABLE tab5_1 PARTITION OF tab5 DEFAULT;
+	CREATE UNIQUE INDEX tab5_a_idx ON tab5 (a);
+	ALTER TABLE tab5 REPLICA IDENTITY USING INDEX tab5_a_idx;
+	ALTER TABLE tab5_1 REPLICA IDENTITY USING INDEX tab5_1_a_idx;});
+
+$node_subscriber2->safe_psql('postgres',
+	"ALTER SUBSCRIPTION sub2 REFRESH PUBLICATION");
+
+$node_subscriber2->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+# Make partition map cache
+$node_publisher->safe_psql('postgres', "INSERT INTO tab5 VALUES (1, 1)");
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 2 WHERE a = 1");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b FROM tab5 ORDER BY 1");
+is($result, qq(2|1), 'updates of tab5 replicated correctly');
+
+# Change the column order of partition on subscriber
+$node_subscriber2->safe_psql(
+	'postgres', q{
+	ALTER TABLE tab5 DETACH PARTITION tab5_1;
+	ALTER TABLE tab5_1 DROP COLUMN b;
+	ALTER TABLE tab5_1 ADD COLUMN b int;
+	ALTER TABLE tab5 ATTACH PARTITION tab5_1 DEFAULT});
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 3 WHERE a = 2");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5 ORDER BY 1");
+is($result, qq(3|1|), 'updates of tab5 replicated correctly after altering table on subscriber');
 
 done_testing();
-- 
2.18.4

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: shiy.fnst@fujitsu.com (#19)
Re: Replica Identity check of partition table on subscriber

On Wed, Jun 15, 2022 at 8:52 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Tue, Jun 14, 2022 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

v4-0002 looks good btw, except for a nitpick about the test comment, similar
to my earlier comment regarding v5-0001:

+# Change the column order of table on publisher

I think it might be better to say something specific to describe the
test intent, like:

Test that replication into a partitioned target table continues to work
correctly when the published table is altered

Okay, changed this and slightly modified the comments and commit message.
I am just attaching the HEAD patches for the first two issues.

Thanks for updating the patch.

Attached is the new patch set, which has been run through pgindent, along with
the patches for pg14 and pg13. (Only the first two patches of the patch set.)

I have pushed the first bug-fix patch today.

--
With Regards,
Amit Kapila.

#21shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#20)
2 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have pushed the first bug-fix patch today.

Thanks.

Attached are the remaining patches, which have been rebased.

Regards,
Shi yu

Attachments:

v9-0002-fix-memory-leak-about-attrmap.patch (application/octet-stream)
From bb906411e53e310762ec121b625b666afd7d9750 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:42:55 +0800
Subject: [PATCH v9 2/2] fix memory leak about attrmap

Use free_attrmap instead of pfree to release AttrMap structure.
Check the attrmap again when opening the relation and clean up the
invalid AttrMap before rebuilding it.
---
 src/backend/replication/logical/relation.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 18e657c..84515b8 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -144,7 +144,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
 }
 
 /*
@@ -378,6 +381,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		int			i;
 		Bitmapset  *missingatts;
 
+		/* cleanup the invalid attrmap */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -610,6 +620,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* cleanup the invalid attrmap */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	if (!entry->remoterel.remoteid)
 	{
 		int			i;
-- 
2.18.4

v9-0001-Check-partition-table-replica-identity-on-subscri.patch
From d6c155506b13e1cb7a993564f628c1c3d2bebc68 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:40:58 +0800
Subject: [PATCH v9 1/2] Check partition table replica identity on subscriber

In logical replication, we will check if the target table on subscriber is
updatable by comparing the replica identity of the table on publisher with the
table on subscriber. When the target table is a partitioned table, we should
check the replica identity key of target partition, instead of the partitioned
table.
---
 src/backend/replication/logical/relation.c | 121 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  20 +++-
 src/test/subscription/t/013_partition.pl   |  14 +++
 3 files changed, 98 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 34c55c0..18e657c 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,72 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (entry->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+		/*
+		 * If no replica identity index and no PK, the published table
+		 * must have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +373,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -365,55 +430,8 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		/* be tidy */
 		bms_free(missingatts);
 
-		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
-		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		/* Check that replica identity matches. */
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +669,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Check that replica identity matches. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 607f719..8e32a48 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2121,6 +2121,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap	   *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2152,8 +2154,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2161,6 +2166,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if(operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2182,15 +2195,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 69f4009..e07a921 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -868,4 +868,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.18.4

#22Amit Langote
amitlangote09@gmail.com
In reply to: shiy.fnst@fujitsu.com (#21)
Re: Replica Identity check of partition table on subscriber

Hi,

On Thu, Jun 16, 2022 at 2:07 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have pushed the first bug-fix patch today.

Attached the remaining patches which are rebased.

Thanks.

Comments on v9-0001:

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

+   /* Check that replica identity matches. */
+   logicalrep_rel_mark_updatable(entry);

Maybe the comment (there are 2 instances) should say:

Set if the table's replica identity is enough to apply update/delete.

Finally,

+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.

For consistency with other recently added comments, I'd suggest the
following wording:

Test that replication works correctly as long as the leaf partition
has the necessary REPLICA IDENTITY, even though the actual target
partitioned table does not.

On v9-0002:

+ /* cleanup the invalid attrmap */

It seems that "invalid" here really means no-longer-useful, so we
should use that phrase as a nearby comment does:

Release the no-longer-useful attrmap, if any.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#22)
Re: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 at 11:43 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jun 16, 2022 at 2:07 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have pushed the first bug-fix patch today.

Attached the remaining patches which are rebased.

Thanks.

Comments on v9-0001:

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

I am fine with improving this comment but it would be better if in
some way we keep the following part of the comment: "as replica
identity is only for updates and deletes but inserts can be replicated
even without it." as that makes it more clear why it is okay to just
mark the entry as not updatable. One idea could be: "We just mark the
relation entry as not updatable here if the local replica identity is
found to be insufficient and leave it to check_relation_updatable() to
throw the actual error if needed. This is because replica identity is
only for updates and deletes but inserts can be replicated even
without it.". Feel free to suggest if you have any better ideas?
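
For illustration, here is a minimal sketch of that point (all object names and
the connection string below are made up): the subscriber's copy of the table
has neither a primary key nor a replica identity index, so it is marked as not
updatable, yet replicated inserts still apply; only a replicated update or
delete makes check_relation_updatable() raise an error on the apply worker.

-- publisher --
create table t1 (a int primary key, b int);
create publication pub1 for table t1;

-- subscriber (no primary key, no replica identity index) --
create table t1 (a int, b int);
create subscription sub1 connection 'dbname=postgres' publication pub1;

-- publisher --
insert into t1 values (1, 1);     -- applied on the subscriber despite the missing replica identity
update t1 set b = 2 where a = 1;  -- the apply worker errors out because the target is not updatable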

--
With Regards,
Amit Kapila.

#24Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#23)
Re: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 16, 2022 at 11:43 AM Amit Langote <amitlangote09@gmail.com> wrote:

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

I am fine with improving this comment but it would be better if in
some way we keep the following part of the comment: "as replica
identity is only for updates and deletes but inserts can be replicated
even without it." as that makes it more clear why it is okay to just
mark the entry as not updatable. One idea could be: "We just mark the
relation entry as not updatable here if the local replica identity is
found to be insufficient and leave it to check_relation_updatable() to
throw the actual error if needed. This is because replica identity is
only for updates and deletes but inserts can be replicated even
without it.". Feel free to suggest if you have any better ideas?

I thought mentioning check_relation_updatable() would make it clear
that only updates (and deletes) care about a valid local replica
identity, because only apply_handle_{update|delete}() call that
function. Anyway, how about this:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient for applying
updates/deletes (inserts don't care!) and leave it to
check_relation_updatable() to throw the actual error if needed.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#25Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#24)
Re: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 at 12:30 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jun 16, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 16, 2022 at 11:43 AM Amit Langote <amitlangote09@gmail.com> wrote:

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

I am fine with improving this comment but it would be better if in
some way we keep the following part of the comment: "as replica
identity is only for updates and deletes but inserts can be replicated
even without it." as that makes it more clear why it is okay to just
mark the entry as not updatable. One idea could be: "We just mark the
relation entry as not updatable here if the local replica identity is
found to be insufficient and leave it to check_relation_updatable() to
throw the actual error if needed. This is because replica identity is
only for updates and deletes but inserts can be replicated even
without it.". Feel free to suggest if you have any better ideas?

I thought mentioning check_relation_updatable() would make it clear
that only updates (and deletes) care about a valid local replica
identity, because only apply_handle_{update|delete}() call that
function. Anyway, how about this:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient for applying
updates/deletes (inserts don't care!) and leave it to
check_relation_updatable() to throw the actual error if needed.

This sounds better to me than the previous text.

--
With Regards,
Amit Kapila.

#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#5)
Re: Replica Identity check of partition table on subscriber

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
static void
check_relation_updatable(LogicalRepRelMapEntry *rel)
{
+   /*
+    * If it is a partitioned table, we don't check it, we will check its
+    * partition later.
+    */
+   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       return;

Why do this? I mean why if logicalrep_check_updatable() doesn't care
if the relation is partitioned or not -- it does all the work
regardless.

I suggest we don't add this check in check_relation_updatable().

I think, based on this suggestion, the patch has moved this check to
logicalrep_rel_mark_updatable(). For a partitioned table, it won't even
validate whether it can mark updatable as false, which seems odd to me
even though there might not be any bug due to that. Was your suggestion
actually intended to move it to logicalrep_rel_mark_updatable()? If so,
why do you think that is a better place?

I think it is important to have this check to avoid giving an error via
check_relation_updatable() when partitioned tables don't have RI, but it is
not clear which is the right place. I think check_relation_updatable() is a
better place than logicalrep_rel_mark_updatable(), but maybe there is a
reason why that is not a good idea.

--
With Regards,
Amit Kapila.

#27Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#26)
Re: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
static void
check_relation_updatable(LogicalRepRelMapEntry *rel)
{
+   /*
+    * If it is a partitioned table, we don't check it, we will check its
+    * partition later.
+    */
+   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       return;

Why do this? I mean why if logicalrep_check_updatable() doesn't care
if the relation is partitioned or not -- it does all the work
regardless.

I suggest we don't add this check in check_relation_updatable().

I think based on this suggestion patch has moved this check to
logicalrep_rel_mark_updatable(). For a partitioned table, it won't
even validate whether it can mark updatable as false which seems odd
to me even though there might not be any bug due to that. Was your
suggestion actually intended to move it to
logicalrep_rel_mark_updatable?

No, I didn't intend to suggest that we move this check to
logicalrep_rel_mark_updatable(); didn't notice that that's what the
latest patch did.

What I said is that we shouldn't ignore the updatable flag for a
partitioned table in check_relation_updatable(), because
logicalrep_rel_mark_updatable() would have set the updatable flag
correctly even for partitioned tables. IOW, we should not
special-case partitioned tables anywhere.

I guess the point of adding the check is to allow the case where a
leaf partition's replica identity can be used to apply an update
originally targeting its ancestor that doesn't itself have one.
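
A minimal sketch of that case (object names and the connection string are made
up, and it assumes the proposed fix is applied): the partitioned parent on the
subscriber has no replica identity of its own, but the leaf partition the
change is routed to does, so the update can still be applied.

-- publisher --
create table ptab (a int not null, b int);
create unique index on ptab (a);
alter table ptab replica identity using index ptab_a_idx;
create publication pub_ptab for table ptab;

-- subscriber: only the leaf partition has a replica identity index --
create table ptab (a int not null, b int) partition by range (a);
create table ptab_leaf partition of ptab default;
create unique index on ptab_leaf (a);
alter table ptab_leaf replica identity using index ptab_leaf_a_idx;
create subscription sub_ptab connection 'dbname=postgres' publication pub_ptab;

-- publisher --
insert into ptab values (1, 1);
update ptab set b = 2 where a = 1;  -- routed to ptab_leaf on the subscriber and applied there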

I wonder if it wouldn't be better to move the
check_relation_updatable() call to
apply_handle_{update|delete}_internal()? We know for sure that we
only ever get there for leaf tables. If we do that, we won't need the
relkind check.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#27)
Re: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
static void
check_relation_updatable(LogicalRepRelMapEntry *rel)
{
+   /*
+    * If it is a partitioned table, we don't check it, we will check its
+    * partition later.
+    */
+   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       return;

Why do this? I mean why if logicalrep_check_updatable() doesn't care
if the relation is partitioned or not -- it does all the work
regardless.

I suggest we don't add this check in check_relation_updatable().

I think based on this suggestion patch has moved this check to
logicalrep_rel_mark_updatable(). For a partitioned table, it won't
even validate whether it can mark updatable as false which seems odd
to me even though there might not be any bug due to that. Was your
suggestion actually intended to move it to
logicalrep_rel_mark_updatable?

No, I didn't intend to suggest that we move this check to
logicalrep_rel_mark_updatable(); didn't notice that that's what the
latest patch did.

What I said is that we shouldn't ignore the updatable flag for a
partitioned table in check_relation_updatable(), because
logicalrep_rel_mark_updatable() would have set the updatable flag
correctly even for partitioned tables. IOW, we should not
special-case partitioned tables anywhere.

I guess the point of adding the check is to allow the case where a
leaf partition's replica identity can be used to apply an update
originally targeting its ancestor that doesn't itself have one.

I wonder if it wouldn't be better to move the
check_relation_updatable() call to
apply_handle_{update|delete}_internal()? We know for sure that we
only ever get there for leaf tables. If we do that, we won't need the
relkind check.

I think this won't work for updates via apply_handle_tuple_routing()
unless we call it from some other place(s) as well. It will do
FindReplTupleInLocalRel() before doing the update/delete in the CMD_UPDATE
case and will lead to an assertion failure.

--
With Regards,
Amit Kapila.

#29Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#28)
Re: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 at 9:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 16, 2022 at 5:24 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 10, 2022 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

@@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
static void
check_relation_updatable(LogicalRepRelMapEntry *rel)
{
+   /*
+    * If it is a partitioned table, we don't check it, we will check its
+    * partition later.
+    */
+   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       return;

Why do this? I mean why if logicalrep_check_updatable() doesn't care
if the relation is partitioned or not -- it does all the work
regardless.

I suggest we don't add this check in check_relation_updatable().

I think based on this suggestion patch has moved this check to
logicalrep_rel_mark_updatable(). For a partitioned table, it won't
even validate whether it can mark updatable as false which seems odd
to me even though there might not be any bug due to that. Was your
suggestion actually intended to move it to
logicalrep_rel_mark_updatable?

No, I didn't intend to suggest that we move this check to
logicalrep_rel_mark_updatable(); didn't notice that that's what the
latest patch did.

What I said is that we shouldn't ignore the updatable flag for a
partitioned table in check_relation_updatable(), because
logicalrep_rel_mark_updatable() would have set the updatable flag
correctly even for partitioned tables. IOW, we should not
special-case partitioned tables anywhere.

I guess the point of adding the check is to allow the case where a
leaf partition's replica identity can be used to apply an update
originally targeting its ancestor that doesn't itself have one.

I wonder if it wouldn't be better to move the
check_relation_updatable() call to
apply_handle_{update|delete}_internal()? We know for sure that we
only ever get there for leaf tables. If we do that, we won't need the
relkind check.

I think this won't work for updates via apply_handle_tuple_routing()
unless we call it from some other place(s) as well. It will do
FindReplTupleInLocalRel() before doing the update/delete in the CMD_UPDATE
case and will lead to an assertion failure.

You're right. I guess it's fine then to check the relkind in
check_relation_updatable() the way the original patch did, even though
it would've been nice if it didn't need to.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#30shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Langote (#22)
2 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Thu, Jun 16, 2022 2:13 PM Amit Langote <amitlangote09@gmail.com> wrote:

Hi,

On Thu, Jun 16, 2022 at 2:07 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Wed, Jun 15, 2022 8:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have pushed the first bug-fix patch today.

Attached the remaining patches which are rebased.

Thanks.

Comments on v9-0001:

Thanks for your comments.

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

Modified as you suggested in another mail [1].

+   /* Check that replica identity matches. */
+   logicalrep_rel_mark_updatable(entry);

Maybe the comment (there are 2 instances) should say:

Set if the table's replica identity is enough to apply update/delete.

Modified as suggested.

Finally,

+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.

For consistency with other recently added comments, I'd suggest the
following wording:

Test that replication works correctly as long as the leaf partition
has the necessary REPLICA IDENTITY, even though the actual target
partitioned table does not.

Modified as suggested.

On v9-0002:

+ /* cleanup the invalid attrmap */

It seems that "invalid" here really means no-longer-useful, so we
should use that phrase as a nearby comment does:

Release the no-longer-useful attrmap, if any.

Modified as suggested.

Attached the new version of the patch set. I also moved the partitioned table
check in logicalrep_rel_mark_updatable() to check_relation_updatable() as
discussed [2].

[1]: /messages/by-id/CA+HiwqG3Xi=wH4rBHm61ku-j0gm+-rc5VmDHxf=TeFkUsHtooA@mail.gmail.com
[2]: /messages/by-id/CA+HiwqHfN789ekiYVE+0xsLswMosMrWBwv4cPvYgWREWejw7HA@mail.gmail.com

Regards,
Shi yu

Attachments:

v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch
From 30acfd582685a83e1e31d35fbcf9ce14b2857f0f Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v10 1/2] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not that of its
partitions. This leads to an assertion failure while applying changes for
update/delete, as for that to succeed the corresponding partition table must
have a primary key or a replica identity defined.

Fix it by checking the replica identity of the partition table.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  14 +++
 3 files changed, 101 insertions(+), 55 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 34c55c04e3..5f511701d9 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,67 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +368,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -366,54 +426,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		bms_free(missingatts);
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +667,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 607f719fd6..ee4249d73d 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1738,6 +1738,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -2121,6 +2128,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2152,8 +2161,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2161,6 +2173,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2182,15 +2202,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 69f4009a14..f2b75c11de 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -868,4 +868,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.18.4

v10-0002-Fix-memory-leak-about-attrmap.patch
From 3ffa79dbf27aec61fb9d7ac348a19e42a307d06d Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:42:55 +0800
Subject: [PATCH v10 2/2] Fix memory leak about attrmap

When rebuilding the cached attribute map, it used pfree instead of
free_attrmap to release the memory of the map, which could result in a
memory leak. Besides, we didn't release the memory in some other places
where the map would be rebuilt. Fix it by using free_attrmap to release
the memory at all necessary places.

Author: Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5f511701d9..7346208388 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -144,7 +144,7 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+		free_attrmap(entry->attrmap);
 }
 
 /*
@@ -373,6 +373,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		int			i;
 		Bitmapset  *missingatts;
 
+		/* Release the no-longer-useful attrmap, if any. */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -608,6 +615,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* Release the no-longer-useful attrmap, if any. */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	if (!entry->remoterel.remoteid)
 	{
 		int			i;
-- 
2.18.4

#31shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: shiy.fnst@fujitsu.com (#30)
3 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Fri Jun 17, 2022 11:06 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:

Attached the new version of the patch set. I also moved the partitioned table
check in logicalrep_rel_mark_updatable() to check_relation_updatable() as
discussed [2].

Attached the back-branch patches for the first patch.

Regards,
Shi yu

Attachments:

v10-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch
From 3230f5824014f2093daa6d717364b83755893a3f Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v1013] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not that of its
partitions. This leads to an assertion failure while applying changes for
update/delete, as for that to succeed the corresponding partition table must
have a primary key or a replica identity defined.

Fix it by checking the replica identity of the partition table.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  16 ++-
 3 files changed, 102 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 026d2c2af4..b764ca97e3 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -213,6 +213,67 @@ logicalrep_rel_att_by_name(LogicalRepRelation *remoterel, const char *attname)
 	return -1;
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -272,7 +333,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	{
 		Oid			relid;
 		int			found;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -332,54 +392,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 							remoterel->nspname, remoterel->relname)));
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -619,7 +635,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8eafd61cdb..db20be5668 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -730,6 +730,13 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -1064,6 +1071,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	PartitionRoutingInfo *partinfo;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -1097,8 +1106,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partinfo->pi_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -1106,6 +1118,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
@@ -1129,15 +1149,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 8a1ec55f24..4debbfe55f 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 70;
+use Test::More tests => 71;
 
 # setup
 
@@ -853,3 +853,17 @@ $node_publisher->wait_for_catchup('sub2');
 $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
-- 
2.18.4

v10-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch
From 7defe585f9f4f6d548184412e0716d727c7cd009 Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v1014] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not that of its
partitions. This leads to an assertion failure while applying changes for
update/delete, as for that to succeed the corresponding partition table must
have a primary key or a replica identity defined.

Fix it by checking the replica identity of the partition table.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  16 ++-
 3 files changed, 102 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5c7e9d11ac..1fc34b18a4 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,67 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +368,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -366,54 +426,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		bms_free(missingatts);
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +667,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bf97fa44ba..ca8bc46aa2 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1323,6 +1323,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -1676,6 +1683,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -1707,8 +1716,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -1716,6 +1728,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1737,15 +1757,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 568e4d104e..dfe2cb6dea 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 70;
+use Test::More tests => 71;
 
 # setup
 
@@ -856,3 +856,17 @@ $node_publisher->wait_for_catchup('sub2');
 $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
-- 
2.18.4

v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patchapplication/octet-stream; name=v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patchDownload
From 30acfd582685a83e1e31d35fbcf9ce14b2857f0f Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v10 1/2] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not for the
partition tables. This leads to an assertion failure while applying changes
for update/delete, as we expect those to succeed only when the corresponding
partition table has a primary key or has a replica identity defined.

Fix it by checking the replica identity of the partition table.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  14 +++
 3 files changed, 101 insertions(+), 55 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 34c55c04e3..5f511701d9 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,67 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +368,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -366,54 +426,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		bms_free(missingatts);
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +667,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 607f719fd6..ee4249d73d 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1738,6 +1738,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * If it is a partitioned table, we don't check it, we will check its
+	 * partition later.
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -2121,6 +2128,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2152,8 +2161,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2161,6 +2173,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2182,15 +2202,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 69f4009a14..f2b75c11de 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -868,4 +868,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.18.4

#32Amit Kapila
amit.kapila16@gmail.com
In reply to: shiy.fnst@fujitsu.com (#31)
Re: Replica Identity check of partition table on subscriber

On Fri, Jun 17, 2022 at 11:22 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Fri Jun 17, 2022 11:06 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:

Attached the new version of patch set. I also moved the partitioned table
check in logicalrep_rel_mark_updatable() to check_relation_updatable() as
discussed [2].

Attached back-branch patches of the first patch.

One minor comment:
+ /*
+ * If it is a partitioned table, we don't check it, we will check its
+ * partition later.
+ */

Can we change the above comment to: "For partitioned tables, we only
need to care if the target partition is updatable (aka has PK or RI
defined for it)."?

Apart from this, it looks good to me. I'll push this tomorrow unless there
are any more suggestions/comments.

--
With Regards,
Amit Kapila.

#33shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#32)
3 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Mon, Jun 20, 2022 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jun 17, 2022 at 11:22 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Fri Jun 17, 2022 11:06 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:

Attached the new version of patch set. I also moved the partitioned table
check in logicalrep_rel_mark_updatable() to check_relation_updatable() as
discussed [2].

Attached back-branch patches of the first patch.

One minor comment:
+ /*
+ * If it is a partitioned table, we don't check it, we will check its
+ * partition later.
+ */

Can we change the above comment to: "For partitioned tables, we only
need to care if the target partition is updatable (aka has PK or RI
defined for it)."?

Thanks for your comment. Modified in the attached patches.

Regards,
Shi yu

Attachments:

v11-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patchapplication/octet-stream; name=v11-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patchDownload
From 8e1eb54fa2296889187fc5b7e8a698a952405e15 Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v11] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not for the
partition tables. This leads to assertion failure while applying changes
for update/delete as we expect those to succeed only when the
corresponding partition table has a primary key or has a replica
identity defined.

Fix it by checking the replica identity of the partition table while
applying changes.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  14 +++
 3 files changed, 101 insertions(+), 55 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 34c55c04e3..5f511701d9 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,67 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +368,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -366,54 +426,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		bms_free(missingatts);
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +667,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 607f719fd6..38e3b1c1b3 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1738,6 +1738,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * For partitioned tables, we only need to care if the target partition is
+	 * updatable (aka has PK or RI defined for it).
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -2121,6 +2128,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -2152,8 +2161,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -2161,6 +2173,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -2182,15 +2202,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 69f4009a14..f2b75c11de 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -868,4 +868,18 @@ $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
 
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
+
 done_testing();
-- 
2.18.4

v11-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patchapplication/octet-stream; name=v11-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patchDownload
From d7d2113e68e1ce877454de69281ace0927367916 Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v1113] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not for the
partition tables. This leads to assertion failure while applying changes
for update/delete as we expect those to succeed only when the
corresponding partition table has a primary key or has a replica
identity defined.

Fix it by checking the replica identity of the partition table while
applying changes.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  16 ++-
 3 files changed, 102 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 026d2c2af4..b764ca97e3 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -213,6 +213,67 @@ logicalrep_rel_att_by_name(LogicalRepRelation *remoterel, const char *attname)
 	return -1;
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -272,7 +333,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	{
 		Oid			relid;
 		int			found;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -332,54 +392,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 							remoterel->nspname, remoterel->relname)));
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -619,7 +635,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 8eafd61cdb..4c4165da1e 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -730,6 +730,13 @@ apply_handle_insert_internal(ResultRelInfo *relinfo,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * For partitioned tables, we only need to care if the target partition is
+	 * updatable (aka has PK or RI defined for it).
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -1064,6 +1071,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	PartitionRoutingInfo *partinfo;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -1097,8 +1106,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partinfo->pi_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -1106,6 +1118,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	estate->es_result_relation_info = partrelinfo;
 	switch (operation)
 	{
@@ -1129,15 +1149,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 8a1ec55f24..4debbfe55f 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 70;
+use Test::More tests => 71;
 
 # setup
 
@@ -853,3 +853,17 @@ $node_publisher->wait_for_catchup('sub2');
 $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
-- 
2.18.4

v11-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patchapplication/octet-stream; name=v11-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patchDownload
From 4a1789f46f322039a6c05396f097f93b36f0403a Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Fri, 17 Jun 2022 10:15:37 +0800
Subject: [PATCH v1114] Fix partition table's RI checking on the subscriber.

In logical replication, we will check if the target table on the
subscriber is updatable by comparing the replica identity of the table on
the publisher with the table on the subscriber. When the target table is a
partitioned table, we only check its replica identity but not for the
partition tables. This leads to assertion failure while applying changes
for update/delete as we expect those to succeed only when the
corresponding partition table has a primary key or has a replica
identity defined.

Fix it by checking the replica identity of the partition table while
applying changes.

Reported-by: Shi Yu
Author: Shi Yu, Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 115 ++++++++++++---------
 src/backend/replication/logical/worker.c   |  27 +++--
 src/test/subscription/t/013_partition.pl   |  16 ++-
 3 files changed, 102 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5c7e9d11ac..1fc34b18a4 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -249,6 +249,67 @@ logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
 	}
 }
 
+/*
+ * Check if replica identity matches and mark the updatable flag.
+ *
+ * We allow for stricter replica identity (fewer columns) on subscriber as
+ * that will not stop us from finding unique tuple. IE, if publisher has
+ * identity (id,timestamp) and subscriber just (id) this will not be a
+ * problem, but in the opposite scenario it will.
+ *
+ * We just mark the relation entry as not updatable here if the local
+ * replica identity is found to be insufficient for applying
+ * updates/deletes (inserts don't care!) and leave it to
+ * check_relation_updatable() to throw the actual error if needed.
+ */
+static void
+logicalrep_rel_mark_updatable(LogicalRepRelMapEntry *entry)
+{
+	Bitmapset  *idkey;
+	LogicalRepRelation *remoterel = &entry->remoterel;
+	int			i;
+
+	entry->updatable = true;
+
+	idkey = RelationGetIndexAttrBitmap(entry->localrel,
+									   INDEX_ATTR_BITMAP_IDENTITY_KEY);
+	/* fallback to PK if no replica identity */
+	if (idkey == NULL)
+	{
+		idkey = RelationGetIndexAttrBitmap(entry->localrel,
+										   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		/*
+		 * If no replica identity index and no PK, the published table must
+		 * have replica identity FULL.
+		 */
+		if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
+			entry->updatable = false;
+	}
+
+	i = -1;
+	while ((i = bms_next_member(idkey, i)) >= 0)
+	{
+		int			attnum = i + FirstLowInvalidHeapAttributeNumber;
+
+		if (!AttrNumberIsForUserDefinedAttr(attnum))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("logical replication target relation \"%s.%s\" uses "
+							"system columns in REPLICA IDENTITY index",
+							remoterel->nspname, remoterel->relname)));
+
+		attnum = AttrNumberGetAttrOffset(attnum);
+
+		if (entry->attrmap->attnums[attnum] < 0 ||
+			!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
+		{
+			entry->updatable = false;
+			break;
+		}
+	}
+}
+
 /*
  * Open the local relation associated with the remote one.
  *
@@ -307,7 +368,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 	if (!entry->localrelvalid)
 	{
 		Oid			relid;
-		Bitmapset  *idkey;
 		TupleDesc	desc;
 		MemoryContext oldctx;
 		int			i;
@@ -366,54 +426,10 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		bms_free(missingatts);
 
 		/*
-		 * Check that replica identity matches. We allow for stricter replica
-		 * identity (fewer columns) on subscriber as that will not stop us
-		 * from finding unique tuple. IE, if publisher has identity
-		 * (id,timestamp) and subscriber just (id) this will not be a problem,
-		 * but in the opposite scenario it will.
-		 *
-		 * Don't throw any error here just mark the relation entry as not
-		 * updatable, as replica identity is only for updates and deletes but
-		 * inserts can be replicated even without it.
+		 * Set if the table's replica identity is enough to apply
+		 * update/delete.
 		 */
-		entry->updatable = true;
-		idkey = RelationGetIndexAttrBitmap(entry->localrel,
-										   INDEX_ATTR_BITMAP_IDENTITY_KEY);
-		/* fallback to PK if no replica identity */
-		if (idkey == NULL)
-		{
-			idkey = RelationGetIndexAttrBitmap(entry->localrel,
-											   INDEX_ATTR_BITMAP_PRIMARY_KEY);
-
-			/*
-			 * If no replica identity index and no PK, the published table
-			 * must have replica identity FULL.
-			 */
-			if (idkey == NULL && remoterel->replident != REPLICA_IDENTITY_FULL)
-				entry->updatable = false;
-		}
-
-		i = -1;
-		while ((i = bms_next_member(idkey, i)) >= 0)
-		{
-			int			attnum = i + FirstLowInvalidHeapAttributeNumber;
-
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
-				ereport(ERROR,
-						(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-						 errmsg("logical replication target relation \"%s.%s\" uses "
-								"system columns in REPLICA IDENTITY index",
-								remoterel->nspname, remoterel->relname)));
-
-			attnum = AttrNumberGetAttrOffset(attnum);
-
-			if (entry->attrmap->attnums[attnum] < 0 ||
-				!bms_is_member(entry->attrmap->attnums[attnum], remoterel->attkeys))
-			{
-				entry->updatable = false;
-				break;
-			}
-		}
+		logicalrep_rel_mark_updatable(entry);
 
 		entry->localrelvalid = true;
 	}
@@ -651,7 +667,8 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 			   attrmap->maplen * sizeof(AttrNumber));
 	}
 
-	entry->updatable = root->updatable;
+	/* Set if the table's replica identity is enough to apply update/delete. */
+	logicalrep_rel_mark_updatable(entry);
 
 	entry->localrelvalid = true;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bf97fa44ba..8c9a4b5038 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -1323,6 +1323,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
 static void
 check_relation_updatable(LogicalRepRelMapEntry *rel)
 {
+	/*
+	 * For partitioned tables, we only need to care if the target partition is
+	 * updatable (aka has PK or RI defined for it).
+	 */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		return;
+
 	/* Updatable, no error. */
 	if (rel->updatable)
 		return;
@@ -1676,6 +1683,8 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	TupleTableSlot *remoteslot_part;
 	TupleConversionMap *map;
 	MemoryContext oldctx;
+	LogicalRepRelMapEntry *part_entry = NULL;
+	AttrMap    *attrmap = NULL;
 
 	/* ModifyTableState is needed for ExecFindPartition(). */
 	edata->mtstate = mtstate = makeNode(ModifyTableState);
@@ -1707,8 +1716,11 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 		remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
 	map = partrelinfo->ri_RootToPartitionMap;
 	if (map != NULL)
-		remoteslot_part = execute_attr_map_slot(map->attrMap, remoteslot,
+	{
+		attrmap = map->attrMap;
+		remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
 												remoteslot_part);
+	}
 	else
 	{
 		remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
@@ -1716,6 +1728,14 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 	}
 	MemoryContextSwitchTo(oldctx);
 
+	/* Check if we can do the update or delete on the leaf partition. */
+	if (operation == CMD_UPDATE || operation == CMD_DELETE)
+	{
+		part_entry = logicalrep_partition_open(relmapentry, partrel,
+											   attrmap);
+		check_relation_updatable(part_entry);
+	}
+
 	switch (operation)
 	{
 		case CMD_INSERT:
@@ -1737,15 +1757,10 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
 			 * suitable partition.
 			 */
 			{
-				AttrMap    *attrmap = map ? map->attrMap : NULL;
-				LogicalRepRelMapEntry *part_entry;
 				TupleTableSlot *localslot;
 				ResultRelInfo *partrelinfo_new;
 				bool		found;
 
-				part_entry = logicalrep_partition_open(relmapentry, partrel,
-													   attrmap);
-
 				/* Get the matching local tuple from the partition. */
 				found = FindReplTupleInLocalRel(estate, partrel,
 												&part_entry->remoterel,
diff --git a/src/test/subscription/t/013_partition.pl b/src/test/subscription/t/013_partition.pl
index 568e4d104e..dfe2cb6dea 100644
--- a/src/test/subscription/t/013_partition.pl
+++ b/src/test/subscription/t/013_partition.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 70;
+use Test::More tests => 71;
 
 # setup
 
@@ -856,3 +856,17 @@ $node_publisher->wait_for_catchup('sub2');
 $result = $node_subscriber2->safe_psql('postgres',
 	"SELECT a, b, c FROM tab5 ORDER BY 1");
 is($result, qq(3||1), 'updates of tab5 replicated correctly after altering table on publisher');
+
+# Test that replication works correctly as long as the leaf partition
+# has the necessary REPLICA IDENTITY, even though the actual target
+# partitioned table does not.
+$node_subscriber2->safe_psql('postgres',
+	"ALTER TABLE tab5 REPLICA IDENTITY NOTHING");
+
+$node_publisher->safe_psql('postgres', "UPDATE tab5 SET a = 4 WHERE a = 3");
+
+$node_publisher->wait_for_catchup('sub2');
+
+$result = $node_subscriber2->safe_psql('postgres',
+	"SELECT a, b, c FROM tab5_1 ORDER BY 1");
+is($result, qq(4||1), 'updates of tab5 replicated correctly');
-- 
2.18.4

#34Amit Langote
amitlangote09@gmail.com
In reply to: shiy.fnst@fujitsu.com (#33)
Re: Replica Identity check of partition table on subscriber

On Mon, Jun 20, 2022 at 3:46 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Mon, Jun 20, 2022 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One minor comment:
+ /*
+ * If it is a partitioned table, we don't check it, we will check its
+ * partition later.
+ */

Can we change the above comment to: "For partitioned tables, we only
need to care if the target partition is updatable (aka has PK or RI
defined for it)."?

Thanks for your comment. Modified in the attached patches.

How about: ...target "leaf" partition is updatable

Regarding the commit message's top line, which is this:

Fix partition table's RI checking on the subscriber.

I think it should spell out REPLICA IDENTITY explicitly to avoid the
commit being read as having to do with "Referential Integrity
checking".

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#34)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 21, 2022 at 7:49 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Jun 20, 2022 at 3:46 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:

On Mon, Jun 20, 2022 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

One minor comment:
+ /*
+ * If it is a partitioned table, we don't check it, we will check its
+ * partition later.
+ */

Can we change the above comment to: "For partitioned tables, we only
need to care if the target partition is updatable (aka has PK or RI
defined for it)."?

Thanks for your comment. Modified in the attached patches.

How about: ...target "leaf" partition is updatable

I am not very sure if this is an improvement over the current one.

Regarding the commit message's top line, which is this:

Fix partition table's RI checking on the subscriber.

I think it should spell out REPLICA IDENTITY explicitly to avoid the
commit being read as having to do with "Referential Integrity
checking".

This makes sense. I'll take care of this.

--
With Regards,
Amit Kapila.

#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#35)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 21, 2022 at 8:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 21, 2022 at 7:49 AM Amit Langote <amitlangote09@gmail.com> wrote:

I think it should spell out REPLICA IDENTITY explicitly to avoid the
commit being read as having to do with "Referential Integrity
checking".

This makes sense. I'll take care of this.

After pushing this patch, buildfarm member prion has failed.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD

It seems to me that the problem could be that the entry returned by
logicalrep_partition_open() may not have the correct value for localrel
when we find the entry and localrelvalid is also true. The point is that
before this commit we never used the localrel value from the rel entry
returned by logicalrep_partition_open(). I think we need to always
update the localrel value in logicalrep_partition_open().

--
With Regards,
Amit Kapila.

#37houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#36)
1 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Tuesday, June 21, 2022 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 21, 2022 at 8:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 21, 2022 at 7:49 AM Amit Langote <amitlangote09@gmail.com> wrote:

I think it should spell out REPLICA IDENTITY explicitly to avoid the
commit being read as having to do with "Referential Integrity
checking".

This makes sense. I'll take care of this.

After pushing this patch, buildfarm member prion has failed.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD

It seems to me that the problem could be that the entry returned by
logicalrep_partition_open() may not have the correct value for localrel
when we find the entry and localrelvalid is also true. The point is that
before this commit we never used the localrel value from the rel entry
returned by logicalrep_partition_open(). I think we need to always
update the localrel value in logicalrep_partition_open().

Agreed.

And I have confirmed that the failure is due to a segmentation violation when
accessing the cached relation. I reproduced this by using the
-DRELCACHE_FORCE_RELEASE and -DCATCACHE_FORCE_RELEASE options, as hinted by Tom.

Stack:
#0 check_relation_updatable (rel=0x1cf4548) at worker.c:1745
#1 0x0000000000909cbb in apply_handle_tuple_routing (edata=0x1cbf4e8, remoteslot=0x1cbf908, newtup=0x0, operation=CMD_DELETE) at worker.c:2181
#2 0x00000000009097a5 in apply_handle_delete (s=0x7ffcef7fd730) at worker.c:2005
#3 0x000000000090a794 in apply_dispatch (s=0x7ffcef7fd730) at worker.c:2503
#4 0x000000000090ad43 in LogicalRepApplyLoop (last_received=22299920) at worker.c:2775
#5 0x000000000090c2ab in start_apply (origin_startpos=0) at worker.c:3549
#6 0x000000000090ca8d in ApplyWorkerMain (main_arg=0) at worker.c:3805
#7 0x00000000008c4c64 in StartBackgroundWorker () at bgworker.c:858
#8 0x00000000008ceaeb in do_start_bgworker (rw=0x1c3c6b0) at postmaster.c:5815
#9 0x00000000008cee97 in maybe_start_bgworkers () at postmaster.c:6039
#10 0x00000000008cdf4e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
#11 <signal handler called>
#12 0x00007fd8fbe0d4ab in select () from /lib64/libc.so.6
#13 0x00000000008c9cfb in ServerLoop () at postmaster.c:1770
#14 0x00000000008c96e4 in PostmasterMain (argc=4, argv=0x1c110a0) at postmaster.c:1478
#15 0x00000000007c665b in main (argc=4, argv=0x1c110a0) at main.c:202
(gdb) p rel->localrel->rd_rel
$5 = (Form_pg_class) 0x7f7f7f7f7f7f7f7f

We didn't hit this problem because we only access that relation when we plan to
report an error[1] and then the worker will restart and cache will be built, so
everything seems OK.

The problem seems to have already existed, and we hit it now because we started
to access the cached relation in more places.

I think we should update the relation every time, as the relation is opened
and closed by the caller; here is the patch to do that.

[1]:
/*
* We are in error mode so it's fine this is somewhat slow. It's better to
* give user correct error.
*/
if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))

Best regards,
Hou zj

Attachments:

0001-Fix-segmentation-fault.patchapplication/octet-stream; name=0001-Fix-segmentation-fault.patchDownload
From 3d8d09033467eaad7fb1922c740f4934bc52ca3e Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Tue, 21 Jun 2022 13:39:54 +0800
Subject: [PATCH] Fix segmentation fault

When building the partition map cache, we didn't update the cached relation if
the partition map entry is valid. But since the relation is opened and closed
by the caller, this caused a segmentation violation when we tried to access
the cached relation, which had been closed.

Fix it by updating the cached relation every time we try to get the partition
map entry.
---
 src/backend/replication/logical/relation.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5f51170..ac2b090 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -594,20 +594,26 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 														(void *) &partOid,
 														HASH_ENTER, &found);
 
+	if (!found)
+	{
+		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
+		part_entry->partoid = partOid;
+	}
+
 	entry = &part_entry->relmapentry;
 
-	if (found && entry->localrelvalid)
+	/*
+	 * Relation is opened and closed by caller, so we need to always update the
+	 * partrel in case the cached relation was closed.
+	 */
+	entry->localrel = partrel;
+
+	if (entry->localrelvalid)
 		return entry;
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
 
-	if (!found)
-	{
-		memset(part_entry, 0, sizeof(LogicalRepPartMapEntry));
-		part_entry->partoid = partOid;
-	}
-
 	if (!entry->remoterel.remoteid)
 	{
 		int			i;
@@ -629,7 +635,6 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		entry->remoterel.attkeys = bms_copy(remoterel->attkeys);
 	}
 
-	entry->localrel = partrel;
 	entry->localreloid = partOid;
 
 	/*
-- 
2.7.2.windows.1

#38Amit Langote
amitlangote09@gmail.com
In reply to: houzj.fnst@fujitsu.com (#37)
1 attachment(s)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, June 21, 2022 1:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

After pushing this patch, buildfarm member prion has failed.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD

It seems to me that the problem could be that the entry returned by
logicalrep_partition_open() may not have the correct value for localrel
when we find the entry and localrelvalid is also true. The point is that
before this commit we never used the localrel value from the rel entry
returned by logicalrep_partition_open(). I think we need to always
update the localrel value in logicalrep_partition_open().

Agreed.

And I have confirmed that the failure is due to a segmentation violation when
accessing the cached relation. I reproduced this by using the
-DRELCACHE_FORCE_RELEASE and -DCATCACHE_FORCE_RELEASE options, as hinted by Tom.

Stack:
#0 check_relation_updatable (rel=0x1cf4548) at worker.c:1745
#1 0x0000000000909cbb in apply_handle_tuple_routing (edata=0x1cbf4e8, remoteslot=0x1cbf908, newtup=0x0, operation=CMD_DELETE) at worker.c:2181
#2 0x00000000009097a5 in apply_handle_delete (s=0x7ffcef7fd730) at worker.c:2005
#3 0x000000000090a794 in apply_dispatch (s=0x7ffcef7fd730) at worker.c:2503
#4 0x000000000090ad43 in LogicalRepApplyLoop (last_received=22299920) at worker.c:2775
#5 0x000000000090c2ab in start_apply (origin_startpos=0) at worker.c:3549
#6 0x000000000090ca8d in ApplyWorkerMain (main_arg=0) at worker.c:3805
#7 0x00000000008c4c64 in StartBackgroundWorker () at bgworker.c:858
#8 0x00000000008ceaeb in do_start_bgworker (rw=0x1c3c6b0) at postmaster.c:5815
#9 0x00000000008cee97 in maybe_start_bgworkers () at postmaster.c:6039
#10 0x00000000008cdf4e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
#11 <signal handler called>
#12 0x00007fd8fbe0d4ab in select () from /lib64/libc.so.6
#13 0x00000000008c9cfb in ServerLoop () at postmaster.c:1770
#14 0x00000000008c96e4 in PostmasterMain (argc=4, argv=0x1c110a0) at postmaster.c:1478
#15 0x00000000007c665b in main (argc=4, argv=0x1c110a0) at main.c:202
(gdb) p rel->localrel->rd_rel
$5 = (Form_pg_class) 0x7f7f7f7f7f7f7f7f

We didn't hit this problem before because we only accessed that relation when we
were about to report an error[1], after which the worker restarts and the cache is
rebuilt, so everything seemed OK.

The problem already existed; we hit it now because we started to access the
cached relation in more places.

I think we should update the relation every time, since the relation is opened
and closed by the caller, and here is a patch to do that.

Thanks for the patch.

I agree it's an old bug. A partition map entry's localrel may point
to a stale Relation pointer, because once the caller has closed the
relation, the relcache subsystem is free to "clear" it, like in the
case of a RELCACHE_FORCE_RELEASE build.
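
To illustrate, here is a rough sketch of the calling pattern (hypothetical
and simplified, not the actual worker.c code; the wrapper function is made
up, while logicalrep_partition_open(), table_open()/table_close() and the
entry fields are the real names):

/* Sketch only: how a stale localrel pointer is left behind. */
static void
apply_change_to_partition_sketch(LogicalRepRelMapEntry *root,
                                 Oid partOid, AttrMap *map)
{
    Relation    partrel = table_open(partOid, RowExclusiveLock);
    LogicalRepRelMapEntry *entry;

    entry = logicalrep_partition_open(root, partrel, map);

    /* ... route and apply the remote change to the partition ... */

    table_close(partrel, NoLock);

    /*
     * entry->localrel still points at the Relation closed above.  The
     * relcache is free to release it (immediately, in a
     * RELCACHE_FORCE_RELEASE build), so a later call that finds the same
     * entry with localrelvalid still true and dereferences
     * entry->localrel -- as check_relation_updatable() now does -- reads
     * freed memory.
     */
}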

Fixing it the way the patch does seems fine, though it feels like
localrelvalid will lose some of its meaning for the partition map
entries -- we will now overwrite localrel even if localrelvalid is
true.

+   /*
+    * Relation is opened and closed by caller, so we need to always update the
+    * partrel in case the cached relation was closed.
+    */
+   entry->localrel = partrel;
+
+   if (entry->localrelvalid)
        return entry;

Maybe we should add a comment here about why it's okay to overwrite
localrel even if localrelvalid is true. How about the following hunk:

@@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,

entry = &part_entry->relmapentry;

+   /*
+    * We must always overwrite entry->localrel with the latest partition
+    * Relation pointer, because the Relation pointed to by the old value may
+    * have been cleared after the caller would have closed the partition
+    * relation after the last use of this entry.  Note that localrelvalid is
+    * only updated by the relcache invalidation callback, so it may still be
+    * true irrespective of whether the Relation pointed to by localrel has
+    * been cleared or not.
+    */
    if (found && entry->localrelvalid)
+   {
+       entry->localrel = partrel;
        return entry;
+   }

Attached a patch containing the above to consider as an alternative.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

logicalrep_partition_open-always-set-localrel.patch (application/octet-stream)
diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5f511701d9..46475f3248 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 
 	entry = &part_entry->relmapentry;
 
+	/*
+	 * We must always overwrite entry->localrel with the latest partition
+	 * Relation pointer, because the Relation pointed to by the old value may
+	 * have been cleared after the caller would have closed the partition
+	 * relation after the last use of this entry.  Note that localrelvalid is
+	 * only updated by the relcache invalidation callback, so it may still be
+	 * true irrespective of whether the Relation pointed to by localrel has
+	 * been cleared or not.
+	 */
 	if (found && entry->localrelvalid)
+	{
+		entry->localrel = partrel;
 		return entry;
+	}
 
 	/* Switch to longer-lived context. */
 	oldctx = MemoryContextSwitchTo(LogicalRepPartMapContext);
#39houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Langote (#38)
RE: Replica Identity check of partition table on subscriber

On Tuesday, June 21, 2022 3:21 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, June 21, 2022 1:29 PM Amit Kapila <amit.kapila16@gmail.com>:

After pushing this patch, buildfarm member prion has failed.
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prion&br=HEAD

It seems to me that the problem could be due to the reason that the
entry returned by logicalrep_partition_open() may not have the correct
value for localrel when we found the entry and localrelvalid is also
true. The point is that before this commit we never use localrel value
from the rel entry returned by logicalrep_partition_open. I think we
need to always update the localrel value in
logicalrep_partition_open().

Agreed.

And I have confirmed that the failure is due to a segmentation violation when
accessing the cached relation. I reproduced this by building with the
-DRELCACHE_FORCE_RELEASE and -DCATCACHE_FORCE_RELEASE options, as hinted by Tom.

Stack:
#0 check_relation_updatable (rel=0x1cf4548) at worker.c:1745
#1 0x0000000000909cbb in apply_handle_tuple_routing (edata=0x1cbf4e8, remoteslot=0x1cbf908, newtup=0x0, operation=CMD_DELETE) at worker.c:2181
#2 0x00000000009097a5 in apply_handle_delete (s=0x7ffcef7fd730) at worker.c:2005
#3 0x000000000090a794 in apply_dispatch (s=0x7ffcef7fd730) at worker.c:2503
#4 0x000000000090ad43 in LogicalRepApplyLoop (last_received=22299920) at worker.c:2775
#5 0x000000000090c2ab in start_apply (origin_startpos=0) at worker.c:3549
#6 0x000000000090ca8d in ApplyWorkerMain (main_arg=0) at worker.c:3805
#7 0x00000000008c4c64 in StartBackgroundWorker () at bgworker.c:858
#8 0x00000000008ceaeb in do_start_bgworker (rw=0x1c3c6b0) at postmaster.c:5815
#9 0x00000000008cee97 in maybe_start_bgworkers () at postmaster.c:6039
#10 0x00000000008cdf4e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
#11 <signal handler called>
#12 0x00007fd8fbe0d4ab in select () from /lib64/libc.so.6
#13 0x00000000008c9cfb in ServerLoop () at postmaster.c:1770
#14 0x00000000008c96e4 in PostmasterMain (argc=4, argv=0x1c110a0) at postmaster.c:1478
#15 0x00000000007c665b in main (argc=4, argv=0x1c110a0) at main.c:202
(gdb) p rel->localrel->rd_rel
$5 = (Form_pg_class) 0x7f7f7f7f7f7f7f7f

We didn't hit this problem before because we only accessed that relation when we
were about to report an error[1], after which the worker restarts and the cache is
rebuilt, so everything seemed OK.

The problem already existed; we hit it now because we started to access the
cached relation in more places.

I think we should update the relation every time, since the relation is opened
and closed by the caller, and here is a patch to do that.

Thanks for the patch.

I agree it's an old bug. A partition map entry's localrel may point
to a stale Relation pointer, because once the caller has closed the
relation, the relcache subsystem is free to "clear" it, like in the
case of a RELCACHE_FORCE_RELEASE build.

Hi,

Thanks for replying.

Fixing it the way patch does seems fine, though it feels like
localrelvalid will lose some of its meaning for the partition map
entries -- we will now overwrite localrel even if localrelvalid is
true.

To me, it seems localrelvalid doesn't mean that the cached Relation pointer
itself is valid. In logicalrep_rel_open(), we also reopen and update the
relation even if localrelvalid is true.
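
Roughly, the valid-entry path there looks like this (paraphrased for
illustration, not a verbatim quote of relation.c):

    /* logicalrep_rel_open(remoteid, lockmode), sketched: */
    if (entry->localrelvalid)
    {
        /*
         * The cached metadata is still good, but the Relation itself is
         * (re)opened on every call rather than reused from the entry.
         */
        entry->localrel = table_open(entry->localreloid, lockmode);
        return entry;
    }
    /* otherwise: look the relation up by name, open it, rebuild attrmap, etc. */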

+   /*
+    * Relation is opened and closed by caller, so we need to always update the
+    * partrel in case the cached relation was closed.
+    */
+   entry->localrel = partrel;
+
+   if (entry->localrelvalid)
return entry;

Maybe we should add a comment here about why it's okay to overwrite
localrel even if localrelvalid is true. How about the following hunk:

@@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,

entry = &part_entry->relmapentry;

+   /*
+    * We must always overwrite entry->localrel with the latest partition
+    * Relation pointer, because the Relation pointed to by the old value may
+    * have been cleared after the caller would have closed the partition
+    * relation after the last use of this entry.  Note that localrelvalid is
+    * only updated by the relcache invalidation callback, so it may still be
+    * true irrespective of whether the Relation pointed to by localrel has
+    * been cleared or not.
+    */
if (found && entry->localrelvalid)
+   {
+       entry->localrel = partrel;
return entry;
+   }

Attached a patch containing the above to consider as an alternative.

This looks fine to me as well.

Best regards,
Hou zj

#40Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#38)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 21, 2022 at 12:50 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attached a patch containing the above to consider as an alternative.

Thanks, the patch looks good to me. I'll push this after doing some testing.

--
With Regards,
Amit Kapila.

#41shiy.fnst@fujitsu.com
shiy.fnst@fujitsu.com
In reply to: Amit Kapila (#40)
RE: Replica Identity check of partition table on subscriber

On Tuesday, June 21, 2022 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Jun 21, 2022 at 12:50 PM Amit Langote <amitlangote09@gmail.com>
wrote:

On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attached a patch containing the above to consider as an alternative.

Thanks, the patch looks good to me. I'll push this after doing some testing.

The patch looks good to me as well.

I also verified that the patch can be applied cleanly on the back-branches, and I
confirmed that the bug exists on those branches before this patch and is fixed
after applying it. The regression tests also passed with and without the
RELCACHE_FORCE_RELEASE option on my machine.

Regards,
Shi yu

#42Amit Langote
amitlangote09@gmail.com
In reply to: houzj.fnst@fujitsu.com (#39)
Re: Replica Identity check of partition table on subscriber

On Tue, Jun 21, 2022 at 5:08 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Tuesday, June 21, 2022 3:21 PM Amit Langote <amitlangote09@gmail.com> wrote:

Thanks for the patch.

I agree it's an old bug. A partition map entry's localrel may point
to a stale Relation pointer, because once the caller had closed the
relation, the relcache subsystem is free to "clear" it, like in the
case of a RELCACHE_FORCE_RELEASE build.

Hi,

Thanks for replying.

Fixing it the way patch does seems fine, though it feels like
localrelvalid will lose some of its meaning for the partition map
entries -- we will now overwrite localrel even if localrelvalid is
true.

To me, it seems localrelvalid doesn't have the meaning that the cached relation
pointer is valid. In logicalrep_rel_open(), we also reopen and update the
relation even if the localrelvalid is true.

Ah, right. I guess only the localrelvalid=false case is really
interesting then. Only in that case do we need to (re-)build the other
fields that are computed from localrel. In the localrelvalid=true
case, we don't need to worry about the other fields, but we still need to
make sure that localrel points to an up-to-date relcache entry for the
relation.
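
In other words, for the partition case the end state would be roughly the
following (just a sketch, not the exact committed hunk):

    entry = &part_entry->relmapentry;

    if (found && entry->localrelvalid)
    {
        /*
         * attrmap, updatable, etc. are still valid; only the Relation
         * pointer needs refreshing, since the caller opens and closes it.
         */
        entry->localrel = partrel;
        return entry;
    }

    /*
     * New or invalidated entry: recompute attrmap, updatable and friends
     * from partrel, setting localrel as part of that rebuild.
     */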

+   /*
+    * Relation is opened and closed by caller, so we need to always update the
+    * partrel in case the cached relation was closed.
+    */
+   entry->localrel = partrel;
+
+   if (entry->localrelvalid)
return entry;

Maybe we should add a comment here about why it's okay to overwrite
localrel even if localrelvalid is true. How about the following hunk:

@@ -596,8 +596,20 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,

entry = &part_entry->relmapentry;

+   /*
+    * We must always overwrite entry->localrel with the latest partition
+    * Relation pointer, because the Relation pointed to by the old value may
+    * have been cleared after the caller would have closed the partition
+    * relation after the last use of this entry.  Note that localrelvalid is
+    * only updated by the relcache invalidation callback, so it may still be
+    * true irrespective of whether the Relation pointed to by localrel has
+    * been cleared or not.
+    */
if (found && entry->localrelvalid)
+   {
+       entry->localrel = partrel;
return entry;
+   }

Attached a patch containing the above to consider as an alternative.

This looks fine to me as well.

Thank you.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#43houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#40)
3 attachment(s)
RE: Replica Identity check of partition table on subscriber

On Tuesday, June 21, 2022 4:49 PM Amit Kapila <amit.kapila16@gmail.com>

On Tue, Jun 21, 2022 at 12:50 PM Amit Langote <amitlangote09@gmail.com>
wrote:

On Tue, Jun 21, 2022 at 3:35 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Attached a patch containing the above to consider as an alternative.

Thanks, the patch looks good to me. I'll push this after doing some testing.

Since that patch has been committed, attached is the last patch, which fixes the memory leak.

The bug exists on PG10 ~ PG15(HEAD).

For HEAD, PG14, and PG13, to fix the memory leak, I think we should use
free_attrmap instead of pfree and release the no-longer-useful attrmap
when rebuilding the map info.

For PG12, PG11, and PG10, we only need to add code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() because the attrmap in those back-branches is a plain array:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
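
To spell out the difference (a sketch only; the AttrMap definition is quoted
from memory of src/include/access/attmap.h, so treat it as illustrative):

/* PG13 and later: the mapping is a struct wrapping the array. */
typedef struct AttrMap
{
    AttrNumber *attnums;
    int         maplen;
} AttrMap;

/*
 * A bare pfree(entry->attrmap) would free only the struct and leak attnums,
 * hence free_attrmap() on HEAD/PG14/PG13:
 */
if (entry->attrmap)
{
    free_attrmap(entry->attrmap);
    entry->attrmap = NULL;
}

/*
 * On PG10-PG12 the mapping is just the palloc'd AttrNumber array shown
 * above, so a plain pfree() is enough:
 */
if (entry->attrmap)
{
    pfree(entry->attrmap);
    entry->attrmap = NULL;
}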

Best regards,
Hou zj

Attachments:

v12-HEAD-PG14-0001-fix-memory-leak-about-attrmap.patch (application/octet-stream)
From 3ffa79dbf27aec61fb9d7ac348a19e42a307d06d Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Mon, 13 Jun 2022 14:42:55 +0800
Subject: [PATCH] Fix memory leak about attrmap

When rebuilding the relcache mapping on the subscriber, we didn't release the
memory of the no-longer-useful attribute mapping, which would result in a
memory leak. Fix it by releasing the no-longer-useful mapping's memory when
rebuilding.

Since attribute mappings were refactored on PG13~HEAD, we should use
free_attrmap instead of pfree to release their memory on these branches. Fix
one place where we still use pfree to release the mapping memory.

Author: Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila, Shi yu
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index 5f511701d9..7346208388 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -144,7 +144,7 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+		free_attrmap(entry->attrmap);
 }
 
 /*
@@ -373,6 +373,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		int			i;
 		Bitmapset  *missingatts;
 
+		/* Release the no-longer-useful attrmap, if any. */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -608,6 +615,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* Release the no-longer-useful attrmap, if any. */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	if (!entry->remoterel.remoteid)
 	{
 		int			i;
-- 
2.18.4

v12-PG13-0001-fix-memory-leak-about-attrmap.patch (application/octet-stream)
From 2ed915ec2653dff6f8e8ddef9793a14744eb6437 Mon Sep 17 00:00:00 2001
From: Shi Yu <shiy.fnst@fujitsu.com>
Date: Wed, 22 Jun 2022 10:03:32 +0800
Subject: [PATCH] Fix memory leak about attrmap

When rebuilding the relcache mapping on the subscriber, we didn't release the
memory of the no-longer-useful attribute mapping, which would result in a
memory leak. Fix it by releasing the no-longer-useful mapping's memory when
rebuilding.

Since attribute mappings were refactored on PG13~HEAD, we should use
free_attrmap instead of pfree to release their memory on these branches. Fix
one place where we still use pfree to release the mapping memory.

Author: Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila, Shi yu
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index f2258c7667..db1b686fad 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -145,7 +145,7 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry)
 	bms_free(remoterel->attkeys);
 
 	if (entry->attrmap)
-		pfree(entry->attrmap);
+		free_attrmap(entry->attrmap);
 }
 
 /*
@@ -337,6 +337,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		MemoryContext oldctx;
 		int			i;
 
+		/* Release the no-longer-useful attrmap, if any. */
+		if (entry->attrmap)
+		{
+			free_attrmap(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
@@ -588,6 +595,13 @@ logicalrep_partition_open(LogicalRepRelMapEntry *root,
 		part_entry->partoid = partOid;
 	}
 
+	/* Release the no-longer-useful attrmap, if any. */
+	if (entry->attrmap)
+	{
+		free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+	}
+
 	if (!entry->remoterel.remoteid)
 	{
 		int			i;
-- 
2.18.4

v12-PG10-11-12-0001-Fix-memory-leak-about-attrmap.patch (application/octet-stream)
From d1e8d4c7fe49cdf754e5ebad75fa05e9ef17da78 Mon Sep 17 00:00:00 2001
From: "houzj.fnst" <houzj.fnst@cn.fujitsu.com>
Date: Wed, 22 Jun 2022 09:48:20 +0800
Subject: [PATCH] Fix memory leak about attrmap

When rebuilding the relcache mapping on the subscriber, we didn't release the
memory of the no-longer-useful attribute mapping, which would result in a
memory leak. Fix it by releasing the no-longer-useful mapping's memory when
rebuilding.

Since attribute mappings were refactored on PG13~HEAD, we should use
free_attrmap instead of pfree to release their memory on these branches. Fix
one place where we still use pfree to release the mapping memory.

Author: Hou Zhijie
Reviewed-by: Amit Langote, Amit Kapila, Shi yu
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com
---
 src/backend/replication/logical/relation.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index bc87105..aa41b79 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -260,6 +260,13 @@ logicalrep_rel_open(LogicalRepRelId remoteid, LOCKMODE lockmode)
 		MemoryContext oldctx;
 		int			i;
 
+		/* Release the no-longer-useful attrmap, if any. */
+		if (entry->attrmap)
+		{
+			pfree(entry->attrmap);
+			entry->attrmap = NULL;
+		}
+
 		/* Try to find and lock the relation by name. */
 		relid = RangeVarGetRelid(makeRangeVar(remoterel->nspname,
 											  remoterel->relname, -1),
-- 
2.7.2.windows.1

#44Amit Langote
amitlangote09@gmail.com
In reply to: houzj.fnst@fujitsu.com (#43)
Re: Replica Identity check of partition table on subscriber

Hi,

On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Since the patch has been committed. Attach the last patch to fix the memory leak.

The bug exists on PG10 ~ PG15(HEAD).

For HEAD,PG14,PG13, to fix the memory leak, I think we should use
free_attrmap instead of pfree and release the no-longer-useful attrmap
When rebuilding the map info.

For PG12,PG11,PG10, we only need to add the code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() because the attrmap in back-branch is a single array like:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));

LGTM, thank you.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#45Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Langote (#44)
Re: Replica Identity check of partition table on subscriber

On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Since the patch has been committed. Attach the last patch to fix the memory leak.

The bug exists on PG10 ~ PG15(HEAD).

For HEAD,PG14,PG13, to fix the memory leak, I think we should use
free_attrmap instead of pfree and release the no-longer-useful attrmap
When rebuilding the map info.

For PG12,PG11,PG10, we only need to add the code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() because the attrmap in back-branch is a single array like:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));

LGTM, thank you.

LGTM as well. One thing I am not completely sure about is whether to
make this change in PG10, for which the final release is in November.
AFAICS, the leak can only occur after a relcache invalidation on the
subscriber, which may or may not be a frequent case. What do you
guys think?

Personally, I feel it is good to fix it in all branches including PG10.

--
With Regards,
Amit Kapila.

#46Amit Langote
amitlangote09@gmail.com
In reply to: Amit Kapila (#45)
Re: Replica Identity check of partition table on subscriber

On Wed, Jun 22, 2022 at 8:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Since the patch has been committed. Attach the last patch to fix the memory leak.

The bug exists on PG10 ~ PG15(HEAD).

For HEAD,PG14,PG13, to fix the memory leak, I think we should use
free_attrmap instead of pfree and release the no-longer-useful attrmap
When rebuilding the map info.

For PG12,PG11,PG10, we only need to add the code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() because the attrmap in back-branch is a single array like:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));

LGTM, thank you.

LGTM as well. One thing I am not completely sure about is whether to
make this change in PG10 for which the final release is in Nov?
AFAICS, the leak can only occur after the relcache invalidation on the
subscriber which may or may not be a very frequent case. What do you
guys think?

Agree that the leak does not seem very significant, though...

Personally, I feel it is good to fix it in all branches including PG10.

...yes, why not.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

#47houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#45)
RE: Replica Identity check of partition table on subscriber

On Wednesday, June 22, 2022 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Since that patch has been committed, attached is the last patch, which fixes
the memory leak.

The bug exists on PG10 ~ PG15(HEAD).

For HEAD, PG14, and PG13, to fix the memory leak, I think we should use
free_attrmap instead of pfree and release the no-longer-useful attrmap
when rebuilding the map info.

For PG12, PG11, and PG10, we only need to add code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() because the attrmap in those back-branches is a plain array:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));

LGTM, thank you.

LGTM as well. One thing I am not completely sure about is whether to make
this change in PG10, for which the final release is in November. AFAICS, the
leak can only occur after a relcache invalidation on the subscriber, which
may or may not be a frequent case. What do you guys think?

Personally, I feel it is good to fix it in all branches including PG10.

+1

Best regards,
Hou zj

#48Amit Kapila
amit.kapila16@gmail.com
In reply to: houzj.fnst@fujitsu.com (#47)
Re: Replica Identity check of partition table on subscriber

On Wed, Jun 22, 2022 at 5:05 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, June 22, 2022 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jun 22, 2022 at 10:09 AM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jun 22, 2022 at 12:02 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Since that patch has been committed, attached is the last patch, which fixes
the memory leak.

The bug exists on PG10 ~ PG15(HEAD).

For HEAD, PG14, and PG13, to fix the memory leak, I think we should use
free_attrmap instead of pfree and release the no-longer-useful attrmap
when rebuilding the map info.

For PG12, PG11, and PG10, we only need to add code to release the
no-longer-useful attrmap when rebuilding the map info. We can still use
pfree() because the attrmap in those back-branches is a plain array:

entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));

LGTM, thank you.

LGTM as well. One thing I am not completely sure about is whether to make
this change in PG10, for which the final release is in November. AFAICS, the
leak can only occur after a relcache invalidation on the subscriber, which
may or may not be a frequent case. What do you guys think?

Personally, I feel it is good to fix it in all branches including PG10.

+1

Pushed!

--
With Regards,
Amit Kapila.