bug in update tuple routing with foreign partitions

Started by Amit Langotealmost 7 years ago19 messages

Langote_Amit_f8@lab.ntt.co.jp

almost 7 years ago

2 attachment(s)

Hi,

(added Fujita-san)

I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets. Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:

-- setup
create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2base (a int check (a = 2));
create foreign table p2 partition of p for values in (2) server loopback
options (table_name 'p2base');
insert into p values (1), (2);

-- see in the plan that foreign partition p2 is locally modified
explain verbose update p set a = 2 from (values (1), (2)) s(x) where a =
s.x returning *;
QUERY PLAN

─────────────────────────────────────────────────────────────────────────────────
Update on public.p (cost=0.05..236.97 rows=50 width=42)
Output: p1.a, "*VALUES*".column1
Update on public.p1
Foreign Update on public.p2
Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a
-> Hash Join (cost=0.05..45.37 rows=26 width=42)
Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1
Hash Cond: (p1.a = "*VALUES*".column1)
-> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
Output: p1.ctid, p1.a
-> Hash (cost=0.03..0.03 rows=2 width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Hash Join (cost=100.05..191.59 rows=24 width=42)
Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1
Hash Cond: (p2.a = "*VALUES*".column1)
-> Foreign Scan on public.p2 (cost=100.00..182.27 rows=2409
width=10)
Output: p2.ctid, p2.a
Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE
-> Hash (cost=0.03..0.03 rows=2 width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=32)
Output: "*VALUES*".*, "*VALUES*".column1

-- as opposed to directly on remote side (because there's no local join)
explain verbose update p set a = 2 returning *;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
Update on public.p (cost=0.00..227.40 rows=5280 width=10)
Output: p1.a
Update on public.p1
Foreign Update on public.p2
-> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
Output: 2, p1.ctid
-> Foreign Update on public.p2 (cost=100.00..191.90 rows=2730 width=10)
Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a
(8 rows)

Running the first update query crashes:

update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning
tableoid::regclass, *;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The problem seems to occur because ExecSetupPartitionTupleRouting thinks
it can reuse p2's ResultRelInfo for tuple routing. In this case, it can't
be used, because its ri_FdwState contains information that will be needed
for postgresExecForeignUpdate to work, but it's still used today. Because
it's assigned to be used for tuple routing, its ri_FdwState will be
overwritten by postgresBeginForeignInsert that's invoked by the tuple
routing code using the aforementioned ResultRelInfo. Crash occurs when
postgresExecForeignUpdate () can't find the information it's expecting in
the ri_FdwState.

The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
subplan result rel if its ri_FdwState is already set. I was wondering if
it should also check ri_usesFdwDirectModify is true, but in that case,
ri_FdwState is unused, so it's perhaps safe for tuple routing code to
scribble on it.

I have attached 2 patches, one for PG 11 where the problem first appeared
and another for HEAD. The patch for PG 11 is significantly bigger due to
having to handle the complexities of mapping of subplan result rel indexes
to leaf partition indexes. Most of that complexity is gone in HEAD due to
3f2393edefa5, so the patch for HEAD is much simpler. I've also added a
test in postgres_fdw.sql to exercise this test case.

Thanks,
Amit

Attachments:

pg11-bug-update-tuple-routing-with-foreign-parts.patchtext/plain; charset=UTF-8; name=pg11-bug-update-tuple-routing-with-foreign-parts.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index cdd788f825..b108aad897 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7730,6 +7730,45 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *;
  1 | qux triggered !
 (1 row)
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
+
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ a |        b        | x 
+---+-----------------+---
+ 1 | qux triggered ! | 1
+(1 row)
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 813286bba5..a27783cede 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1932,6 +1932,13 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *;
 -- The new values are concatenated with ' triggered !'
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 011e3cff1a..090f86c340 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -136,6 +136,11 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	{
 		update_rri = mtstate->resultRelInfo;
 		num_update_rri = list_length(node->plans);
+		/*
+		 * Each of these will be set to either the leaf partition index
+		 * to which a given subplan resultrel is mapped to or -1 if it
+		 * cannot be used for tuple routing.
+		 */
 		proute->subplan_partition_offsets =
 			palloc(num_update_rri * sizeof(int));
 		proute->num_subplan_partition_offsets = num_update_rri;
@@ -174,18 +179,29 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 		if (update_rri_index < num_update_rri &&
 			RelationGetRelid(update_rri[update_rri_index].ri_RelationDesc) == leaf_oid)
 		{
-			leaf_part_rri = &update_rri[update_rri_index];
-
 			/*
-			 * This is required in order to convert the partition's tuple to
-			 * be compatible with the root partitioned table's tuple
-			 * descriptor.  When generating the per-subplan result rels, this
-			 * was not set.
+			 * Skip if an FDW is already using it.  We must use a different
+			 * one for tuple routing, because the latter will need its own
+			 * FdwState.
 			 */
-			leaf_part_rri->ri_PartitionRoot = rel;
+			if (update_rri[update_rri_index].ri_FdwState == NULL)
+			{
+				leaf_part_rri = &update_rri[update_rri_index];
 
-			/* Remember the subplan offset for this ResultRelInfo */
-			proute->subplan_partition_offsets[update_rri_index] = i;
+				/*
+				 * This is required in order to convert the partition's tuple
+				 * to be compatible with the root partitioned table's tuple
+				 * descriptor.  When generating the per-subplan result rels,
+				 * this was not set.
+				 */
+				leaf_part_rri->ri_PartitionRoot = rel;
+
+				/* Remember the subplan offset for this ResultRelInfo */
+				proute->subplan_partition_offsets[update_rri_index] = i;
+			}
+			else
+				/* Mark this subplan's resultrel as unused. */
+				proute->subplan_partition_offsets[update_rri_index] = -1;
 
 			update_rri_index++;
 		}
@@ -195,8 +211,9 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	}
 
 	/*
-	 * For UPDATE, we should have found all the per-subplan resultrels in the
-	 * leaf partitions.  (If this is an INSERT, both values will be zero.)
+	 * For UPDATE, we should have gone through all the per-subplan resultrels
+	 * while matching against leaf partitions.  (If this is an INSERT, both
+	 * values will be zero.)
 	 */
 	Assert(update_rri_index == num_update_rri);
 
@@ -892,21 +909,26 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
 														   resultRelInfo);
 
 		/*
-		 * If this result rel is one of the UPDATE subplan result rels, let
-		 * ExecEndPlan() close it. For INSERT or COPY,
-		 * proute->subplan_partition_offsets will always be NULL. Note that
-		 * the subplan_partition_offsets array and the partitions array have
-		 * the partitions in the same order. So, while we iterate over
-		 * partitions array, we also iterate over the
-		 * subplan_partition_offsets array in order to figure out which of the
-		 * result rels are present in the UPDATE subplans.
+		 * If this partition routing result rel is one of the UPDATE subplan
+		 * result rels, don't close it here, ExecEndPlan() will close it.
+		 * For INSERT or COPY, proute->subplan_partition_offsets will always
+		 * be NULL, because there are no subplan result rels in that case.
+		 * Note that since partitions appear in the same order in both the
+		 * subplan_partition_offsets array and the the partitions array, it's
+		 * okay to match them like this.
 		 */
 		if (proute->subplan_partition_offsets &&
-			subplan_index < proute->num_subplan_partition_offsets &&
-			proute->subplan_partition_offsets[subplan_index] == i)
+			subplan_index < proute->num_subplan_partition_offsets)
 		{
-			subplan_index++;
-			continue;
+			/* skip subplan result rels that were not reused. */
+			while (proute->subplan_partition_offsets[subplan_index] < 0)
+				subplan_index++;
+
+			if (proute->subplan_partition_offsets[subplan_index] == i)
+			{
+				subplan_index++;
+				continue;
+			}
 		}
 
 		ExecCloseIndices(resultRelInfo);

HEAD-bug-update-tuple-routing-with-foreign-parts.patchtext/plain; charset=UTF-8; name=HEAD-bug-update-tuple-routing-with-foreign-parts.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 42108bd3d4..6471d69063 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7828,6 +7828,45 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *;
  1 | qux triggered !
 (1 row)
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
+
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ a |        b        | x 
+---+-----------------+---
+ 1 | qux triggered ! | 1
+(1 row)
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition isn't a subplan target rel
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index eb9d1ad59d..a18b0cc097 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1971,6 +1971,13 @@ update utrtest set a = 1 where a = 1 or a = 2 returning *;
 -- The new values are concatenated with ' triggered !'
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
 
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+
 delete from utrtest;
 insert into utrtest values (2, 'qux');
 
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e41801662b..967a8f78b6 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -466,6 +466,14 @@ ExecHashSubPlanResultRelsByOid(ModifyTableState *mtstate,
 		Oid			partoid = RelationGetRelid(rri->ri_RelationDesc);
 		SubplanResultRelHashElem   *elem;
 
+		/*
+		 * Skip if an FDW is already using it.  We must use a different
+		 * one for tuple routing, because the latter will need its own
+		 * FdwState.
+		 */
+		if (rri->ri_FdwState)
+			continue;
+
 		elem = (SubplanResultRelHashElem *)
 			hash_search(htab, &partoid, HASH_ENTER, &found);
 		Assert(!found);

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 7 years ago

In reply to: Amit Langote (#1)

Re: bug in update tuple routing with foreign partitions

Hi Amit,

(2019/03/06 18:33), Amit Langote wrote:

I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets. Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:

Will look into this.

Best regards,
Etsuro Fujita

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 7 years ago

In reply to: Amit Langote (#1)

1 attachment(s)

Re: bug in update tuple routing with foreign partitions

(2019/03/06 18:33), Amit Langote wrote:

I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets. Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:

-- setup
create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2base (a int check (a = 2));
create foreign table p2 partition of p for values in (2) server loopback
options (table_name 'p2base');
insert into p values (1), (2);

-- see in the plan that foreign partition p2 is locally modified
explain verbose update p set a = 2 from (values (1), (2)) s(x) where a =
s.x returning *;
QUERY PLAN

─────────────────────────────────────────────────────────────────────────────────
Update on public.p (cost=0.05..236.97 rows=50 width=42)
Output: p1.a, "*VALUES*".column1
Update on public.p1
Foreign Update on public.p2
Remote SQL: UPDATE public.p2base SET a = $2 WHERE ctid = $1 RETURNING a
-> Hash Join (cost=0.05..45.37 rows=26 width=42)
Output: 2, p1.ctid, "*VALUES*".*, "*VALUES*".column1
Hash Cond: (p1.a = "*VALUES*".column1)
-> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
Output: p1.ctid, p1.a
-> Hash (cost=0.03..0.03 rows=2 width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Hash Join (cost=100.05..191.59 rows=24 width=42)
Output: 2, p2.ctid, "*VALUES*".*, "*VALUES*".column1
Hash Cond: (p2.a = "*VALUES*".column1)
-> Foreign Scan on public.p2 (cost=100.00..182.27 rows=2409
width=10)
Output: p2.ctid, p2.a
Remote SQL: SELECT a, ctid FROM public.p2base FOR UPDATE
-> Hash (cost=0.03..0.03 rows=2 width=32)
Output: "*VALUES*".*, "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2
width=32)
Output: "*VALUES*".*, "*VALUES*".column1

-- as opposed to directly on remote side (because there's no local join)
explain verbose update p set a = 2 returning *;
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
Update on public.p (cost=0.00..227.40 rows=5280 width=10)
Output: p1.a
Update on public.p1
Foreign Update on public.p2
-> Seq Scan on public.p1 (cost=0.00..35.50 rows=2550 width=10)
Output: 2, p1.ctid
-> Foreign Update on public.p2 (cost=100.00..191.90 rows=2730 width=10)
Remote SQL: UPDATE public.p2base SET a = 2 RETURNING a
(8 rows)

Running the first update query crashes:

update p set a = 2 from (values (1), (2)) s(x) where a = s.x returning
tableoid::regclass, *;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The problem seems to occur because ExecSetupPartitionTupleRouting thinks
it can reuse p2's ResultRelInfo for tuple routing. In this case, it can't
be used, because its ri_FdwState contains information that will be needed
for postgresExecForeignUpdate to work, but it's still used today. Because
it's assigned to be used for tuple routing, its ri_FdwState will be
overwritten by postgresBeginForeignInsert that's invoked by the tuple
routing code using the aforementioned ResultRelInfo. Crash occurs when
postgresExecForeignUpdate () can't find the information it's expecting in
the ri_FdwState.

Agreed, as I said before in another thread.

The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
subplan result rel if its ri_FdwState is already set.

I'm not sure that is a good idea, because that requires to create a new
ResultRelInfo, which is not free; I think it requires a lot of work.
Another solution to avoid that would be to store the fmstate created by
postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
ri_FdwState, like the attached. This would not need any changes to the
core, and this would not cause any overheads either, IIUC. What do you
think about that?

I have attached 2 patches, one for PG 11 where the problem first appeared
and another for HEAD. The patch for PG 11 is significantly bigger due to
having to handle the complexities of mapping of subplan result rel indexes
to leaf partition indexes. Most of that complexity is gone in HEAD due to
3f2393edefa5, so the patch for HEAD is much simpler.

Thanks for the patches!

I've also added a
test in postgres_fdw.sql to exercise this test case.

Thanks again, but the test case you added works well without any fix:

+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x 
returning *;
+                                  QUERY PLAN

+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 
RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)

In this test case, the foreign partition is updated before ri_FdwState
is overwritten by postgresBeginForeignInsert() that's invoked by the
tuple routing code, so it would work without any fix. So I modified
this test case as such.

Sorry for the long delay, again.

Best regards,
Etsuro Fujita

Attachments:

HEAD-bug-update-tuple-routing-with-foreign-parts-efujita.patchtext/x-patch; name=HEAD-bug-update-tuple-routing-with-foreign-parts-efujita.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 7733,7738 **** update utrtest set a = 1 where a = 2 returning *;
--- 7733,7789 ----
   1 | qux triggered !
  (1 row)
  
+ delete from utrtest;
+ -- Change the definition of utrtest so that the local partition is updated
+ -- before the foreign partition
+ alter table utrtest detach partition remp;
+ drop foreign table remp;
+ alter table loct drop constraint loct_a_check;
+ alter table loct add check (a in (3));
+ create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+ alter table utrtest attach partition remp for values in (3);
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ -- Check case where the foreign partition is a subplan target rel and is
+ -- updated with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+                                   QUERY PLAN                                  
+ ------------------------------------------------------------------------------
+  Update on public.utrtest
+    Output: locp.a, locp.b, "*VALUES*".column1
+    Update on public.locp
+    Foreign Update on public.remp
+      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+    ->  Hash Join
+          Output: 3, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+          Hash Cond: (locp.a = "*VALUES*".column1)
+          ->  Seq Scan on public.locp
+                Output: locp.b, locp.ctid, locp.a
+          ->  Hash
+                Output: "*VALUES*".*, "*VALUES*".column1
+                ->  Values Scan on "*VALUES*"
+                      Output: "*VALUES*".*, "*VALUES*".column1
+    ->  Hash Join
+          Output: 3, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+          Hash Cond: (remp.a = "*VALUES*".column1)
+          ->  Foreign Scan on public.remp
+                Output: remp.b, remp.ctid, remp.a
+                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+          ->  Hash
+                Output: "*VALUES*".*, "*VALUES*".column1
+                ->  Values Scan on "*VALUES*"
+                      Output: "*VALUES*".*, "*VALUES*".column1
+ (24 rows)
+ 
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+  a |         b         | x 
+ ---+-------------------+---
+  3 | qux triggered !   | 2
+  3 | xyzzy triggered ! | 3
+  3 | qux triggered !   | 3
+ (3 rows)
+ 
  drop trigger loct_br_insert_trigger on loct;
  drop table utrtest;
  drop table loct;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 185,190 **** typedef struct PgFdwModifyState
--- 185,194 ----
  
  	/* working memory context */
  	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ 
+ 	/* for update row movement, if subplan resultrel */
+ 	struct PgFdwModifyState *sub_fmstate;	/* foreign-insert state, if
+ 											 * created */
  } PgFdwModifyState;
  
  /*
***************
*** 1844,1851 **** postgresExecForeignInsert(EState *estate,
  						  TupleTableSlot *slot,
  						  TupleTableSlot *planSlot)
  {
! 	return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
  								  slot, planSlot);
  }
  
  /*
--- 1848,1871 ----
  						  TupleTableSlot *slot,
  						  TupleTableSlot *planSlot)
  {
! 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
! 	TupleTableSlot *rslot;
! 
! 	/*
! 	 * If the fmstate has sub_fmstate set, it means the foreign table is an
! 	 * UPDATE subplan resultrel, in which case, the sub_fmstate is provided
! 	 * for the INSERT operation on the table, whereas the fmstate is provided
! 	 * for the UPDATE operation on the table; use the sub_fmstate
! 	 */
! 	if (fmstate->sub_fmstate)
! 		resultRelInfo->ri_FdwState = fmstate->sub_fmstate;
! 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
  								  slot, planSlot);
+ 	/* Revert that change */
+ 	if (fmstate->sub_fmstate)
+ 		resultRelInfo->ri_FdwState = fmstate;
+ 
+ 	return rslot;
  }
  
  /*
***************
*** 1983,1989 **** postgresBeginForeignInsert(ModifyTableState *mtstate,
  									retrieved_attrs != NIL,
  									retrieved_attrs);
  
! 	resultRelInfo->ri_FdwState = fmstate;
  }
  
  /*
--- 2003,2021 ----
  									retrieved_attrs != NIL,
  									retrieved_attrs);
  
! 	/*
! 	 * If the given resultRelInfo already has PgFdwModifyState set, it means
! 	 * the foreign table is an UPDATE subplan resultrel; in which case, store
! 	 * the resulting state into the PgFdwModifyState.
! 	 */
! 	if (resultRelInfo->ri_FdwState)
! 	{
! 		Assert(plan && plan->operation == CMD_UPDATE);
! 		Assert(resultRelInfo->ri_usesFdwDirectModify == false);
! 		((PgFdwModifyState *) resultRelInfo->ri_FdwState)->sub_fmstate = fmstate;
! 	}
! 	else
! 		resultRelInfo->ri_FdwState = fmstate;
  }
  
  /*
***************
*** 1998,2003 **** postgresEndForeignInsert(EState *estate,
--- 2030,2044 ----
  
  	Assert(fmstate != NULL);
  
+ 	/*
+ 	 * If the fmstate has sub_fmstate set, it means the foreign table is an
+ 	 * UPDATE subplan resultrel, in which case, the sub_fmstate is provided
+ 	 * for the INSERT operation on the table, whereas the fmstate is provided
+ 	 * for the UPDATE operation on the table; get the sub_fmstate
+ 	 */
+ 	if (fmstate->sub_fmstate)
+ 		fmstate = fmstate->sub_fmstate;
+ 
  	/* Destroy the execution state */
  	finish_foreign_modify(fmstate);
  }
***************
*** 3482,3487 **** create_foreign_modify(EState *estate,
--- 3523,3531 ----
  
  	Assert(fmstate->p_nums <= n_params);
  
+ 	/* Initialize sub execution state */
+ 	fmstate->sub_fmstate = NULL;
+ 
  	return fmstate;
  }
  
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 2013,2018 **** update utrtest set a = 1 where a = 2 returning *;
--- 2013,2038 ----
  -- The new values are concatenated with ' triggered !'
  update utrtest set a = 1 where a = 2 returning *;
  
+ delete from utrtest;
+ 
+ -- Change the definition of utrtest so that the local partition is updated
+ -- before the foreign partition
+ alter table utrtest detach partition remp;
+ drop foreign table remp;
+ alter table loct drop constraint loct_a_check;
+ alter table loct add check (a in (3));
+ create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+ alter table utrtest attach partition remp for values in (3);
+ 
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ 
+ -- Check case where the foreign partition is a subplan target rel and is
+ -- updated with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+ 
  drop trigger loct_br_insert_trigger on loct;
  
  drop table utrtest;

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

almost 7 years ago

In reply to: Etsuro Fujita (#3)

Re: bug in update tuple routing with foreign partitions

Fujita-san,

Thanks for the review.

On 2019/04/10 17:38, Etsuro Fujita wrote:

(2019/03/06 18:33), Amit Langote wrote:

I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets. Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:

The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
subplan result rel if its ri_FdwState is already set.

I'm not sure that is a good idea, because that requires to create a new
ResultRelInfo, which is not free; I think it requires a lot of work.

After considering this a bit more and studying your proposed solution, I
tend to agree. Beside the performance considerations you point out that I
too think are valid, I realize that going with my approach would mean
embedding some assumptions in the core code about ri_FdwState; in this
case, assuming that a subplan result rel's ri_FdwState is not to be
re-purposed for tuple routing insert, which is only based on looking at
postgres_fdw's implementation. Not to mention the ensuing complexity in
the core tuple routing code -- it now needs to account for the fact that
not all subplan result rels may have been re-purposed for tuple-routing,
something that's too complicated to handle in PG 11.

IOW, let's call this a postgres_fdw bug and fix it there as you propose.

Another solution to avoid that would be to store the fmstate created by
postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
ri_FdwState, like the attached. This would not need any changes to the
core, and this would not cause any overheads either, IIUC. What do you
think about that?

+1. Just one comment:

+    /*
+     * If the given resultRelInfo already has PgFdwModifyState set, it means
+     * the foreign table is an UPDATE subplan resultrel; in which case, store
+     * the resulting state into the PgFdwModifyState.
+     */

The last "PgFdwModifyState" should be something else? Maybe,
sub-PgModifyState or sub_fmstate?

I've also added a
test in postgres_fdw.sql to exercise this test case.

Thanks again, but the test case you added works well without any fix:

Oops, I should really adopt a habit of adding a test case before code.

+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
returning *;
+                                  QUERY PLAN
+------------------------------------------------------------------------------

+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)

In this test case, the foreign partition is updated before ri_FdwState is
overwritten by postgresBeginForeignInsert() that's invoked by the tuple
routing code, so it would work without any fix. So I modified this test
case as such.

Thanks for fixing that. I hadn't noticed that foreign partition remp is
updated before local partition locp; should've added a locp0 for values
(0) so that remp appears second in the ModifyTable's subplans. Or, well,
make the foreign table remp appear later by making it a partition for
values in (3) as your patch does.

BTW, have you noticed that the RETURNING clause returns the same row twice?

+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a |         b         | x
+---+-------------------+---
+ 3 | qux triggered !   | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered !   | 3
+(3 rows)

You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
I checked the behavior with moving into a local partition just to be sure
and the changed row is returned only once.

create table p (a int, b text) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
insert into p values (1, 'p1'), (2, 'p2');

select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p1 │ 1 │ p1
p2 │ 2 │ p2
(2 rows)

update p set a = 2 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼────
p2 │ 2 │ p1
p2 │ 2 │ p2
(2 rows)

Add a foreign table partition in the mix and that's no longer true.

create extension postgres_fdw ;
create server loopback foreign data wrapper postgres_fdw;
create user mapping for current_user server loopback;
create table p3base (a int check (a = 3), b text);
create foreign table p3 partition of p for values in (3) server loopback
options (table_name 'p3base');
delete from p;
insert into p values (1, 'p1'), (2, 'p2'), (3, 'p3');

select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p1 │ 1 │ p1
p2 │ 2 │ p2
p3 │ 3 │ p3
(3 rows)

update p set a = 3 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼────
p3 │ 3 │ p1
p3 │ 3 │ p2
p3 │ 3 │ p3
p3 │ 3 │ p1
p3 │ 3 │ p2
(5 rows)

select tableoid::regclass, * from p;
tableoid │ a │ b
──────────┼───┼────
p3 │ 3 │ p3
p3 │ 3 │ p1
p3 │ 3 │ p2
(3 rows)

I'm not sure if it's a bug that ought to be fixed, but thought I should
highlight it just in case.

Thanks,
Amit

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

almost 7 years ago

In reply to: Amit Langote (#4)

Re: bug in update tuple routing with foreign partitions

(2019/04/11 14:33), Amit Langote wrote:

On 2019/04/10 17:38, Etsuro Fujita wrote:

(2019/03/06 18:33), Amit Langote wrote:

I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
to use for partition routing targets. Specifically, the bug occurs when
UPDATE targets include a foreign partition that is locally modified (as
opposed to being modified directly on the remove server) and its
ResultRelInfo (called subplan result rel in the source code) is reused for
tuple routing:

The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
subplan result rel if its ri_FdwState is already set.

I'm not sure that is a good idea, because that requires to create a new
ResultRelInfo, which is not free; I think it requires a lot of work.

After considering this a bit more and studying your proposed solution, I
tend to agree. Beside the performance considerations you point out that I
too think are valid, I realize that going with my approach would mean
embedding some assumptions in the core code about ri_FdwState; in this
case, assuming that a subplan result rel's ri_FdwState is not to be
re-purposed for tuple routing insert, which is only based on looking at
postgres_fdw's implementation.

Yeah, that assumption might not apply to some other FDWs.

Not to mention the ensuing complexity in
the core tuple routing code -- it now needs to account for the fact that
not all subplan result rels may have been re-purposed for tuple-routing,
something that's too complicated to handle in PG 11.

I think so too.

IOW, let's call this a postgres_fdw bug and fix it there as you propose.

Agreed.

Another solution to avoid that would be to store the fmstate created by
postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
ri_FdwState, like the attached. This would not need any changes to the
core, and this would not cause any overheads either, IIUC. What do you
think about that?

+1. Just one comment:
+    /*
+     * If the given resultRelInfo already has PgFdwModifyState set, it means
+     * the foreign table is an UPDATE subplan resultrel; in which case, store
+     * the resulting state into the PgFdwModifyState.
+     */
The last "PgFdwModifyState" should be something else? Maybe,
sub-PgModifyState or sub_fmstate?

OK, will revise. My favorite would be the latter.

I've also added a
test in postgres_fdw.sql to exercise this test case.

Thanks again, but the test case you added works well without any fix:

Oops, I should really adopt a habit of adding a test case before code.
+-- Check case where the foreign partition is a subplan target rel and
+-- foreign parttion is locally modified (target table being joined
+-- locally prevents a direct/remote modification plan).
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
returning *;
+                                  QUERY PLAN
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
a, b
+   Update on public.locp
+   ->   Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->   Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->   Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->   Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->   Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->   Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->   Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->   Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
In this test case, the foreign partition is updated before ri_FdwState is
overwritten by postgresBeginForeignInsert() that's invoked by the tuple
routing code, so it would work without any fix. So I modified this test
case as such.
Thanks for fixing that. I hadn't noticed that foreign partition remp is
updated before local partition locp; should've added a locp0 for values
(0) so that remp appears second in the ModifyTable's subplans. Or, well,
make the foreign table remp appear later by making it a partition for
values in (3) as your patch does.

BTW, have you noticed that the RETURNING clause returns the same row twice?

I noticed this, but I didn't think it hard. :(

+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a |         b         | x
+---+-------------------+---
+ 3 | qux triggered !   | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered !   | 3
+(3 rows)
You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?

Yeah, this is unexpected behavior, so will look into this. Thanks for
pointing that out!

Best regards,
Etsuro Fujita

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#5)

Re: bug in update tuple routing with foreign partitions

(2019/04/11 20:31), Etsuro Fujita wrote:

(2019/04/11 14:33), Amit Langote wrote:

BTW, have you noticed that the RETURNING clause returns the same row
twice?

I noticed this, but I didn't think it hard. :(
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a | b | x
+---+-------------------+---
+ 3 | qux triggered ! | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered ! | 3
+(3 rows)
You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
Yeah, this is unexpected behavior, so will look into this.

I think the reason for that is: the row routed to remp is incorrectly
fetched as a to-be-updated row when updating remp as a subplan
targetrel. The right way to fix this would be to have some way in
postgres_fdw in which we don't fetch such rows when updating such
subplan targetrels. I tried to figure out a (simple) way to do that,
but I couldn't. One probably-simple solution I came up with is to sort
subplan targetrels into the order in which foreign-table subplan
targetrels get processed first in ExecModifyTable(). (Note: currently,
rows can be moved from local partitions to a foreign-table partition,
but rows cannot be moved from foreign-table partitions to another
partition, so we wouldn't encounter this situation once we sort like
that.) But I think that's ugly, and I don't think it's a good idea to
change the core, just for postgres_fdw. So what I'm thinking is to
throw an error for cases like this. (Though, I think we should keep to
allow rows to be moved from local partitions to a foreign-table subplan
targetrel that has been updated already.)

What do you think about that?

Best regards,
Etsuro Fujita

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#6)

Re: bug in update tuple routing with foreign partitions

Fujita-san,

On 2019/04/17 21:49, Etsuro Fujita wrote:

(2019/04/11 20:31), Etsuro Fujita wrote:
(2019/04/11 14:33), Amit Langote wrote:

BTW, have you noticed that the RETURNING clause returns the same row
twice?

I noticed this, but I didn't think it hard. :(
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a | b | x
+---+-------------------+---
+ 3 | qux triggered ! | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered ! | 3
+(3 rows)
You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
Yeah, this is unexpected behavior, so will look into this.

Thanks for investigating.

I think the reason for that is: the row routed to remp is incorrectly
fetched as a to-be-updated row when updating remp as a subplan targetrel.

Yeah. In the fully-local case, that is, where both the source and the
target partition of a row movement operation are local tables, heap AM
ensures that tuples that's moved into a given relation in the same command
(by way of row movement) are not returned as to-be-updated, because it
deems such tuples invisible. The "same command" part being crucial for
that to work.

In the case where the target of a row movement operation is a foreign
table partition, the INSERT used as part of row movement and subsequent
UPDATE of the same foreign table are distinct commands for the remote
server. So, the rows inserted by the 1st command (as part of the row
movement) are deemed visible by the 2nd command (UPDATE) even if both are
operating within the same transaction.

I guess there's no easy way for postgres_fdw to make the remote server
consider them as a single command. IOW, no way to make the remote server
not touch those "moved-in" rows during the UPDATE part of the local query.

The right way to fix this would be to have some way in postgres_fdw in
which we don't fetch such rows when updating such subplan targetrels. I
tried to figure out a (simple) way to do that, but I couldn't.

Yeah, that seems a bit hard to ensure with our current infrastructure.

One
probably-simple solution I came up with is to sort subplan targetrels into
the order in which foreign-table subplan targetrels get processed first in
ExecModifyTable(). (Note: currently, rows can be moved from local
partitions to a foreign-table partition, but rows cannot be moved from
foreign-table partitions to another partition, so we wouldn't encounter
this situation once we sort like that.) But I think that's ugly, and I
don't think it's a good idea to change the core, just for postgres_fdw.

Agreed that it seems like contorting the core code to accommodate
limitations of postgres_fdw.

So what I'm thinking is to throw an error for cases like this. (Though, I
think we should keep to allow rows to be moved from local partitions to a
foreign-table subplan targetrel that has been updated already.)

Hmm, how would you distinguish (totally inside postgred_fdw I presume) the
two cases?

Thanks,
Amit

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Amit Langote (#7)

Re: bug in update tuple routing with foreign partitions

On 2019/04/18 14:06, Amit Langote wrote:

Fujita-san,

On 2019/04/17 21:49, Etsuro Fujita wrote:
(2019/04/11 20:31), Etsuro Fujita wrote:
(2019/04/11 14:33), Amit Langote wrote:

BTW, have you noticed that the RETURNING clause returns the same row
twice?

I noticed this, but I didn't think it hard. :(
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
returning *;
+ a | b | x
+---+-------------------+---
+ 3 | qux triggered ! | 2
+ 3 | xyzzy triggered ! | 3
+ 3 | qux triggered ! | 3
+(3 rows)
You can see that the row that's moved into remp is returned twice (one
with "qux triggered!" in b column), whereas it should've been only once?
Yeah, this is unexpected behavior, so will look into this.
Thanks for investigating.

I think the reason for that is: the row routed to remp is incorrectly
fetched as a to-be-updated row when updating remp as a subplan targetrel.

Yeah. In the fully-local case, that is, where both the source and the
target partition of a row movement operation are local tables, heap AM
ensures that tuples that's moved into a given relation in the same command
(by way of row movement) are not returned as to-be-updated, because it
deems such tuples invisible. The "same command" part being crucial for
that to work.

In the case where the target of a row movement operation is a foreign
table partition, the INSERT used as part of row movement and subsequent
UPDATE of the same foreign table are distinct commands for the remote
server. So, the rows inserted by the 1st command (as part of the row
movement) are deemed visible by the 2nd command (UPDATE) even if both are
operating within the same transaction.

I guess there's no easy way for postgres_fdw to make the remote server
consider them as a single command. IOW, no way to make the remote server
not touch those "moved-in" rows during the UPDATE part of the local query.

The right way to fix this would be to have some way in postgres_fdw in
which we don't fetch such rows when updating such subplan targetrels. I
tried to figure out a (simple) way to do that, but I couldn't.

Yeah, that seems a bit hard to ensure with our current infrastructure.

One
probably-simple solution I came up with is to sort subplan targetrels into
the order in which foreign-table subplan targetrels get processed first in
ExecModifyTable(). (Note: currently, rows can be moved from local
partitions to a foreign-table partition, but rows cannot be moved from
foreign-table partitions to another partition, so we wouldn't encounter
this situation once we sort like that.) But I think that's ugly, and I
don't think it's a good idea to change the core, just for postgres_fdw.

Agreed that it seems like contorting the core code to accommodate
limitations of postgres_fdw.

So what I'm thinking is to throw an error for cases like this. (Though, I
think we should keep to allow rows to be moved from local partitions to a
foreign-table subplan targetrel that has been updated already.)

Hmm, how would you distinguish (totally inside postgred_fdw I presume) the
two cases?

Forgot to say that since this is a separate issue from the original bug
report, maybe we can first finish fixing the latter. What to do you think?

Thanks,
Amit

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Amit Langote (#8)

1 attachment(s)

Re: bug in update tuple routing with foreign partitions

Amit-san,

Thanks for the comments!

(2019/04/18 14:08), Amit Langote wrote:

On 2019/04/18 14:06, Amit Langote wrote:

On 2019/04/17 21:49, Etsuro Fujita wrote:

I think the reason for that is: the row routed to remp is incorrectly
fetched as a to-be-updated row when updating remp as a subplan targetrel.

Yeah. In the fully-local case, that is, where both the source and the
target partition of a row movement operation are local tables, heap AM
ensures that tuples that's moved into a given relation in the same command
(by way of row movement) are not returned as to-be-updated, because it
deems such tuples invisible. The "same command" part being crucial for
that to work.

In the case where the target of a row movement operation is a foreign
table partition, the INSERT used as part of row movement and subsequent
UPDATE of the same foreign table are distinct commands for the remote
server. So, the rows inserted by the 1st command (as part of the row
movement) are deemed visible by the 2nd command (UPDATE) even if both are
operating within the same transaction.

Yeah, I think so too.

I guess there's no easy way for postgres_fdw to make the remote server
consider them as a single command. IOW, no way to make the remote server
not touch those "moved-in" rows during the UPDATE part of the local query.

I agree.

The right way to fix this would be to have some way in postgres_fdw in
which we don't fetch such rows when updating such subplan targetrels. I
tried to figure out a (simple) way to do that, but I couldn't.

Yeah, that seems a bit hard to ensure with our current infrastructure.

Yeah, I think we should leave that for future work.

So what I'm thinking is to throw an error for cases like this. (Though, I
think we should keep to allow rows to be moved from local partitions to a
foreign-table subplan targetrel that has been updated already.)

Hmm, how would you distinguish (totally inside postgred_fdw I presume) the
two cases?

One thing I came up with to do that is this:

@@ -1917,6 +1937,23 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,

initStringInfo(&sql);

+       /*
+        * If the foreign table is an UPDATE subplan resultrel that 
hasn't yet
+        * been updated, routing tuples to the table might yield incorrect
+        * results, because if routing tuples, routed tuples will be 
mistakenly
+        * read from the table and updated twice when updating the table 
--- it
+        * would be nice if we could handle this case; but for now, 
throw an error
+        * for safety.
+        */
+       if (plan && plan->operation == CMD_UPDATE &&
+               (resultRelInfo->ri_usesFdwDirectModify ||
+                resultRelInfo->ri_FdwState) &&
+               resultRelInfo > mtstate->resultRelInfo + 
mtstate->mt_whichplan)
+               ereport(ERROR,
+                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                errmsg("cannot route tuples into 
foreign table to be updated \"%s\"",
+ 
RelationGetRelationName(rel))));

Forgot to say that since this is a separate issue from the original bug
report, maybe we can first finish fixing the latter. What to do you think?

Yeah, I think we can do that, but my favorite would be to fix the two
issues in a single shot, because they seem to me rather close to each
other. So I updated a new version in a single patch, which I'm attaching.

Notes:

* I kept all the changes in the previous patch, because otherwise
postgres_fdw would fail to release resources for a foreign-insert
operation created by postgresBeginForeignInsert() for a tuple-routable
foreign table (ie, a foreign-table subplan resultrel that has been
updated already) during postgresEndForeignInsert().

* I revised a comment according to your previous comment, though I
changed a state name: s/sub_fmstate/aux_fmstate/

* DirectModify also has the latter issue. Here is an example:

postgres=# create table p (a int, b text) partition by list (a);
postgres=# create table p1 partition of p for values in (1);
postgres=# create table p2base (a int check (a = 2 or a = 3), b text);
postgres=# create foreign table p2 partition of p for values in (2, 3)
server loopback options (table_name 'p2base');

postgres=# insert into p values (1, 'foo');
INSERT 0 1
postgres=# explain verbose update p set a = a + 1;
QUERY PLAN
-----------------------------------------------------------------------------
Update on public.p (cost=0.00..176.21 rows=2511 width=42)
Update on public.p1
Foreign Update on public.p2
-> Seq Scan on public.p1 (cost=0.00..25.88 rows=1270 width=42)
Output: (p1.a + 1), p1.b, p1.ctid
-> Foreign Update on public.p2 (cost=100.00..150.33 rows=1241
width=42)
Remote SQL: UPDATE public.p2base SET a = (a + 1)
(7 rows)

postgres=# update p set a = a + 1;
UPDATE 2
postgres=# select * from p;
a | b
---+-----
3 | foo
(1 row)

As shown in the above bit added to postgresBeginForeignInsert(), I
modified the patch so that we throw an error for cases like this even
when using a direct modification plan, and I also added regression test
cases for that.

Best regards,
Etsuro Fujita

Attachments:

HEAD-bug-update-tuple-routing-with-foreign-parts-efujita-2.patchtext/x-patch; name=HEAD-bug-update-tuple-routing-with-foreign-parts-efujita-2.patchDownload

*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***************
*** 7734,7739 **** update utrtest set a = 1 where a = 2 returning *;
--- 7734,7873 ----
  (1 row)
  
  drop trigger loct_br_insert_trigger on loct;
+ delete from utrtest;
+ insert into utrtest values (1, 'foo');
+ insert into utrtest values (2, 'qux');
+ -- We can move rows to a foreign partition that has been updated already,
+ -- but can't move rows to a foreign partition that hasn't yet been updated
+ -- Test the former case:
+ -- with a direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 1 returning *;
+                            QUERY PLAN                            
+ -----------------------------------------------------------------
+  Update on public.utrtest
+    Output: remp.a, remp.b
+    Foreign Update on public.remp
+    Update on public.locp
+    ->  Foreign Update on public.remp
+          Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
+    ->  Seq Scan on public.locp
+          Output: 1, locp.b, locp.ctid
+ (8 rows)
+ 
+ update utrtest set a = 1 returning *;
+  a |  b  
+ ---+-----
+  1 | foo
+  1 | qux
+ (2 rows)
+ 
+ delete from utrtest;
+ insert into utrtest values (1, 'foo');
+ insert into utrtest values (2, 'qux');
+ -- with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+                                   QUERY PLAN                                  
+ ------------------------------------------------------------------------------
+  Update on public.utrtest
+    Output: remp.a, remp.b, "*VALUES*".column1
+    Foreign Update on public.remp
+      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+    Update on public.locp
+    ->  Hash Join
+          Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+          Hash Cond: (remp.a = "*VALUES*".column1)
+          ->  Foreign Scan on public.remp
+                Output: remp.b, remp.ctid, remp.a
+                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+          ->  Hash
+                Output: "*VALUES*".*, "*VALUES*".column1
+                ->  Values Scan on "*VALUES*"
+                      Output: "*VALUES*".*, "*VALUES*".column1
+    ->  Hash Join
+          Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+          Hash Cond: (locp.a = "*VALUES*".column1)
+          ->  Seq Scan on public.locp
+                Output: locp.b, locp.ctid, locp.a
+          ->  Hash
+                Output: "*VALUES*".*, "*VALUES*".column1
+                ->  Values Scan on "*VALUES*"
+                      Output: "*VALUES*".*, "*VALUES*".column1
+ (24 rows)
+ 
+ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+  a |  b  | x 
+ ---+-----+---
+  1 | foo | 1
+  1 | qux | 2
+ (2 rows)
+ 
+ -- Change the definition of utrtest so that the foreign partition get
+ -- processed after the local partition
+ delete from utrtest;
+ alter table utrtest detach partition remp;
+ drop foreign table remp;
+ alter table loct drop constraint loct_a_check;
+ alter table loct add check (a in (3));
+ create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+ alter table utrtest attach partition remp for values in (3);
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ -- Test the latter case:
+ -- with a direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 returning *;
+                            QUERY PLAN                            
+ -----------------------------------------------------------------
+  Update on public.utrtest
+    Output: locp.a, locp.b
+    Update on public.locp
+    Foreign Update on public.remp
+    ->  Seq Scan on public.locp
+          Output: 3, locp.b, locp.ctid
+    ->  Foreign Update on public.remp
+          Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
+ (8 rows)
+ 
+ update utrtest set a = 3 returning *; -- ERROR
+ ERROR:  cannot route tuples into foreign table to be updated "remp"
+ delete from utrtest;
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ -- with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+                                   QUERY PLAN                                  
+ ------------------------------------------------------------------------------
+  Update on public.utrtest
+    Output: locp.a, locp.b, "*VALUES*".column1
+    Update on public.locp
+    Foreign Update on public.remp
+      Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+    ->  Hash Join
+          Output: 3, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+          Hash Cond: (locp.a = "*VALUES*".column1)
+          ->  Seq Scan on public.locp
+                Output: locp.b, locp.ctid, locp.a
+          ->  Hash
+                Output: "*VALUES*".*, "*VALUES*".column1
+                ->  Values Scan on "*VALUES*"
+                      Output: "*VALUES*".*, "*VALUES*".column1
+    ->  Hash Join
+          Output: 3, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+          Hash Cond: (remp.a = "*VALUES*".column1)
+          ->  Foreign Scan on public.remp
+                Output: remp.b, remp.ctid, remp.a
+                Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+          ->  Hash
+                Output: "*VALUES*".*, "*VALUES*".column1
+                ->  Values Scan on "*VALUES*"
+                      Output: "*VALUES*".*, "*VALUES*".column1
+ (24 rows)
+ 
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
+ ERROR:  cannot route tuples into foreign table to be updated "remp"
  drop table utrtest;
  drop table loct;
  -- Test copy tuple routing
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 185,190 **** typedef struct PgFdwModifyState
--- 185,194 ----
  
  	/* working memory context */
  	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ 
+ 	/* for update row movement, if subplan resultrel */
+ 	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
+ 											 * created */
  } PgFdwModifyState;
  
  /*
***************
*** 1844,1851 **** postgresExecForeignInsert(EState *estate,
  						  TupleTableSlot *slot,
  						  TupleTableSlot *planSlot)
  {
! 	return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
  								  slot, planSlot);
  }
  
  /*
--- 1848,1871 ----
  						  TupleTableSlot *slot,
  						  TupleTableSlot *planSlot)
  {
! 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
! 	TupleTableSlot *rslot;
! 
! 	/*
! 	 * If the fmstate has aux_fmstate set, it means the foreign table is an
! 	 * UPDATE subplan resultrel, in which case, the aux_fmstate is provided
! 	 * for the INSERT operation on the table, whereas the fmstate is provided
! 	 * for the UPDATE operation on the table; use the aux_fmstate
! 	 */
! 	if (fmstate->aux_fmstate)
! 		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
! 	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
  								  slot, planSlot);
+ 	/* Revert that change */
+ 	if (fmstate->aux_fmstate)
+ 		resultRelInfo->ri_FdwState = fmstate;
+ 
+ 	return rslot;
  }
  
  /*
***************
*** 1917,1922 **** postgresBeginForeignInsert(ModifyTableState *mtstate,
--- 1937,1959 ----
  
  	initStringInfo(&sql);
  
+ 	/*
+ 	 * If the foreign table is an UPDATE subplan resultrel that hasn't yet
+ 	 * been updated, routing tuples to the table might yield incorrect
+ 	 * results, because if routing tuples, routed tuples will be mistakenly
+ 	 * read from the table and updated twice when updating the table --- it
+ 	 * would be nice if we could handle this case; but for now, throw an error
+ 	 * for safety.
+ 	 */
+ 	if (plan && plan->operation == CMD_UPDATE &&
+ 		(resultRelInfo->ri_usesFdwDirectModify ||
+ 		 resultRelInfo->ri_FdwState) &&
+ 		resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot route tuples into foreign table to be updated \"%s\"",
+ 						RelationGetRelationName(rel))));
+ 
  	/* We transmit all columns that are defined in the foreign table. */
  	for (attnum = 1; attnum <= tupdesc->natts; attnum++)
  	{
***************
*** 1983,1989 **** postgresBeginForeignInsert(ModifyTableState *mtstate,
  									retrieved_attrs != NIL,
  									retrieved_attrs);
  
! 	resultRelInfo->ri_FdwState = fmstate;
  }
  
  /*
--- 2020,2038 ----
  									retrieved_attrs != NIL,
  									retrieved_attrs);
  
! 	/*
! 	 * If the given resultRelInfo already has PgFdwModifyState set, it means
! 	 * the foreign table is an UPDATE subplan resultrel; in which case, store
! 	 * the resulting state into the aux_fmstate of the PgFdwModifyState.
! 	 */
! 	if (resultRelInfo->ri_FdwState)
! 	{
! 		Assert(plan && plan->operation == CMD_UPDATE);
! 		Assert(resultRelInfo->ri_usesFdwDirectModify == false);
! 		((PgFdwModifyState *) resultRelInfo->ri_FdwState)->aux_fmstate = fmstate;
! 	}
! 	else
! 		resultRelInfo->ri_FdwState = fmstate;
  }
  
  /*
***************
*** 1998,2003 **** postgresEndForeignInsert(EState *estate,
--- 2047,2061 ----
  
  	Assert(fmstate != NULL);
  
+ 	/*
+ 	 * If the fmstate has aux_fmstate set, it means the foreign table is an
+ 	 * UPDATE subplan resultrel, in which case, the aux_fmstate is provided
+ 	 * for the INSERT operation on the table, whereas the fmstate is provided
+ 	 * for the UPDATE operation on the table; get the aux_fmstate
+ 	 */
+ 	if (fmstate->aux_fmstate)
+ 		fmstate = fmstate->aux_fmstate;
+ 
  	/* Destroy the execution state */
  	finish_foreign_modify(fmstate);
  }
***************
*** 3482,3487 **** create_foreign_modify(EState *estate,
--- 3540,3548 ----
  
  	Assert(fmstate->p_nums <= n_params);
  
+ 	/* Initialize auxiliary state */
+ 	fmstate->aux_fmstate = NULL;
+ 
  	return fmstate;
  }
  
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***************
*** 2015,2020 **** update utrtest set a = 1 where a = 2 returning *;
--- 2015,2069 ----
  
  drop trigger loct_br_insert_trigger on loct;
  
+ delete from utrtest;
+ insert into utrtest values (1, 'foo');
+ insert into utrtest values (2, 'qux');
+ 
+ -- We can move rows to a foreign partition that has been updated already,
+ -- but can't move rows to a foreign partition that hasn't yet been updated
+ 
+ -- Test the former case:
+ -- with a direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 1 returning *;
+ update utrtest set a = 1 returning *;
+ 
+ delete from utrtest;
+ insert into utrtest values (1, 'foo');
+ insert into utrtest values (2, 'qux');
+ 
+ -- with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ 
+ -- Change the definition of utrtest so that the foreign partition get
+ -- processed after the local partition
+ delete from utrtest;
+ alter table utrtest detach partition remp;
+ drop foreign table remp;
+ alter table loct drop constraint loct_a_check;
+ alter table loct add check (a in (3));
+ create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+ alter table utrtest attach partition remp for values in (3);
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ 
+ -- Test the latter case:
+ -- with a direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 returning *;
+ update utrtest set a = 3 returning *; -- ERROR
+ 
+ delete from utrtest;
+ insert into utrtest values (2, 'qux');
+ insert into utrtest values (3, 'xyzzy');
+ 
+ -- with a non-direct modification plan
+ explain (verbose, costs off)
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+ update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
+ 
  drop table utrtest;
  drop table loct;

#10

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#9)

Re: bug in update tuple routing with foreign partitions

(2019/04/18 22:10), Etsuro Fujita wrote:

Notes:

* I kept all the changes in the previous patch, because otherwise
postgres_fdw would fail to release resources for a foreign-insert
operation created by postgresBeginForeignInsert() for a tuple-routable
foreign table (ie, a foreign-table subplan resultrel that has been
updated already) during postgresEndForeignInsert().

I noticed that this explanation was not right. Let me correct myself.
The reason why I kept those changes is: without those changes, we would
fail to release the resources for a foreign-update operation (ie,
fmstate) created by postgresBeginForeignModify(), replaced by the
fmstate for the foreign-insert operation, because when doing
ExecEndPlan(), we first call postgresEndForeignModify() and then call
postgresEndForeignInsert(); so, if we didn't keep those changes, we
would *mistakenly* release the fmstate for the foreign-insert operation
in postgresEndForeignModify(), and we wouldn't do anything about the
fmstate for the foreign-update operation in that function and in the
subsequent postgresEndForeignInsert().

Best regards,
Etsuro Fujita

#11

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#9)

Re: bug in update tuple routing with foreign partitions

Fujita-san,

On 2019/04/18 22:10, Etsuro Fujita wrote:

On 2019/04/18 14:06, Amit Langote wrote:

On 2019/04/17 21:49, Etsuro Fujita wrote:

So what I'm thinking is to throw an error for cases like this.
(Though, I
think we should keep to allow rows to be moved from local partitions to a
foreign-table subplan targetrel that has been updated already.)

Hmm, how would you distinguish (totally inside postgred_fdw I presume) the
two cases?

One thing I came up with to do that is this:

@@ -1917,6 +1937,23 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,

initStringInfo(&sql);
+       /*
+        * If the foreign table is an UPDATE subplan resultrel that hasn't
yet
+        * been updated, routing tuples to the table might yield incorrect
+        * results, because if routing tuples, routed tuples will be
mistakenly
+        * read from the table and updated twice when updating the table
--- it
+        * would be nice if we could handle this case; but for now, throw
an error
+        * for safety.
+        */
+       if (plan && plan->operation == CMD_UPDATE &&
+               (resultRelInfo->ri_usesFdwDirectModify ||
+                resultRelInfo->ri_FdwState) &&
+               resultRelInfo > mtstate->resultRelInfo +
mtstate->mt_whichplan)
+               ereport(ERROR,
+                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                errmsg("cannot route tuples into foreign
table to be updated \"%s\"",
+ RelationGetRelationName(rel))));

Oh, I see.

So IIUC, you're making postgresBeginForeignInsert() check two things:

1. Whether the foreign table it's about to begin inserting (moved) rows
into is a subplan result rel, checked by
(resultRelInfo->ri_usesFdwDirectModify || resultRelInfo->ri_FdwState)

2. Whether the foreign table it's about to begin inserting (moved) rows
into will be updated later, checked by (resultRelInfo >
mtstate->resultRelInfo + mtstate->mt_whichplan)

This still allows a foreign table to receive rows moved from the local
partitions if it has already finished the UPDATE operation.

Seems reasonable to me.

Forgot to say that since this is a separate issue from the original bug
report, maybe we can first finish fixing the latter. What to do you think?

Yeah, I think we can do that, but my favorite would be to fix the two
issues in a single shot, because they seem to me rather close to each
other. So I updated a new version in a single patch, which I'm attaching.

I agree that we can move to fix the other issue right away as the fix you
outlined above seems reasonable, but I wonder if it wouldn't be better to
commit the two fixes separately? The two fixes, although small, are
somewhat complicated and combining them in a single commit might be
confusing. Also, a combined commit might make it harder for the release
note author to list down the exact set of problems being fixed. But I
guess your commit message will make it clear that two distinct problems
are being solved, so maybe that shouldn't be a problem.

Notes:

* I kept all the changes in the previous patch, because otherwise
postgres_fdw would fail to release resources for a foreign-insert
operation created by postgresBeginForeignInsert() for a tuple-routable
foreign table (ie, a foreign-table subplan resultrel that has been updated
already) during postgresEndForeignInsert().

Hmm are you saying that the cases for which we'll still allow tuple
routing (foreign table receiving moved-in rows has already been updated),
there will be two fmstates to be released -- the original fmstate
(UPDATE's) and aux_fmstate (INSERT's)?

* I revised a comment according to your previous comment, though I changed
a state name: s/sub_fmstate/aux_fmstate/

That seems fine to me.

* DirectModify also has the latter issue. Here is an example:

postgres=# create table p (a int, b text) partition by list (a);
postgres=# create table p1 partition of p for values in (1);
postgres=# create table p2base (a int check (a = 2 or a = 3), b text);
postgres=# create foreign table p2 partition of p for values in (2, 3)
server loopback options (table_name 'p2base');

postgres=# insert into p values (1, 'foo');
INSERT 0 1
postgres=# explain verbose update p set a = a + 1;
                                 QUERY PLAN
-----------------------------------------------------------------------------
Update on public.p (cost=0.00..176.21 rows=2511 width=42)
   Update on public.p1
   Foreign Update on public.p2
   -> Seq Scan on public.p1 (cost=0.00..25.88 rows=1270 width=42)
         Output: (p1.a + 1), p1.b, p1.ctid
   -> Foreign Update on public.p2 (cost=100.00..150.33 rows=1241 width=42)
         Remote SQL: UPDATE public.p2base SET a = (a + 1)
(7 rows)

postgres=# update p set a = a + 1;
UPDATE 2
postgres=# select * from p;
a | b
---+-----
3 | foo
(1 row)

Ah, the expected out is "(2, foo)". Also, with RETURNING, you'd get this:

update p set a = a + 1 returning tableoid::regclass, *;
tableoid │ a │ b
──────────┼───┼─────
p2 │ 2 │ foo
p2 │ 3 │ foo
(2 rows)

As shown in the above bit added to postgresBeginForeignInsert(), I
modified the patch so that we throw an error for cases like this even when
using a direct modification plan, and I also added regression test cases
for that.

Thanks for adding detailed tests.

Some mostly cosmetic comments on the code changes:

* In the following comment:

+    /*
+     * If the foreign table is an UPDATE subplan resultrel that hasn't yet
+     * been updated, routing tuples to the table might yield incorrect
+     * results, because if routing tuples, routed tuples will be mistakenly
+     * read from the table and updated twice when updating the table --- it
+     * would be nice if we could handle this case; but for now, throw an
error
+     * for safety.
+     */

the part that start with "because if routing tuples..." reads a bit
unclear to me. How about writing this as:

/*
* If the foreign table we are about to insert routed rows into is
* also an UPDATE result rel and the UPDATE hasn't been performed yet,
* proceeding with the INSERT will result in the later UPDATE
* incorrectly modifying those routed rows, so prevent the INSERT ---
* it would be nice if we could handle this case, but for now, throw
* an error for safety.
*/

I see that in all the hunks involving some manipulation of aux_fmstate,
there's a comment explaining what it is, which seems a bit repetitive. I
can see more or less the same explanation in postgresExecForeignInsert(),
postgresBeginForeignInsert(), and postgresEndForeignInsert(). Maybe just
keep the description in postgresBeginForeignInsert as follows:

@@ -1983,7 +2020,19 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
retrieved_attrs != NIL,
retrieved_attrs);

-    resultRelInfo->ri_FdwState = fmstate;
+    /*
+     * If the given resultRelInfo already has PgFdwModifyState set, it means
+     * the foreign table is an UPDATE subplan resultrel; in which case, store
+     * the resulting state into the aux_fmstate of the PgFdwModifyState.
+     */

and change the other sites to refer to postgresBeginForeingInsert for the
detailed explanation of what aux_fmstate is.

BTW, do you think we should update the docs regarding the newly
established limitation of row movement involving foreign tables?

Thanks,
Amit

#12

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Amit Langote (#11)

Re: bug in update tuple routing with foreign partitions

On 2019/04/19 13:00, Amit Langote wrote:

BTW, do you think we should update the docs regarding the newly
established limitation of row movement involving foreign tables?

Ah, maybe not, because it's a postgres_fdw limitation, not of the core
tuple routing feature. OTOH, we don't mention at all in postgres-fdw.sgml
that postgres_fdw supports tuple routing. Maybe we should and list this
limitation or would it be too much burden to maintain?

Thanks,
Amit

#13

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Amit Langote (#11)

Re: bug in update tuple routing with foreign partitions

(2019/04/19 13:00), Amit Langote wrote:

On 2019/04/18 22:10, Etsuro Fujita wrote:
On 2019/04/18 14:06, Amit Langote wrote:

On 2019/04/17 21:49, Etsuro Fujita wrote:

So what I'm thinking is to throw an error for cases like this.
(Though, I
think we should keep to allow rows to be moved from local partitions to a
foreign-table subplan targetrel that has been updated already.)

Hmm, how would you distinguish (totally inside postgred_fdw I presume) the
two cases?

One thing I came up with to do that is this:

@@ -1917,6 +1937,23 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,

initStringInfo(&sql);
+       /*
+        * If the foreign table is an UPDATE subplan resultrel that hasn't
yet
+        * been updated, routing tuples to the table might yield incorrect
+        * results, because if routing tuples, routed tuples will be
mistakenly
+        * read from the table and updated twice when updating the table
--- it
+        * would be nice if we could handle this case; but for now, throw
an error
+        * for safety.
+        */
+       if (plan&&  plan->operation == CMD_UPDATE&&
+               (resultRelInfo->ri_usesFdwDirectModify ||
+                resultRelInfo->ri_FdwState)&&
+               resultRelInfo>  mtstate->resultRelInfo +
mtstate->mt_whichplan)
+               ereport(ERROR,
+                               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                                errmsg("cannot route tuples into foreign
table to be updated \"%s\"",
+ RelationGetRelationName(rel))));
Oh, I see.

So IIUC, you're making postgresBeginForeignInsert() check two things:

1. Whether the foreign table it's about to begin inserting (moved) rows
into is a subplan result rel, checked by
(resultRelInfo->ri_usesFdwDirectModify || resultRelInfo->ri_FdwState)

2. Whether the foreign table it's about to begin inserting (moved) rows
into will be updated later, checked by (resultRelInfo>
mtstate->resultRelInfo + mtstate->mt_whichplan)

This still allows a foreign table to receive rows moved from the local
partitions if it has already finished the UPDATE operation.

Seems reasonable to me.

Great!

Forgot to say that since this is a separate issue from the original bug
report, maybe we can first finish fixing the latter. What to do you think?

Yeah, I think we can do that, but my favorite would be to fix the two
issues in a single shot, because they seem to me rather close to each
other. So I updated a new version in a single patch, which I'm attaching.

I agree that we can move to fix the other issue right away as the fix you
outlined above seems reasonable, but I wonder if it wouldn't be better to
commit the two fixes separately? The two fixes, although small, are
somewhat complicated and combining them in a single commit might be
confusing. Also, a combined commit might make it harder for the release
note author to list down the exact set of problems being fixed.

OK, I'll separate the patch into two.

Notes:

* I kept all the changes in the previous patch, because otherwise
postgres_fdw would fail to release resources for a foreign-insert
operation created by postgresBeginForeignInsert() for a tuple-routable
foreign table (ie, a foreign-table subplan resultrel that has been updated
already) during postgresEndForeignInsert().

Hmm are you saying that the cases for which we'll still allow tuple
routing (foreign table receiving moved-in rows has already been updated),
there will be two fmstates to be released -- the original fmstate
(UPDATE's) and aux_fmstate (INSERT's)?

Yeah, but I noticed that that explanation was not correct. (I think I
was really in hurry.) See the correction in [1]/messages/by-id/5CB93D3F.6050903@lab.ntt.co.jp.

* I revised a comment according to your previous comment, though I changed
a state name: s/sub_fmstate/aux_fmstate/

That seems fine to me.

Cool!

Some mostly cosmetic comments on the code changes:

* In the following comment:
+    /*
+     * If the foreign table is an UPDATE subplan resultrel that hasn't yet
+     * been updated, routing tuples to the table might yield incorrect
+     * results, because if routing tuples, routed tuples will be mistakenly
+     * read from the table and updated twice when updating the table --- it
+     * would be nice if we could handle this case; but for now, throw an
error
+     * for safety.
+     */
the part that start with "because if routing tuples..." reads a bit
unclear to me. How about writing this as:

/*
* If the foreign table we are about to insert routed rows into is
* also an UPDATE result rel and the UPDATE hasn't been performed yet,
* proceeding with the INSERT will result in the later UPDATE
* incorrectly modifying those routed rows, so prevent the INSERT ---
* it would be nice if we could handle this case, but for now, throw
* an error for safety.
*/

I think that's better than mine; will use that wording.

I see that in all the hunks involving some manipulation of aux_fmstate,
there's a comment explaining what it is, which seems a bit repetitive. I
can see more or less the same explanation in postgresExecForeignInsert(),
postgresBeginForeignInsert(), and postgresEndForeignInsert(). Maybe just
keep the description in postgresBeginForeignInsert as follows:

@@ -1983,7 +2020,19 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
retrieved_attrs != NIL,
retrieved_attrs);
-    resultRelInfo->ri_FdwState = fmstate;
+    /*
+     * If the given resultRelInfo already has PgFdwModifyState set, it means
+     * the foreign table is an UPDATE subplan resultrel; in which case, store
+     * the resulting state into the aux_fmstate of the PgFdwModifyState.
+     */
and change the other sites to refer to postgresBeginForeingInsert for the
detailed explanation of what aux_fmstate is.

Good idea; will do.

Thanks for the comments!

Best regards,
Etsuro Fujita

[1]: /messages/by-id/5CB93D3F.6050903@lab.ntt.co.jp

#14

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Amit Langote (#12)

Re: bug in update tuple routing with foreign partitions

(2019/04/19 13:17), Amit Langote wrote:

OTOH, we don't mention at all in postgres-fdw.sgml
that postgres_fdw supports tuple routing. Maybe we should and list this
limitation or would it be too much burden to maintain?

I think it's better to add this limitation to postgres-fdw.sgml. Will
do. Thanks for pointing that out!

Best regards,
Etsuro Fujita

#15

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#13)

Re: bug in update tuple routing with foreign partitions

On 2019/04/19 14:39, Etsuro Fujita wrote:

(2019/04/19 13:00), Amit Langote wrote:

On 2019/04/18 22:10, Etsuro Fujita wrote:

* I kept all the changes in the previous patch, because otherwise
postgres_fdw would fail to release resources for a foreign-insert
operation created by postgresBeginForeignInsert() for a tuple-routable
foreign table (ie, a foreign-table subplan resultrel that has been updated
already) during postgresEndForeignInsert().

Hmm are you saying that the cases for which we'll still allow tuple
routing (foreign table receiving moved-in rows has already been updated),
there will be two fmstates to be released -- the original fmstate
(UPDATE's) and aux_fmstate (INSERT's)?

Yeah, but I noticed that that explanation was not correct. (I think I was
really in hurry.) See the correction in [1].

Ah, I hadn't noticed your corrected description in [1] even though your
message was in my inbox before I sent my email.

Thanks,
Amit

#16

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#13)

1 attachment(s)

Re: bug in update tuple routing with foreign partitions

(2019/04/19 14:39), Etsuro Fujita wrote:

(2019/04/19 13:00), Amit Langote wrote:
On 2019/04/18 22:10, Etsuro Fujita wrote:
On 2019/04/18 14:06, Amit Langote wrote:

On 2019/04/17 21:49, Etsuro Fujita wrote:

So what I'm thinking is to throw an error for cases like this.
(Though, I
think we should keep to allow rows to be moved from local
partitions to a
foreign-table subplan targetrel that has been updated already.)

Hmm, how would you distinguish (totally inside postgred_fdw I
presume) the
two cases?

One thing I came up with to do that is this:

@@ -1917,6 +1937,23 @@ postgresBeginForeignInsert(ModifyTableState
*mtstate,

initStringInfo(&sql);
+ /*
+ * If the foreign table is an UPDATE subplan resultrel that hasn't
yet
+ * been updated, routing tuples to the table might yield incorrect
+ * results, because if routing tuples, routed tuples will be
mistakenly
+ * read from the table and updated twice when updating the table
--- it
+ * would be nice if we could handle this case; but for now, throw
an error
+ * for safety.
+ */
+ if (plan&& plan->operation == CMD_UPDATE&&
+ (resultRelInfo->ri_usesFdwDirectModify ||
+ resultRelInfo->ri_FdwState)&&
+ resultRelInfo> mtstate->resultRelInfo +
mtstate->mt_whichplan)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot route tuples into foreign
table to be updated \"%s\"",
+ RelationGetRelationName(rel))));
Oh, I see.

So IIUC, you're making postgresBeginForeignInsert() check two things:

1. Whether the foreign table it's about to begin inserting (moved) rows
into is a subplan result rel, checked by
(resultRelInfo->ri_usesFdwDirectModify || resultRelInfo->ri_FdwState)

2. Whether the foreign table it's about to begin inserting (moved) rows
into will be updated later, checked by (resultRelInfo>
mtstate->resultRelInfo + mtstate->mt_whichplan)

This still allows a foreign table to receive rows moved from the local
partitions if it has already finished the UPDATE operation.

Seems reasonable to me.

Forgot to say that since this is a separate issue from the original bug
report, maybe we can first finish fixing the latter. What to do you
think?

Yeah, I think we can do that, but my favorite would be to fix the two
issues in a single shot, because they seem to me rather close to each
other. So I updated a new version in a single patch, which I'm
attaching.

I agree that we can move to fix the other issue right away as the fix you
outlined above seems reasonable, but I wonder if it wouldn't be better to
commit the two fixes separately? The two fixes, although small, are
somewhat complicated and combining them in a single commit might be
confusing. Also, a combined commit might make it harder for the release
note author to list down the exact set of problems being fixed.

OK, I'll separate the patch into two.

After I tried to separate the patch a bit, I changed my mind: I think we
should commit the two issues in a single patch, because the original
issue that overriding fmstate for the UPDATE operation mistakenly by
fmstate for the INSERT operation caused backend crash is fixed by what I
proposed above. So I add the commit message to the previous single
patch, as you suggested.

Some mostly cosmetic comments on the code changes:

* In the following comment:
+ /*
+ * If the foreign table is an UPDATE subplan resultrel that hasn't yet
+ * been updated, routing tuples to the table might yield incorrect
+ * results, because if routing tuples, routed tuples will be mistakenly
+ * read from the table and updated twice when updating the table --- it
+ * would be nice if we could handle this case; but for now, throw an
error
+ * for safety.
+ */
the part that start with "because if routing tuples..." reads a bit
unclear to me. How about writing this as:

/*
* If the foreign table we are about to insert routed rows into is
* also an UPDATE result rel and the UPDATE hasn't been performed yet,
* proceeding with the INSERT will result in the later UPDATE
* incorrectly modifying those routed rows, so prevent the INSERT ---
* it would be nice if we could handle this case, but for now, throw
* an error for safety.
*/
I think that's better than mine; will use that wording.

Done. I simplified your wording a little bit, though.

I see that in all the hunks involving some manipulation of aux_fmstate,
there's a comment explaining what it is, which seems a bit repetitive. I
can see more or less the same explanation in postgresExecForeignInsert(),
postgresBeginForeignInsert(), and postgresEndForeignInsert(). Maybe just
keep the description in postgresBeginForeignInsert as follows:

@@ -1983,7 +2020,19 @@ postgresBeginForeignInsert(ModifyTableState
*mtstate,
retrieved_attrs != NIL,
retrieved_attrs);
- resultRelInfo->ri_FdwState = fmstate;
+ /*
+ * If the given resultRelInfo already has PgFdwModifyState set, it means
+ * the foreign table is an UPDATE subplan resultrel; in which case,
store
+ * the resulting state into the aux_fmstate of the PgFdwModifyState.
+ */
and change the other sites to refer to postgresBeginForeingInsert for the
detailed explanation of what aux_fmstate is.
Good idea; will do.

Done.

Other changes:
* I updated the docs in postgres-fdw.sgml to mention the limitation.
* I did some cleanups for the regression tests.

Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita

Attachments:

HEAD-bug-update-tuple-routing-with-foreign-parts-efujita-3.patchtext/x-patch; name=HEAD-bug-update-tuple-routing-with-foreign-parts-efujita-3.patchDownload

From 77c7d247bbc26a99e6fa2792152cc28dc327c946 Mon Sep 17 00:00:00 2001
From: Etsuro Fujita <efujita@postgresql.org>
Date: Mon, 22 Apr 2019 19:55:26 +0900
Subject: [PATCH] postgres_fdw: Fix incorrect handling of row movement for
 remote partitions.

Commit 3d956d9562 added support for update row movement in postgres_fdw,
and allowed it to insert rows moved from local partitions into a remote
partition.  This patch fixes the following issues introduced by the that
commit:

* When a remote partition chosen to insert rouoted rows into was also an
  UPDATE subplan target rel that would be updated later, the UPDATE that
  uses a direct modification plan modified those routed rows incorrectly
  because those routed rows were visible by the later UPDATE command.
  The right fix for this would be to have some way in postgres_fdw in
  which the later UPDATE command ignores those routed rows, but it seems
  hard to do so with the current infrastructure.  For now throw an error
  in that case.

* When a remote partition chosen to insert routed rows into was also an
  UPDATE subplan target rel, fmstate created for the UPDATE that uses a
  non-direct modification plan was mistakenly overridden by another
  fmstate created for inserting those routed rows into the partition.
  This caused 1) server crash when the partition would be updated later,
  and 2) resource leak when the partition had been already updated.  To
  avoid that, adjust the treatment of the fmstate for the inserting.  As
  for #1, since we would also have the incorrectness issue as mentioned
  above, error out in that case as well.

Update the docs to mention that postgres_fdw currently does not handle
the case where a remote partition chosen to insert a routed row into is
also an UPDATE subplan target rel that will be updated later.

Author: Amit Langote and Etsuro Fujita
Reviewed-by: Amit Langote
Backpatch-through: 11 where the support for row movement was added
Discussion: https://postgr.es/m/21e7eaa4-0d4d-20c2-a1f7-c7e96f4ce440@lab.ntt.co.jp
---
 .../postgres_fdw/expected/postgres_fdw.out    | 131 ++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.c           |  60 +++++++-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  45 ++++++
 doc/src/sgml/postgres-fdw.sgml                |   5 +
 4 files changed, 239 insertions(+), 2 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 3ed52709ee..48ea67aaec 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7734,6 +7734,137 @@ update utrtest set a = 1 where a = 2 returning *;
 (1 row)
 
 drop trigger loct_br_insert_trigger on loct;
+-- We can move rows to a foreign partition that has been updated already,
+-- but can't move rows to a foreign partition that hasn't been updated yet
+delete from utrtest;
+insert into utrtest values (1, 'foo');
+insert into utrtest values (2, 'qux');
+-- Test the former case:
+-- with a direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 1 returning *;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b
+   Foreign Update on public.remp
+   Update on public.locp
+   ->  Foreign Update on public.remp
+         Remote SQL: UPDATE public.loct SET a = 1 RETURNING a, b
+   ->  Seq Scan on public.locp
+         Output: 1, locp.b, locp.ctid
+(8 rows)
+
+update utrtest set a = 1 returning *;
+ a |  b  
+---+-----
+ 1 | foo
+ 1 | qux
+(2 rows)
+
+delete from utrtest;
+insert into utrtest values (1, 'foo');
+insert into utrtest values (2, 'qux');
+-- with a non-direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: remp.a, remp.b, "*VALUES*".column1
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+   Update on public.locp
+   ->  Hash Join
+         Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
+
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+ a |  b  | x 
+---+-----+---
+ 1 | foo | 1
+ 1 | qux | 2
+(2 rows)
+
+-- Change the definition of utrtest so that the foreign partition get updated
+-- after the local partition
+delete from utrtest;
+alter table utrtest detach partition remp;
+drop foreign table remp;
+alter table loct drop constraint loct_a_check;
+alter table loct add check (a in (3));
+create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+alter table utrtest attach partition remp for values in (3);
+insert into utrtest values (2, 'qux');
+insert into utrtest values (3, 'xyzzy');
+-- Test the latter case:
+-- with a direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 3 returning *;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Update on public.utrtest
+   Output: locp.a, locp.b
+   Update on public.locp
+   Foreign Update on public.remp
+   ->  Seq Scan on public.locp
+         Output: 3, locp.b, locp.ctid
+   ->  Foreign Update on public.remp
+         Remote SQL: UPDATE public.loct SET a = 3 RETURNING a, b
+(8 rows)
+
+update utrtest set a = 3 returning *; -- ERROR
+ERROR:  cannot route tuples into foreign table to be updated "remp"
+-- with a non-direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Update on public.utrtest
+   Output: locp.a, locp.b, "*VALUES*".column1
+   Update on public.locp
+   Foreign Update on public.remp
+     Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING a, b
+   ->  Hash Join
+         Output: 3, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (locp.a = "*VALUES*".column1)
+         ->  Seq Scan on public.locp
+               Output: locp.b, locp.ctid, locp.a
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+   ->  Hash Join
+         Output: 3, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
+         Hash Cond: (remp.a = "*VALUES*".column1)
+         ->  Foreign Scan on public.remp
+               Output: remp.b, remp.ctid, remp.a
+               Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
+         ->  Hash
+               Output: "*VALUES*".*, "*VALUES*".column1
+               ->  Values Scan on "*VALUES*"
+                     Output: "*VALUES*".*, "*VALUES*".column1
+(24 rows)
+
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
+ERROR:  cannot route tuples into foreign table to be updated "remp"
 drop table utrtest;
 drop table loct;
 -- Test copy tuple routing
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 25e219dd82..794363c184 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -185,6 +185,10 @@ typedef struct PgFdwModifyState
 
 	/* working memory context */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	/* for update row movement if subplan result rel */
+	struct PgFdwModifyState *aux_fmstate;	/* foreign-insert state, if
+											 * created */
 } PgFdwModifyState;
 
 /*
@@ -1844,8 +1848,22 @@ postgresExecForeignInsert(EState *estate,
 						  TupleTableSlot *slot,
 						  TupleTableSlot *planSlot)
 {
-	return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
+	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
+	TupleTableSlot *rslot;
+
+	/*
+	 * If the fmstate has aux_fmstate set, use the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate->aux_fmstate;
+	rslot = execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
 								  slot, planSlot);
+	/* Revert that change */
+	if (fmstate->aux_fmstate)
+		resultRelInfo->ri_FdwState = fmstate;
+
+	return rslot;
 }
 
 /*
@@ -1915,6 +1933,22 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 	List	   *retrieved_attrs = NIL;
 	bool		doNothing = false;
 
+	/*
+	 * If the foreign table we are about to insert routed rows into is also
+	 * an UPDATE subplan result rel that will be updated later, proceeding
+	 * with the INSERT will result in the later UPDATE incorrectly modifying
+	 * those routed rows, so prevent the INSERT --- it would be nice if we
+	 * could handle this case; but for now, throw an error for safety.
+	 */
+	if (plan && plan->operation == CMD_UPDATE &&
+		(resultRelInfo->ri_usesFdwDirectModify ||
+		 resultRelInfo->ri_FdwState) &&
+		resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot route tuples into foreign table to be updated \"%s\"",
+						RelationGetRelationName(rel))));
+
 	initStringInfo(&sql);
 
 	/* We transmit all columns that are defined in the foreign table. */
@@ -1983,7 +2017,19 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
 									retrieved_attrs != NIL,
 									retrieved_attrs);
 
-	resultRelInfo->ri_FdwState = fmstate;
+	/*
+	 * If the given resultRelInfo already has PgFdwModifyState set, it means
+	 * the foreign table is an UPDATE subplan result rel; in which case, store
+	 * the resulting state into the aux_fmstate of the PgFdwModifyState.
+	 */
+	if (resultRelInfo->ri_FdwState)
+	{
+		Assert(plan && plan->operation == CMD_UPDATE);
+		Assert(resultRelInfo->ri_usesFdwDirectModify == false);
+		((PgFdwModifyState *) resultRelInfo->ri_FdwState)->aux_fmstate = fmstate;
+	}
+	else
+		resultRelInfo->ri_FdwState = fmstate;
 }
 
 /*
@@ -1998,6 +2044,13 @@ postgresEndForeignInsert(EState *estate,
 
 	Assert(fmstate != NULL);
 
+	/*
+	 * If the fmstate has aux_fmstate set, get the aux_fmstate (see
+	 * postgresBeginForeignInsert())
+	 */
+	if (fmstate->aux_fmstate)
+		fmstate = fmstate->aux_fmstate;
+
 	/* Destroy the execution state */
 	finish_foreign_modify(fmstate);
 }
@@ -3482,6 +3535,9 @@ create_foreign_modify(EState *estate,
 
 	Assert(fmstate->p_nums <= n_params);
 
+	/* Initialize auxiliary state */
+	fmstate->aux_fmstate = NULL;
+
 	return fmstate;
 }
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 3bfcdabc78..127cd30a92 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2015,6 +2015,51 @@ update utrtest set a = 1 where a = 2 returning *;
 
 drop trigger loct_br_insert_trigger on loct;
 
+-- We can move rows to a foreign partition that has been updated already,
+-- but can't move rows to a foreign partition that hasn't been updated yet
+
+delete from utrtest;
+insert into utrtest values (1, 'foo');
+insert into utrtest values (2, 'qux');
+
+-- Test the former case:
+-- with a direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 1 returning *;
+update utrtest set a = 1 returning *;
+
+delete from utrtest;
+insert into utrtest values (1, 'foo');
+insert into utrtest values (2, 'qux');
+
+-- with a non-direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x returning *;
+
+-- Change the definition of utrtest so that the foreign partition get updated
+-- after the local partition
+delete from utrtest;
+alter table utrtest detach partition remp;
+drop foreign table remp;
+alter table loct drop constraint loct_a_check;
+alter table loct add check (a in (3));
+create foreign table remp (a int check (a in (3)), b text) server loopback options (table_name 'loct');
+alter table utrtest attach partition remp for values in (3);
+insert into utrtest values (2, 'qux');
+insert into utrtest values (3, 'xyzzy');
+
+-- Test the latter case:
+-- with a direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 3 returning *;
+update utrtest set a = 3 returning *; -- ERROR
+
+-- with a non-direct modification plan
+explain (verbose, costs off)
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *;
+update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x returning *; -- ERROR
+
 drop table utrtest;
 drop table loct;
 
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index a46fd750f6..e9ce39af64 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -74,6 +74,11 @@
   UPDATE</literal> clause.  However, the <literal>ON CONFLICT DO NOTHING</literal>
   clause is supported, provided a unique index inference specification
   is omitted.
+  Note also that <filename>postgres_fdw</filename> supports row movement
+  invoked by <command>UPDATE</command> statements executed on partitioned
+  tables, but it currently does not handle the case where a remote partition
+  chosen to insert a moved row into is also an <command>UPDATE</command>
+  target partition that will be updated later.
  </para>
 
  <para>
-- 
2.19.2

#17

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#16)

Re: bug in update tuple routing with foreign partitions

Fujita-san,

On 2019/04/22 20:00, Etsuro Fujita wrote:

(2019/04/19 14:39), Etsuro Fujita wrote:

(2019/04/19 13:00), Amit Langote wrote:

I agree that we can move to fix the other issue right away as the fix you
outlined above seems reasonable, but I wonder if it wouldn't be better to
commit the two fixes separately? The two fixes, although small, are
somewhat complicated and combining them in a single commit might be
confusing. Also, a combined commit might make it harder for the release
note author to list down the exact set of problems being fixed.

OK, I'll separate the patch into two.

After I tried to separate the patch a bit, I changed my mind: I think we
should commit the two issues in a single patch, because the original issue
that overriding fmstate for the UPDATE operation mistakenly by fmstate for
the INSERT operation caused backend crash is fixed by what I proposed
above. So I add the commit message to the previous single patch, as you
suggested.

Ah, you're right. The case that would return wrong result, that is now
prevented per the latest patch, is also the case that would crash before.
So, it seems to OK to keep this commit this as one patch. Sorry for the
noise.

I read your commit message and it seems to sufficiently explain the issues
being fixed. Thanks for adding me as an author, but I think the latest
patch is mostly your work, so I'm happy to be listed as just a reviewer. :)

Some mostly cosmetic comments on the code changes:

* In the following comment:
+ /*
+ * If the foreign table is an UPDATE subplan resultrel that hasn't yet
+ * been updated, routing tuples to the table might yield incorrect
+ * results, because if routing tuples, routed tuples will be mistakenly
+ * read from the table and updated twice when updating the table --- it
+ * would be nice if we could handle this case; but for now, throw an
error
+ * for safety.
+ */
the part that start with "because if routing tuples..." reads a bit
unclear to me. How about writing this as:

/*
* If the foreign table we are about to insert routed rows into is
* also an UPDATE result rel and the UPDATE hasn't been performed yet,
* proceeding with the INSERT will result in the later UPDATE
* incorrectly modifying those routed rows, so prevent the INSERT ---
* it would be nice if we could handle this case, but for now, throw
* an error for safety.
*/
I think that's better than mine; will use that wording.
Done. I simplified your wording a little bit, though.

Thanks, looks fine.

Other changes:
* I updated the docs in postgres-fdw.sgml to mention the limitation.

Looks fine.

* I did some cleanups for the regression tests.

Here too.

Please find attached an updated version of the patch.

I don't have any more comments. Thanks for working on this.

Thanks,
Amit

#18

Etsuro Fujita

fujita.etsuro@lab.ntt.co.jp

over 6 years ago

In reply to: Amit Langote (#17)

Re: bug in update tuple routing with foreign partitions

(2019/04/23 10:03), Amit Langote wrote:

So, it seems to OK to keep this commit this as one patch.

I read your commit message and it seems to sufficiently explain the issues
being fixed.

Cool!

Thanks for adding me as an author, but I think the latest
patch is mostly your work, so I'm happy to be listed as just a reviewer. :)

You found this bug, analyzed it, and wrote the first version of the
patch. I heavily modified the patch, but I used your test case, so I
think you deserve the first author of this fix.

I don't have any more comments. Thanks for working on this.

Pushed. Many thanks, Amit-san!

Best regards,
Etsuro Fujita

#19

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

over 6 years ago

In reply to: Etsuro Fujita (#18)

Re: bug in update tuple routing with foreign partitions

Fujita-san,

On 2019/04/24 18:55, Etsuro Fujita wrote:

(2019/04/23 10:03), Amit Langote wrote:

Thanks for adding me as an author, but I think the latest
patch is mostly your work, so I'm happy to be listed as just a reviewer. :)

You found this bug, analyzed it, and wrote the first version of the
patch. I heavily modified the patch, but I used your test case, so I
think you deserve the first author of this fix.

OK, thanks.

Regards,
Amit