Skip adding row-marks for non target tables when result relation is foreign table.
Hi PostgreSQL Community,
I would like to bring to your attention an observation regarding the
planner's behavior for foreign table update/delete operations. It appears
that the planner adds rowmarks (ROW_MARK_COPY) for non-target tables, which
I believe is unnecessary when using the postgres-fdw. This is because
postgres-fdw performs early locking on tuples belonging to the target
foreign table by utilizing the SELECT FOR UPDATE clause.
In an attempt to address this, I tried implementing late locking. However,
this approach still doesn't work as intended because the API assumes that
foreign table rows can be re-fetched using TID (ctid). This assumption is
invalid for partitioned tables on the foreign server. Additionally, the
commit afb9249d06f47d7a6d4a89fea0c3625fe43c5a5d, which introduced late
locking for foreign tables, mentions that the benefits of late locking
against a remote server are unclear, as the extra round trips required are
likely to outweigh any potential concurrency improvements.
To address this issue, I have taken the initiative to create a patch that
prevents the addition of rowmarks for non-target tables when the target
table is using early locking. I would greatly appreciate it if you could
review the patch and provide any feedback or insights I may be overlooking.
Example query plan with my change: (foreign scan doesn't fetch whole row
for bar).
postgres=# \d+ bar
Foreign table "public.bar"
Column | Type | Collation | Nullable | Default | FDW options |
Storage | Stats target | Description
--------+---------+-----------+----------+---------+--------------------+---------+--------------+-------------
b1 | integer | | | | (column_name 'b1') |
plain | |
b2 | integer | | | | (column_name 'b2') |
plain | |
Server: postgres_1
FDW options: (schema_name 'public', table_name 'bar')
router=# \d+ foo
Foreign table "public.foo"
Column | Type | Collation | Nullable | Default | FDW options |
Storage | Stats target | Description
--------+---------+-----------+----------+---------+--------------------+---------+--------------+-------------
f1 | integer | | | | (column_name 'f1') |
plain | |
f2 | integer | | | | (column_name 'f2') |
plain | |
Server: postgres_2
FDW options: (schema_name 'public', table_name 'foo')
postgres=# explain verbose update foo set f1 = b1 from bar where f2=b2;
QUERY PLAN
----------------------------------------------------------------------------------------
Update on public.foo (cost=200.00..48713.72 rows=0 width=0)
Remote SQL: UPDATE public.foo SET f1 = $2 WHERE ctid = $1
-> Nested Loop (cost=200.00..48713.72 rows=15885 width=42)
Output: bar.b1, foo.ctid, foo.*
Join Filter: (foo.f2 = bar.b2)
-> Foreign Scan on public.bar (cost=100.00..673.20 rows=2560
width=8)
Output: bar.b1, bar.b2
Remote SQL: SELECT b1, b2 FROM public.bar
-> Materialize (cost=100.00..389.23 rows=1241 width=42)
Output: foo.ctid, foo.*, foo.f2
-> Foreign Scan on public.foo (cost=100.00..383.02
rows=1241 width=42)
Output: foo.ctid, foo.*, foo.f2
Remote SQL: SELECT f1, f2, ctid FROM public.foo FOR
UPDATE
(13 rows)
Thank you for your time and consideration.
Regards
Saikiran Avula
SDE, Amazon Web Services.
Attachments:
v1-0001-Skip-adding-row-marks-for-not-target-relations-wh.patchapplication/octet-stream; name=v1-0001-Skip-adding-row-marks-for-not-target-relations-wh.patchDownload
From 9b3ebaad986bf51764487103e977be33d25fb818 Mon Sep 17 00:00:00 2001
From: Saikiran Avula <saiavula@amazon.com>
Date: Mon, 6 May 2024 22:29:30 +0100
Subject: [PATCH v1] Skip adding row marks for not target relations when the
result relation is a foreign table using early locking.
If a result relation is a foreign relation, FDWs have 2 ways for locking - early locking (using select for update)
and late locking (using RefetchForeignRow). If FDW uses early locking, then there is no reason for planner
to add row marks and project that column duing execution. Currently we fetch whole row var which is not used.
This change ensures we do add any row mark for non target relations.
---
.../postgres_fdw/expected/postgres_fdw.out | 54 ++++++-------
src/backend/optimizer/plan/planner.c | 76 ++++++++++++++-----
2 files changed, 86 insertions(+), 44 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 078b8a966f..d1d6016491 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -6210,25 +6210,25 @@ UPDATE ft2 AS target SET (c2) = (
EXPLAIN (VERBOSE, COSTS OFF)
UPDATE ft2 d SET c2 = CASE WHEN random() >= 0 THEN d.c2 ELSE 0 END
FROM ft2 AS t WHERE d.c1 = t.c1 AND d.c1 > 1000;
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.ft2 d
Remote SQL: UPDATE "S 1"."T 1" SET c2 = $2 WHERE ctid = $1
-> Foreign Scan
- Output: CASE WHEN (random() >= '0'::double precision) THEN d.c2 ELSE 0 END, d.ctid, d.*, t.*
+ Output: CASE WHEN (random() >= '0'::double precision) THEN d.c2 ELSE 0 END, d.ctid, d.*
Relations: (public.ft2 d) INNER JOIN (public.ft2 t)
- Remote SQL: SELECT r1.c2, r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1."C 1" > 1000)))) FOR UPDATE OF r1
+ Remote SQL: SELECT r1.c2, r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r1."C 1" > 1000)))) FOR UPDATE OF r1
-> Hash Join
- Output: d.c2, d.ctid, d.*, t.*
+ Output: d.c2, d.ctid, d.*
Hash Cond: (d.c1 = t.c1)
-> Foreign Scan on public.ft2 d
Output: d.c2, d.ctid, d.*, d.c1
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 1000)) ORDER BY "C 1" ASC NULLS LAST FOR UPDATE
-> Hash
- Output: t.*, t.c1
+ Output: t.c1
-> Foreign Scan on public.ft2 t
- Output: t.*, t.c1
- Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+ Output: t.c1
+ Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
(17 rows)
UPDATE ft2 d SET c2 = CASE WHEN random() >= 0 THEN d.c2 ELSE 0 END
@@ -6271,31 +6271,31 @@ UPDATE ft2 SET c3 = 'baz'
FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
RETURNING ft2.*, ft4.*, ft5.*; -- can't be pushed down
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------
Update on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
Remote SQL: UPDATE "S 1"."T 1" SET c3 = $2 WHERE ctid = $1 RETURNING "C 1", c2, c3, c4, c5, c6, c7, c8
-> Nested Loop
- Output: 'baz'::text, ft2.ctid, ft2.*, ft4.*, ft5.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
+ Output: 'baz'::text, ft2.ctid, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
Join Filter: (ft2.c2 === ft4.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.*, ft2.c2
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
-> Foreign Scan
- Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
+ Output: ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
Relations: (public.ft4) INNER JOIN (public.ft5)
- Remote SQL: SELECT CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, r2.c1, r2.c2, r2.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1))))
+ Remote SQL: SELECT r2.c1, r2.c2, r2.c3, r3.c1, r3.c2, r3.c3 FROM ("S 1"."T 3" r2 INNER JOIN "S 1"."T 4" r3 ON (((r2.c1 = r3.c1))))
-> Hash Join
- Output: ft4.*, ft4.c1, ft4.c2, ft4.c3, ft5.*, ft5.c1, ft5.c2, ft5.c3
+ Output: ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3
Hash Cond: (ft4.c1 = ft5.c1)
-> Foreign Scan on public.ft4
- Output: ft4.*, ft4.c1, ft4.c2, ft4.c3
+ Output: ft4.c1, ft4.c2, ft4.c3
Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
-> Hash
- Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
+ Output: ft5.c1, ft5.c2, ft5.c3
-> Foreign Scan on public.ft5
- Output: ft5.*, ft5.c1, ft5.c2, ft5.c3
+ Output: ft5.c1, ft5.c2, ft5.c3
Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
(24 rows)
@@ -6313,30 +6313,30 @@ DELETE FROM ft2
USING ft4 INNER JOIN ft5 ON (ft4.c1 === ft5.c1)
WHERE ft2.c1 > 2000 AND ft2.c2 = ft4.c1
RETURNING ft2.c1, ft2.c2, ft2.c3; -- can't be pushed down
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Delete on public.ft2
Output: ft2.c1, ft2.c2, ft2.c3
Remote SQL: DELETE FROM "S 1"."T 1" WHERE ctid = $1 RETURNING "C 1", c2, c3
-> Foreign Scan
- Output: ft2.ctid, ft4.*, ft5.*
+ Output: ft2.ctid
Filter: (ft4.c1 === ft5.c1)
Relations: ((public.ft2) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
- Remote SQL: SELECT r1.ctid, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2.c1, r2.c2, r2.c3) END, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r2.c1, r3.c1 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c2 = r2.c1)) AND ((r1."C 1" > 2000)))) INNER JOIN "S 1"."T 4" r3 ON (TRUE)) FOR UPDATE OF r1
+ Remote SQL: SELECT r1.ctid, r2.c1, r3.c1 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c2 = r2.c1)) AND ((r1."C 1" > 2000)))) INNER JOIN "S 1"."T 4" r3 ON (TRUE)) FOR UPDATE OF r1
-> Nested Loop
- Output: ft2.ctid, ft4.*, ft5.*, ft4.c1, ft5.c1
+ Output: ft2.ctid, ft4.c1, ft5.c1
-> Nested Loop
- Output: ft2.ctid, ft4.*, ft4.c1
+ Output: ft2.ctid, ft4.c1
Join Filter: (ft2.c2 = ft4.c1)
-> Foreign Scan on public.ft2
Output: ft2.ctid, ft2.c2
Remote SQL: SELECT c2, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
-> Foreign Scan on public.ft4
- Output: ft4.*, ft4.c1
- Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3"
+ Output: ft4.c1, ft4.c2, ft4.c3
+ Remote SQL: SELECT c1 FROM "S 1"."T 3"
-> Foreign Scan on public.ft5
- Output: ft5.*, ft5.c1
- Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4"
+ Output: ft5.c1, ft5.c2, ft5.c3
+ Remote SQL: SELECT c1 FROM "S 1"."T 4"
(22 rows)
DELETE FROM ft2
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..c4a190fa6c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -22,6 +22,7 @@
#include "access/parallel.h"
#include "access/sysattr.h"
#include "access/table.h"
+#include "access/relation.h"
#include "catalog/pg_aggregate.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_inherits.h"
@@ -255,6 +256,7 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static bool relationUsesEarlyLocking(RangeTblEntry *rte);
/*****************************************************************************
@@ -2269,6 +2271,7 @@ preprocess_rowmarks(PlannerInfo *root)
List *prowmarks;
ListCell *l;
int i;
+ RangeTblEntry *resultRte = NULL;
if (parse->rowMarks)
{
@@ -2343,28 +2346,39 @@ preprocess_rowmarks(PlannerInfo *root)
}
/*
- * Now, add rowmarks for any non-target, non-locked base relations.
+ * Check if the result relation uses early locking. In which case we do not
+ * need to add rowmarks for non target relations.
*/
- i = 0;
- foreach(l, parse->rtable)
+ if (parse->resultRelation > 0)
+ resultRte = rt_fetch(parse->resultRelation, parse->rtable);
+
+ if (!relationUsesEarlyLocking(resultRte))
{
- RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
- PlanRowMark *newrc;
- i++;
- if (!bms_is_member(i, rels))
- continue;
+ /*
+ * Now, add rowmarks for any non-target, non-locked base relations.
+ */
+ i = 0;
+ foreach(l, parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
+ PlanRowMark *newrc;
- newrc = makeNode(PlanRowMark);
- newrc->rti = newrc->prti = i;
- newrc->rowmarkId = ++(root->glob->lastRowMarkId);
- newrc->markType = select_rowmark_type(rte, LCS_NONE);
- newrc->allMarkTypes = (1 << newrc->markType);
- newrc->strength = LCS_NONE;
- newrc->waitPolicy = LockWaitBlock; /* doesn't matter */
- newrc->isParent = false;
+ i++;
+ if (!bms_is_member(i, rels))
+ continue;
- prowmarks = lappend(prowmarks, newrc);
+ newrc = makeNode(PlanRowMark);
+ newrc->rti = newrc->prti = i;
+ newrc->rowmarkId = ++(root->glob->lastRowMarkId);
+ newrc->markType = select_rowmark_type(rte, LCS_NONE);
+ newrc->allMarkTypes = (1 << newrc->markType);
+ newrc->strength = LCS_NONE;
+ newrc->waitPolicy = LockWaitBlock; /* doesn't matter */
+ newrc->isParent = false;
+
+ prowmarks = lappend(prowmarks, newrc);
+ }
}
root->rowMarks = prowmarks;
@@ -2422,6 +2436,34 @@ select_rowmark_type(RangeTblEntry *rte, LockClauseStrength strength)
}
}
+/*
+ * relationUsesEarlyLocking - checks if the relation uses early locking.
+ *
+ * We do early locking for a foreign relation if the RefetchForeignRow API is not implemented by FDW.
+ */
+static bool
+relationUsesEarlyLocking(RangeTblEntry *rte)
+{
+ Relation rel;
+ FdwRoutine *fdwroutine = NULL;
+
+ /* Early locking is applicable for foreign tables. */
+ if (rte == NULL || rte->relkind != RELKIND_FOREIGN_TABLE)
+ return false;
+
+ rel = relation_open(rte->relid, NoLock);
+
+ fdwroutine = GetFdwRoutineForRelation(rel, false);
+
+ relation_close(rel, NoLock);
+
+ /* Check if FDW supports late locking */
+ if (fdwroutine && fdwroutine->RefetchForeignRow == NULL)
+ return true;
+
+ return false;
+}
+
/*
* preprocess_limit - do pre-estimation for LIMIT and/or OFFSET clauses
*
--
2.39.3 (Apple Git-146)
On Mon, 2024-05-06 at 23:10 +0100, SAIKIRAN AVULA wrote:
I would like to bring to your attention an observation regarding the
planner's behavior for foreign table update/delete operations. It
appears that the planner adds rowmarks (ROW_MARK_COPY) for non-target
tables, which I believe is unnecessary when using the postgres-fdw.
This is because postgres-fdw performs early locking on tuples
belonging to the target foreign table by utilizing the SELECT FOR
UPDATE clause.
I agree with your reasoning here. If it reads the row with SELECT FOR
UPDATE, what's the purpose of row marks?
The cost of ROW_MARK_COPY is that it brings the whole tuple along
rather than a reference. I assume you are concerned about wide tables
involved in the join or is there another concern?
In an attempt to address this, I tried implementing late locking.
For others in the thread, see:
https://www.postgresql.org/docs/current/fdw-row-locking.html
However, this approach still doesn't work as intended because the API
assumes that foreign table rows can be re-fetched using TID (ctid).
This assumption is invalid for partitioned tables on the foreign
server.
It looks like it's a "Datum rowid", but is currently only allowed to be
a ctid, which can't identify the partition. I wonder how much work it
would be to fix this?
Additionally, the commit afb9249d06f47d7a6d4a89fea0c3625fe43c5a5d,
which introduced late locking for foreign tables, mentions that the
benefits of late locking against a remote server are unclear, as the
extra round trips required are likely to outweigh any potential
concurrency improvements.
The extra round trip only happens when EPQ finds a newer version of the
tuple, which should be the exceptional case. I'm not sure how this
balances out, but to me late locking still seems preferable. Early
locking is a huge performance hit in some cases (locking many more rows
than necessary).
Early locking is also a violation of the documentation here:
"When a locking clause appears at the top level of a SELECT query, the
rows that are locked are exactly those that are returned by the query;
in the case of a join query, the rows locked are those that contribute
to returned join rows."
https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE
To address this issue, I have taken the initiative to create a patch
that prevents the addition of rowmarks for non-target tables when the
target table is using early locking. I would greatly appreciate it if
you could review the patch and provide any feedback or insights I may
be overlooking.
A couple comments:
* You're using GetFdwRoutineForRelation() with makecopy=false, and then
closing the relation. If the rd_fdwroutine was already set previously,
then the returned pointer will point into the relcache, which may be
invalid after closing the relation. I'd probably pass makecopy=true and
then free it. (Weirdly if you pass makecopy=false, you may or may not
get a copy, so there's no way to know whether to free it or not.)
* Core postgres doesn't really choose early locking. If
RefetchForeignRow is not defined, then late locking is impossible, so
it assumes that early locking is happening. That assumption is true for
postgres_fdw, but might not be for other FDWs. What if an FDW doesn't
do early locking and also doesn't define RefetchForeignRow?
Regards,
Jeff Davis
On Wed, May 22, 2024 at 10:13 AM Jeff Davis <pgsql@j-davis.com> wrote:
On Mon, 2024-05-06 at 23:10 +0100, SAIKIRAN AVULA wrote:
Additionally, the commit afb9249d06f47d7a6d4a89fea0c3625fe43c5a5d,
which introduced late locking for foreign tables, mentions that the
benefits of late locking against a remote server are unclear, as the
extra round trips required are likely to outweigh any potential
concurrency improvements.The extra round trip only happens when EPQ finds a newer version of the
tuple, which should be the exceptional case. I'm not sure how this
balances out, but to me late locking still seems preferable. Early
locking is a huge performance hit in some cases (locking many more rows
than necessary).
I might be missing something, but I think the extra round trip happens
for each foreign row locked using the RefetchForeignRow() API in
ExecLockRows(), so I think the overhead is caused in the normal case.
Early locking is also a violation of the documentation here:
"When a locking clause appears at the top level of a SELECT query, the
rows that are locked are exactly those that are returned by the query;
in the case of a join query, the rows locked are those that contribute
to returned join rows."
Yeah, but I think this holds true for SELECT queries postgres_fdw
sends to the remote side. :)
Best regards,
Etsuro Fujita
On Fri, 2024-08-09 at 17:35 +0900, Etsuro Fujita wrote:
I might be missing something, but I think the extra round trip
happens
for each foreign row locked using the RefetchForeignRow() API in
ExecLockRows(), so I think the overhead is caused in the normal case.
Is there any sample code that implements late locking for a FDW? I'm
not quite clear on how it's supposed to work.
Regards,
Jeff Davis
On Thu, Aug 15, 2024 at 9:56 AM Jeff Davis <pgsql@j-davis.com> wrote:
Is there any sample code that implements late locking for a FDW? I'm
not quite clear on how it's supposed to work.
See the patch in [1]/messages/by-id/16016.1431455074@sss.pgh.pa.us. It would not apply to HEAD anymore, though.
Best regards,
Etsuro Fujita