Eliminating SPI from RI triggers - take 2
Hi,
I had proposed $subject for some RI trigger functions in the last dev
cycle [1]. Briefly, the proposal was to stop using an SQL query
(using the SPI interface) for RI checks that could be done by directly
scanning the primary/unique key index of the referenced table, which
must always be there. While acknowledging that the patch showed a
clear performance benefit, Tom gave the feedback that doing so only
for some RI checks but not others is not very desirable [2].
The other cases include querying the referencing table when deleting
from the referenced table to handle the referential action clause.
Two main hurdles to not using an SQL query for those cases, which I
hadn't addressed, were:
1) What should the hard-coded plan be? The referencing table may not
always have an index on the queried foreign key columns, and even if
there is one, it's not clear that scanning it is *always* better than
scanning the whole table to find the matching rows.
2) While the RI check functions for RESTRICT and NO ACTION actions
issue a `SELECT ... LIMIT 1` query, those for CASCADE and SET actions
issue an `UPDATE ... SET` or `DELETE` query. I had no good idea how
much of the executor functionality would need to be replicated in
order to perform the update/delete actions without leaving the
ri_triggers.c module.
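For reference, the query shapes involved are roughly the following
(paraphrased from the query-building code in ri_triggers.c; the
CASCADE delete shape is from memory and may differ in detail):

    -- RESTRICT / NO ACTION check; a row limit of 1 is applied at
    -- execution time, since one matching row suffices
    SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
        FOR KEY SHARE OF x

    -- CASCADE on delete; no row limit, because all matching rows
    -- must be affected
    DELETE FROM [ONLY] <fktable> WHERE $1 = fkatt1 [AND ...]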
We had an unconference session to discuss these concerns at this
year's PGCon, whose minutes can be found at [3]. Among other
suggestions, one was to only stop using the SPI interface to issue the
RI check/action queries, while continuing to use the same SQL queries
as now. That means creating a copy in ri_triggers.c of the
functionality of SPI_prepare(), which creates the CachedPlanSource for
the query, and of SPI_execute_plan(), which executes a CachedPlan
obtained from that CachedPlanSource to produce the result tuples if
any. That may not yield the same performance boost as skipping the
planner/plancache and the executor altogether, but it at least becomes
easier to compare the semantic behavior of an RI query implemented as
SQL with that of one implemented as a hard-coded plan, should we
choose to do so, because the logic would no longer be divided between
ri_triggers.c and spi.c. I think that will, at least to some degree,
alleviate the concerns that Tom expressed about the previous effort.
So, I hacked together a patch (attached 0001) that invents an "RI
plan" construct (struct RI_Plan) to replace the use of an "SPI plan"
(struct _SPI_plan). While the latter directly encapsulates the
CachedPlanSource of an RI query, the new construct lets a given RI
trigger choose what its RI_Plan stores: a CachedPlanSource if its
check is still implemented as an SQL query, or something else if the
implementation is a hard-coded plan. RI_Plan contains callbacks to
create, execute, validate, and free a plan that implements a given RI
query. For example, an RI plan for checks implemented as SQL will call
the callback ri_SqlStringPlanCreate() to parse the query and allocate
a CachedPlanSource, and ri_SqlStringPlanExecute() to obtain a
CachedPlan and execute its PlannedStmt using the executor interface
directly. The remaining callbacks, ri_SqlStringPlanIsValid() and
ri_SqlStringPlanFree(), use CachedPlanIsValid() and DropCachedPlan(),
respectively, to validate and free a CachedPlan.
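For reference, the construct looks roughly like this, condensed from
the attached 0001 (see the patch for the full definitions and
comments):

    typedef struct RI_Plan
    {
        MemoryContext plancxt;      /* child of CacheMemoryContext */
        int           nargs;        /* number of query parameters */
        Oid          *paramtypes;   /* and their types */

        /* callbacks installed by the plan-create function */
        RI_PlanExecFunc_type    plan_exec_func;     /* execute the plan */
        void                   *plan_exec_arg;      /* e.g., a List of
                                                     * CachedPlanSource */
        RI_PlanIsValidFunc_type plan_is_valid_func; /* still valid to
                                                     * keep cached? */
        RI_PlanFreeFunc_type    plan_free_func;     /* release resources */
    } RI_Plan;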
With that in place, I rebased my previous patch [1] to use this new
interface, and the result is attached 0002. One notable improvement
over the previous standalone patch is that the snapshot-setting logic
no longer needs to live in the function implementing the proposed
hard-coded plan for RI check triggers. That logic and the other
configuration needed before executing the plan are now part of the
top-level ri_PerformCheck() function that is shared between the
various RI plan implementations. So whether an RI check or action is
implemented using an SQL plan or a hard-coded plan, the execution
proceeds with effectively the same configuration/environment.
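To illustrate, the snapshot selection now happens in
ri_PerformCheck() itself, roughly as follows (condensed from the
attached 0001):

    if (IsolationUsesXactSnapshot() && detectNewRows)
    {
        test_snapshot = GetLatestSnapshot();
        crosscheck_snapshot = GetTransactionSnapshot();
        /* Make sure we have a private copy of the snapshot to modify. */
        PushCopiedSnapshot(test_snapshot);
    }
    else
    {
        test_snapshot = GetTransactionSnapshot();
        crosscheck_snapshot = InvalidSnapshot;
        PushActiveSnapshot(test_snapshot);
    }
    /* Also advance the command counter and update the snapshot. */
    CommandCounterIncrement();
    UpdateActiveSnapshotCommandId();

ri_PlanExecute() is then called with these snapshots, whichever plan
implementation ends up running.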
I will continue investigating what to do about points (1) and (2)
mentioned above and see if we can do away with using SQL in the
remaining cases.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
[2]: /messages/by-id/3400437.1649363527@sss.pgh.pa.us
[3]: https://wiki.postgresql.org/wiki/PgCon_2022_Developer_Unconference#Removing_SPI_from_RI_trigger_implementation
Attachments:
v1-0001-Avoid-using-SPI-in-RI-trigger-functions.patch (application/octet-stream)
From 1baf6a646c3dfd743b049cfc961d35e85fd9063d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v1 1/2] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing the CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query using a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot() respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
callback function specified by the caller (of ri_PlanCheck(), that
is). For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would. ri_PlanExecute() will execute the "RI plan" by
calling a caller-specific callback function whose pointer is saved
within the "RI Plan" data structure (struct RIPlan). For example,
the callback ri_SqlStringPlanExecute() will fetch a CachedPlan for
given CachedPlanSource found in the "RI plan" and execute its
PlannedStmt by invoking the executor, just as SPI_execute_snapshot()
would. The details such as which snapshot to use are now fully
controlled by ri_PerformCheck(), whereas the previous arrangement relied
on the SPI logic for snapshot management.
By making ri_PlanCreate() and ri_PlanExecute() and the "RI plan"
data structure pluggable, it will be possible for the future commits
to replace the current SQL string based implementation of some RI
checks with something as simple as a C function to directly scan
the underlying table/index.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 598 ++++++++++++++++++++++------
2 files changed, 488 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 51b3fdc9a0..73b51eea73 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CachedMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,366 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query processed with ri_PlanCreate() or
+ * ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSource for each of the queries
+ * resulting from the provided query's rewrite that is saved to
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource found stored in plan->plan_exec_arg using given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed;
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+ /* We return the last executed statement's value. */
+ tuples_processed = qdesc->estate->es_processed;
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ /* We no longer need the cached plan refcount, if any */
+ if (cplan)
+ ReleaseCachedPlan(cplan, plan_owner);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext;
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least 1 row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2663,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2679,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2690,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2746,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2784,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2804,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3075,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Lookup for a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3110,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3119,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3131,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
v1-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch (application/octet-stream)
From 98715db5c523bf17beb56f09c799084dcfde5a75 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v1 2/2] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RI_Plan infrastructure added in
the previous commit. It replaces ri_SqlStringPlanCreate(), used in
RI_FKey_check() and ri_Check_Pk_Match() to create the plans for their
respective checks, with ri_LookupKeyInPkRelPlanCreate(), which
installs ri_LookupKeyInPkRel() as the function implementing those
checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
This rewrite allows us to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
come up with the correct set of partitions for the RI query running
under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so the
two no longer interfere. The buggy output in
src/test/isolation/expected/fk-snapshot.out of the relevant test
case that was added by 00cb86e75d has been corrected.
---
src/backend/executor/execPartition.c | 160 ++++++-
src/backend/executor/nodeLockRows.c | 160 +++----
src/backend/utils/adt/ri_triggers.c | 440 +++++++++++++++-----
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 592 insertions(+), 192 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..813bd240a4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1341,12 +1344,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1438,6 +1441,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple containing a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must
+ * map any non-root table's partition key attribute numbers to the
+ * root table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else descend into it in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..ab54a65e0e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks the tuple with the specified TID in the given lockmode,
+ * following the given wait policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in the
+ * slot for EvalPlanQual testing, if asked for by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 73b51eea73..8d2019a559 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,63 +428,17 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table. */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -533,48 +509,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +697,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +797,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +914,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1141,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1950,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partition that the
+ * query fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1962,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2553,14 +2496,278 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by the supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the index of
+ * the constraint given in plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int processed_tuples = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = "ri_LookupKeyInPkRel";
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the new user has permissions to look into the
+ * schema of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap;
+
+ /*
+ * HACK: the partition descriptor machinery assumes that queries
+ * originating in this module push the latest snapshot when running
+ * in transaction-snapshot mode.
+ */
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * HACK: done fiddling with the partition descriptor machinery so
+ * unset the active snapshot.
+ */
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, then neither does the key
+ * we're looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Nulls should not occur here, because the
+ * callers currently handle those cases themselves.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ processed_tuples = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return processed_tuples;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2575,6 +2782,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2640,7 +2849,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2664,7 +2874,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3275,7 +3486,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3331,8 +3545,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the values that will be passed to the
+ * operator are already of the expected operand type(s). The operator
+ * can be cross-type (such as when called by ri_LookupKeyInPkRel()),
+ * in which case we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..315015f1d1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
On Thu, Jun 30, 2022 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
I will continue investigating what to do about points (1) and (2)
mentioned above and see if we can do away with using SQL in the
remaining cases.
Hi Amit, looks like isolation tests are failing in cfbot:
https://cirrus-ci.com/task/6642884727275520
Note also the uninitialized variable warning that cfbot picked up;
that may or may not be related.
Thanks,
--Jacob
On Wed, Jul 6, 2022 at 3:24 AM Jacob Champion <jchampion@timescale.com> wrote:
On Thu, Jun 30, 2022 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
I will continue investigating what to do about points (1) and (2)
mentioned above and see if we can do away with using SQL in the
remaining cases.
Hi Amit, looks like isolation tests are failing in cfbot:
https://cirrus-ci.com/task/6642884727275520
Note also the uninitialized variable warning that cfbot picked up;
that may or may not be related.
Thanks for the heads up.
Yeah, I noticed the warning when I compiled with a different set of
gcc parameters, though not the isolation test failures, so not sure
what the bot is running into.
Attaching updated patches which fix the warning and a few other issues
I noticed.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v2-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch
From e49a318fde13ef76e6dc3fc22ead8781bf9c5347 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v2 2/2] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. In RI_FKey_check() and ri_Check_Pk_Match(), it replaces
ri_SqlStringPlanCreate(), previously used to create the plan for their
respective checks, with ri_LookupKeyInPkRelPlanCreate(), which installs
ri_LookupKeyInPkRel() as the plan that implements those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
This rewrite makes it possible to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
the latest snapshot for find_inheritance_children_extended() to
interpret any detach-pending partitions correctly for RI queries
running under REPEATABLE READ isolation. With the previous SQL
string implementation of those RI queries, the latest snapshot set
for that hack would also get used, inadvertently, by the scan of the
user table. With ri_LookupKeyInPkRel(), the snapshot needed for the
hack is now set only for the duration of the code stanza to retrieve
the partition descriptor and thus doesn't affect the PK index scan's
result. The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
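To illustrate, here is a minimal sketch of the class of scenario this
fixes; the schema below is made up for illustration, and the
authoritative reproduction is the fk-snapshot.spec permutation updated
by this patch:

    CREATE TABLE pk_parted (a int PRIMARY KEY) PARTITION BY LIST (a);
    CREATE TABLE pk_parted_1 PARTITION OF pk_parted FOR VALUES IN (1, 2);
    CREATE TABLE fk_noparted (a int REFERENCES pk_parted);

    -- session 2
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM pk_parted;            -- acquires the transaction snapshot

    -- session 1 (autocommit)
    INSERT INTO pk_parted VALUES (2);   -- commits after session 2's snapshot

    -- session 2
    INSERT INTO fk_noparted VALUES (2); -- must fail: PK row not visible
    -- Before this fix, the fresh snapshot pushed for the partition
    -- lookup was also used by the PK index scan, so the RI check saw
    -- the new PK row and the insert wrongly succeeded.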
---
src/backend/executor/execPartition.c | 160 ++++++-
src/backend/executor/nodeLockRows.c | 160 ++++---
src/backend/utils/adt/ri_triggers.c | 463 +++++++++++++++-----
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 614 insertions(+), 193 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..87f5b78d6f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1341,12 +1344,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1438,6 +1441,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple containing a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..ab54a65e0e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in the
+ * slot for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 46e26dae52..c16cb912a8 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,63 +428,17 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table. */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -533,48 +509,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +697,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +797,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +914,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1141,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1950,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partition that the
+ * query fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1962,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2228,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2268,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2515,282 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by the supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the index of
+ * the constraint given in plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = "ri_LookupKeyInPkRel";
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the new user has permissions to look into the
+ * schema of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap;
+
+ /*
+ * HACK: the partition descriptor machinery assumes that queries
+ * originating in this module push the latest snapshot when running
+ * in transaction-snapshot mode.
+ */
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * HACK: done fiddling with the partition descriptor machinery so
+ * unset the active snapshot.
+ */
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, then neither does the key
+ * we're looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Nulls should not occur here, because the
+ * callers currently handle those cases themselves.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2805,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2872,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2897,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3509,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3568,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the values that will be passed to the
+ * operator are already of the expected operand type(s). The operator
+ * can be cross-type (such as when called by ri_LookupKeyInPkRel()),
+ * in which case we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..315015f1d1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
v2-0001-Avoid-using-SPI-in-RI-trigger-functions.patch
From c61a9b8b4b909f42543b4cbbe26fd186033e4e19 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v2 1/2] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
caller-specified (caller of ri_PlanCheck() that is) callback
function. For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specified callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for the
future commits to replace the current SQL string based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
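To illustrate with an example (hypothetical table and constraint names;
the query template itself is quoted verbatim in the ri_triggers.c
comments removed by the follow-up patch), for a foreign key fk(a)
REFERENCES pk(a), the insert-side check string handed to
ri_SqlStringPlanCreate() is still, in simplified form:

    SELECT 1 FROM ONLY pk x WHERE a = $1 FOR KEY SHARE OF x

The real string schema-qualifies the relation and the operator; what
this commit changes is only which code parses, caches, and executes it.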
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 51b3fdc9a0..46e26dae52 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify an RI_PlanCreateFunc_type function to fill in the
+ * implementation-specific details, such as the callback functions to
+ * execute, validate, and free a plan, and also the argument needed to
+ * execute the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each query
+ * resulting from the provided query's rewrite, which is saved in
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan, creating a CachedPlan for each CachedPlanSource
+ * stored in plan->plan_exec_arg using the given parameter values.
+ *
+ * The return value is the number of tuples processed by the "last"
+ * CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the number of tuples processed by the last statement. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release the CachedPlanSources and any associated CachedPlans. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller-specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_Plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Lookup for a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
On Wed, Jul 6, 2022 at 11:55 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Jul 6, 2022 at 3:24 AM Jacob Champion <jchampion@timescale.com> wrote:
On Thu, Jun 30, 2022 at 11:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
I will continue investigating what to do about points (1) and (2)
mentioned above and see if we can do away with using SQL in the
remaining cases.
Hi Amit, looks like isolation tests are failing in cfbot:
https://cirrus-ci.com/task/6642884727275520
Note also the uninitialized variable warning that cfbot picked up;
that may or may not be related.
Thanks for the heads up.
Yeah, I noticed the warning when I compiled with a different set of
gcc parameters, though not the isolation test failures, so not sure
what the bot is running into.
Attaching updated patches which fix the warning and a few other issues
I noticed.
Hmm, cfbot is telling me that detach-partition-concurrently-2 is
failing on Cirrus-CI [1]. Will look into it.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: https://cirrus-ci.com/task/5253369525698560?logs=test_world#L317
On Fri, Jul 1, 2022 at 2:23 AM Amit Langote <amitlangote09@gmail.com> wrote:
So, I hacked together a patch (attached 0001) that invents an "RI
plan" construct (struct RIPlan) to replace the use of an "SPI plan"
(struct _SPI_plan).
With that in place, I decided to rebase my previous patch [1] to use
this new interface and the result is attached 0002.
I think inventing something like RIPlan is probably reasonable, but
I'm not sure how much it really does to address the objections that
were raised previously. How do we know that ri_LookupKeyInPkRel does
all the same things that executing a plan would have done? I see that
function contains permission-checking logic, for example, as well as
snapshot-related logic, and maybe there are other subsystems to worry
about, like rules or triggers or row-level security. Maybe there's no
answer to that problem other than careful manual verification, because
after all the only way to be 100% certain we're doing all the things
that would happen if you executed a plan is to execute a plan, which
kind of defeats the point of the whole thing. All I'm saying is that
I'm not sure that this refactoring in and of itself addresses that
concern.
As far as 0002 goes, the part I'm most skeptical about is this:
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
Isn't that leaving rather a lot on the table? ri_LookupKeyInPkRel is
going to be called a lot of times and do a lot of things over and over
again that maybe only need to be done once, like checking permissions
and looking up the operators to use and reopening the index. And all
the stuff ExecGetLeafPartitionForKey does too, yikes that's a lot of
stuff. Now maybe that's what Tom wants, I don't know. Certainly, the
existing SQL-based implementation is going to do that stuff on every
call, too; I'm just not sure that's a good thing. I think there's some
debate to be had here over what behavior we need to preserve exactly
vs. what we can and should change. For instance, it seems clear to me
that leaving out permissions checks altogether would be not OK, but if
this implementation arranged to cache the results of a permission
check and the SQL-based implementations don't, is that OK? Maybe Tom
would argue that it isn't, because he considers that a part of the
user-visible behavior, but I'm not sure that's the right view of it. I
think what we're promising the user is that we will check permissions,
not that we're going to do it separately for every trigger firing, or
even that every kind of trigger is going to do it exactly the same
number of times as every other trigger. I think we need some input
from Tom (and perhaps others) on how rigidly we need to maintain the
high-level behavior here before we can really say much about whether
the implementation is as good as it can be.
I suspect, though, that there's more that can be done here in terms of
sharing code. For instance, picking on the permissions checking logic,
presumably that's something that every non-SQL implementation would
need to do. But the rest of what's in ri_LookupKeyInPkRel() is
specific to one particular kind of trigger. If we had multiple non-SQL
trigger types, we'd want to somehow have common logic for permissions
checking for all of them.
I also suspect that we ought to have a separation between planning and
execution even for non-SQL based things. You don't really have that
here. What that ought to look like, though, depends on the answers to
the questions above, about how exactly we think we need to reproduce
the existing behavior.
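To make that concrete, one possible shape for the separation, shown
here only as a hypothetical sketch (none of the struct fields below
exist in the posted patches), would be to resolve catalog state once in
the create callback and stash it in plan->plan_exec_arg for reuse at
execution time:

typedef struct RI_LookupKeyPlan
{
    Oid         pk_idxoid;                  /* PK index, resolved once */
    Oid         eq_oprs[RI_MAX_NUMKEYS];    /* equality operators, resolved once */
    bool        perms_ok;                   /* memoized permission check */
} RI_LookupKeyPlan;

The execute callback would then consult this struct instead of
repeating the lookups on every trigger firing, with the is-valid
callback returning false when an invalidation affecting the cached
state is detected.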
I find my ego slightly wounded by the comment that "the partition
descriptor machinery has a hack that assumes that the queries
originating in this module push the latest snapshot in the
transaction-snapshot mode." It's true that the partition descriptor
machinery gives different answers depending on the active snapshot,
but, err, is that a hack, or just a perfectly reasonable design
decision? An alternative might be for PartitionDirectoryLookup to take
a snapshot as an explicit argument rather than relying on the global
variable to get that information from context. I generally feel that
we rely too much on global variables where we should be passing around
explicit parameters, so if you're just arguing that explicit
parameters would be better here, then I agree and just didn't think of
it. If you're arguing that making the answer depend on the snapshot is
itself a bad idea, I don't agree with that.
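For instance, a hypothetical variant of the lookup function might look
like this (this signature does not exist today; it is only a sketch of
the explicit-parameter idea):

extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory pdir,
                                              Relation rel,
                                              Snapshot snapshot);

    /* Caller spells out which snapshot governs detach-pending partitions. */
    partdesc = PartitionDirectoryLookup(partdir, rel, GetLatestSnapshot());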
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
... I think there's some
debate to be had here over what behavior we need to preserve exactly
vs. what we can and should change.
For sure. For example, people occasionally complain because
user-defined triggers can defeat RI integrity checks. Should we
change that? I dunno, but if we're not using the standard executor
then there's at least some room to consider it. I think people would
be upset if we stopped firing user triggers at all; but if triggers
couldn't defeat RI actions short of throwing a transaction-aborting
error, I believe a lot of people would consider that an improvement.
For instance, it seems clear to me
that leaving out permissions checks altogether would be not OK, but if
this implementation arranged to cache the results of a permission
check and the SQL-based implementations don't, is that OK? Maybe Tom
would argue that it isn't, because he considers that a part of the
user-visible behavior, but I'm not sure that's the right view of it.
Uh ... if such caching behavior is at all competently implemented,
it will be transparent because the cache will notice and respond to
events that should change its outputs. So I don't foresee a semantic
problem there. It may well be that it's practical to cache
permissions-check info for RI checks when it isn't for more general
queries, so looking into ideas like that seems well within scope here.
(Or then again, maybe we should be building a more general permissions
cache?)
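For what it's worth, a rough sketch of the kind of cache being floated
here; everything except CacheRegisterSyscacheCallback() and the
dynahash calls is hypothetical:

typedef struct PermCacheEntry
{
    Oid         roleid;     /* role being checked */
    Oid         relid;      /* table being checked */
    bool        has_priv;   /* memoized ACL check result */
} PermCacheEntry;

static HTAB *perm_cache = NULL;

static void
perm_cache_inval(Datum arg, int cacheid, uint32 hashvalue)
{
    /* Conservatively flush everything on any relevant catalog change. */
    if (perm_cache)
        hash_destroy(perm_cache);
    perm_cache = NULL;
}

    /* At initialization, piggyback on the existing sinval machinery: */
    CacheRegisterSyscacheCallback(AUTHOID, perm_cache_inval, (Datum) 0);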
I'm too tired to have more than that to say right now, but I agree
that there is room for discussion about exactly what behavior we
want to preserve.
regards, tom lane
On Fri, Jul 8, 2022 at 10:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Uh ... if such caching behavior is at all competently implemented,
it will be transparent because the cache will notice and respond to
events that should change its outputs.
Well, that assumes that we emit appropriate invalidations in every
place where permissions are updated, and take appropriate locks every
place where they are checked. I think that the first one might be too
optimistic, and the second one is definitely too optimistic. For
instance, consider pg_proc_ownercheck. There's no lock of any kind
taken on the function here, and at least in typical cases, I don't
think the caller takes one either. Compare the extensive tap-dancing
around locking and permissions checking in RangeVarGetRelidExtended
against the blithe unconcern in FuncnameGetCandidates.
I believe that of all the types of SQL objects in the system, only
relations have anything like proper interlocking against concurrent
DDL. Other examples of not caring at all include LookupCollation() and
LookupTypeNameExtended(). There's just no heavyweight locking here at
all, and so no invalidation based on sinval messages can ever be
reliable.
GRANT and REVOKE don't take proper locks, either, even on tables:
rhaas=# begin;
BEGIN
rhaas=*# lock table pgbench_accounts;
LOCK TABLE
rhaas=*#
Then, in another session:
rhaas=# create role foo;
CREATE ROLE
rhaas=# grant select on pgbench_accounts to foo;
GRANT
rhaas=#
Executing "SELECT * FROM pgbench_accounts" in the other session would
have blocked, but the GRANT has no problem at all.
I don't see that any of this is this patch's job to fix. If nobody's
cared enough to fix it any time in the past 20 years, or just didn't
want to pay the locking cost, well then we probably don't need to do
it now either. But I think it means that even the slightest change in
the timing or frequency of permissions checks is in theory a
user-visible change, because there are no grounds for assuming that
the permissions on any of the objects involved aren't changing while
the query is executing.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jul 9, 2022 at 1:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jul 1, 2022 at 2:23 AM Amit Langote <amitlangote09@gmail.com> wrote:
So, I hacked together a patch (attached 0001) that invents an "RI
plan" construct (struct RIPlan) to replace the use of an "SPI plan"
(struct _SPI_plan).
With that in place, I decided to rebase my previous patch [1] to use
this new interface and the result is attached 0002.
Thanks for taking a look at this. I'll try to respond to other points
in a separate email, but I wanted to clarify something about below:
I find my ego slightly wounded by the comment that "the partition
descriptor machinery has a hack that assumes that the queries
originating in this module push the latest snapshot in the
transaction-snapshot mode." It's true that the partition descriptor
machinery gives different answers depending on the active snapshot,
but, err, is that a hack, or just a perfectly reasonable design
decision?
I think my calling it a hack of "partition descriptor machinery" is
not entirely fair (sorry), because it's talking about the following
comment in find_inheritance_children_extended(), which describes it as
being a hack, so I mentioned the word "hack" in my comment too:
/*
* Cope with partitions concurrently being detached. When we see a
* partition marked "detach pending", we omit it from the returned set
* of visible partitions if caller requested that and the tuple's xmin
* does not appear in progress to the active snapshot. (If there's no
* active snapshot set, that means we're not running a user query, so
* it's OK to always include detached partitions in that case; if the
* xmin is still running to the active snapshot, then the partition
* has not been detached yet and so we include it.)
*
* The reason for this hack is that we want to avoid seeing the
* partition as alive in RI queries during REPEATABLE READ or
* SERIALIZABLE transactions: such queries use a different snapshot
* than the one used by regular (user) queries.
*/
That bit came in to make DETACH CONCURRENTLY produce sane answers for
RI queries in some cases.
I guess my comment should really have said something like:
HACK: find_inheritance_children_extended() has a hack that assumes
that the queries originating in this module push the latest snapshot
in transaction-snapshot mode.
An alternative might be for PartitionDirectoryLookup to take
a snapshot as an explicit argument rather than relying on the global
variable to get that information from context. I generally feel that
we rely too much on global variables where we should be passing around
explicit parameters, so if you're just arguing that explicit
parameters would be better here, then I agree and just didn't think of
it. If you're arguing that making the answer depend on the snapshot is
itself a bad idea, I don't agree with that.
No, I'm not arguing that using a snapshot there is wrong and haven't
really thought hard about an alternative.
I tend to agree passing a snapshot explicitly might be better than
using ActiveSnapshot stuff for this.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Wed, Jul 13, 2022 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Jul 9, 2022 at 1:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
Thanks for taking a look at this. I'll try to respond to other points
in a separate email, but I wanted to clarify something about below:
I find my ego slightly wounded by the comment that "the partition
descriptor machinery has a hack that assumes that the queries
originating in this module push the latest snapshot in the
transaction-snapshot mode." It's true that the partition descriptor
machinery gives different answers depending on the active snapshot,
but, err, is that a hack, or just a perfectly reasonable design
decision?
I think my calling it a hack of "partition descriptor machinery" is
not entirely fair (sorry), because it's talking about the following
comment in find_inheritance_children_extended(), which describes it as
being a hack, so I mentioned the word "hack" in my comment too:
/*
* Cope with partitions concurrently being detached. When we see a
* partition marked "detach pending", we omit it from the returned set
* of visible partitions if caller requested that and the tuple's xmin
* does not appear in progress to the active snapshot. (If there's no
* active snapshot set, that means we're not running a user query, so
* it's OK to always include detached partitions in that case; if the
* xmin is still running to the active snapshot, then the partition
* has not been detached yet and so we include it.)
*
* The reason for this hack is that we want to avoid seeing the
* partition as alive in RI queries during REPEATABLE READ or
* SERIALIZABLE transactions: such queries use a different snapshot
* than the one used by regular (user) queries.
*/That bit came in to make DETACH CONCURRENTLY produce sane answers for
RI queries in some cases.
I guess my comment should really have said something like:
HACK: find_inheritance_children_extended() has a hack that assumes
that the queries originating in this module push the latest snapshot
in transaction-snapshot mode.
Posting a new version with this bit fixed; cfbot complained that 0002
needed a rebase over 3592e0ff98.
I will try to come up with a patch to enhance the PartitionDirectory
interface to allow passing the snapshot to use when scanning
pg_inherits explicitly, so we won't need the above "hack".
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v3-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch
From a42c95360d7aeacd652c33d93f3c4d5a2789acfb Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v3 2/2] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. In RI_FKey_check() and ri_Check_Pk_Match(), it replaces
ri_SqlStringPlanCreate(), which was used to create the plan for their
respective checks, with ri_LookupKeyInPkRelPlanCreate(), which installs
ri_LookupKeyInPkRel() as the plan implementing those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
This rewrite makes it possible to fix a PK row visibility bug caused by
a partition descriptor hack that requires ActiveSnapshot to be set to
the latest snapshot for find_inheritance_children_extended() to
interpret any detach-pending partitions correctly for RI queries
running under REPEATABLE READ isolation. With the previous SQL
string implementation of those RI queries, the latest snapshot set
for that hack would also get used, inadvertently, by the scan of the
user table. With ri_LookupKeyInPkRel(), the snapshot needed for the
hack is now set only for the duration of the code stanza to retrieve
the partition descriptor and thus doesn't affect the PK index scan's
result. The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
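Reduced to a minimal sketch, the direct existence check at the heart of
ri_LookupKeyInPkRel() amounts to a plain index-AM scan of the PK index.
The sketch below assumes a single-column key and omits the row locking,
permission checks, and partition descent that the real function must
perform; the scan routines themselves are the existing index AM API:

static bool
pk_key_exists(Relation pk_rel, Relation pk_idx,
              RegProcedure eq_proc, Datum keyval)
{
    ScanKeyData skey;
    IndexScanDesc scan;
    TupleTableSlot *slot;
    bool        found;

    /* Build "indexcol1 = keyval" using the PK opclass's equality proc. */
    ScanKeyInit(&skey, 1, BTEqualStrategyNumber, eq_proc, keyval);

    slot = table_slot_create(pk_rel, NULL);
    scan = index_beginscan(pk_rel, pk_idx, GetActiveSnapshot(), 1, 0);
    index_rescan(scan, &skey, 1, NULL, 0);

    /* One matching visible tuple is enough to satisfy the RI check. */
    found = index_getnext_slot(scan, ForwardScanDirection, slot);

    index_endscan(scan);
    ExecDropSingleTupleTableSlot(slot);

    return found;
}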
---
src/backend/executor/execPartition.c | 160 ++++++-
src/backend/executor/nodeLockRows.c | 160 ++++---
src/backend/utils/adt/ri_triggers.c | 464 +++++++++++++++-----
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 615 insertions(+), 193 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index eb49106102..a68d5d7eb3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1380,12 +1383,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1592,6 +1595,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple, which contains a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must
+ * map any non-root table's partition key attribute numbers to the
+ * root table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in
+ * slot for EvalPlanQual testing, if requested by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 46e26dae52..401b17e283 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,63 +428,17 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table. */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -533,48 +509,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +697,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +797,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +914,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1141,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1950,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because its execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partition that the
+ * trigger fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1962,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2228,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2268,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2515,283 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by the supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the
+ * constraint's index given in plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = "ri_LookupKeyInPkRel";
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the user has permission to look into the
+ * schema of, and SELECT from, the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap;
+
+ /*
+ * HACK: find_inheritance_children_extended() that might get called
+ * as part of the following function has a hack that assumes that the
+ * queries originating in this module push the latest snapshot in
+ * transaction-snapshot mode.
+ */
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * HACK: done fiddling with the partition descriptor machinery so
+ * unset the active snapshot.
+ */
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking
+ * for cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ /* Pop the error context stack before returning no match. */
+ error_context_stack = rierrcontext.previous;
+ return 0;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Nulls should not occur here, because callers
+ * currently handle the cases in which they do.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2806,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2873,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2898,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3510,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3569,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the values that will be passed to the
+ * operator are already of the expected operand type(s). The operator
+ * can be cross-type (such as when called by ri_LookupKeyInPkRel()),
+ * in which case we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..315015f1d1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
Attachment: v3-0001-Avoid-using-SPI-in-RI-trigger-functions.patch (application/octet-stream)
From cb5a9ceba00d9ba08af549e09817b7614a53901d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v3 1/2] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
caller-specified (caller of ri_PlanCheck() that is) callback
function. For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for the
future commits to replace the current SQL string based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 51b3fdc9a0..46e26dae52 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSource for each of the queries
+ * resulting from the provided query's rewrite that is saved to
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource found stored in plan->plan_exec_arg using given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the last executed statement's value. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check if at least 1 row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Lookup for a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
On Thu, Aug 4, 2022 at 1:05 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Wed, Jul 13, 2022 at 8:59 PM Amit Langote <amitlangote09@gmail.com> wrote:
That bit came in to make DETACH CONCURRENTLY produce sane answers for
RI queries in some cases.

I guess my comment should really have said something like:

HACK: find_inheritance_children_extended() has a hack that assumes
that the queries originating in this module push the latest snapshot
in transaction-snapshot mode.

Posting a new version with this bit fixed; cfbot complained that 0002
needed a rebase over 3592e0ff98.

I will try to come up with a patch to enhance the PartitionDirectory
interface to allow passing the snapshot to use when scanning
pg_inherits explicitly, so we won't need the above "hack".
Sorry about the delay.
So I came up with such a patch that is attached as 0003.
The main problem I want to fix with it is the need for RI_FKey_check()
to "force"-push the latest snapshot that the PartitionDesc code wants
to use to correctly include or omit a detach-pending partition from
the view of that function's RI query. Scribbling on ActiveSnapshot
that way means that *all* scans involved in the execution of that
query now see a snapshot that they shouldn't likely be seeing; a bug
resulting from this has been demonstrated in a test case added by the
commit 00cb86e75d.
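For context, the problematic pattern is the one 0004 removes from
RI_FKey_check(): passing detectNewRows=true (the boolean argument just
before CMD_SELECT below) for partitioned PK tables is what made
ri_PerformCheck() force-push the latest snapshot for the whole query:

    ri_PerformCheck(riinfo, &qkey, qplan,
                    fk_rel, pk_rel,
                    NULL, newslot,
                    pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
                    CMD_SELECT);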
The fix is to make RI_FKey_check(), or really its RI_Plan's execution
function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
snapshot explicitly as a parameter of PartitionDirectoryLookup(),
which passes it down to the PartitionDesc code. No need to manipulate
ActiveSnapshot. The actual fix is in patch 0004, which I extracted
out of 0002 to keep the latter a mere refactoring patch without any
semantic changes (though a bit more on that below). BTW, I don't know
of a way to back-patch a fix like this for the bug, because there is
no way other than ActiveSnapshot to pass the desired snapshot to the
PartitionDesc code if the only way we get to that code is by executing
an SQL query plan.
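Concretely, the call site in ri_LookupKeyInPkRel() ends up looking
like this with 0004 applied (excerpted from the attached patch):

    leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
                                             riinfo->pk_attnums,
                                             pk_vals, pk_nulls,
                                             idxoid, RowShareLock,
                                             GetLatestSnapshot(),
                                             &leaf_idxoid);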
0003 moves the relevant logic out of
find_inheritance_children_extended() into its callers: the logic of
deciding which snapshot to use to determine whether a detach-pending
partition should indeed be omitted from a caller's consideration,
based on checking the visibility of the corresponding pg_inherits row
with that snapshot; currently, it just uses ActiveSnapshot.
Given the problems with using ActiveSnapshot mentioned above, I think
it is better to make the callers decide the snapshot and pass it using
a parameter named omit_detached_snapshot. Only PartitionDesc code
actually cares about sending anything but the parent query's
ActiveSnapshot, so the PartitionDesc and PartitionDirectory interfaces
have been changed to add the same omit_detached_snapshot parameter.
find_inheritance_children(), the other caller used in many sites that
look at a table's partitions, defaults to using ActiveSnapshot, which
does not seem problematic. Furthermore, only RI_FKey_check() needs to
pass anything other than ActiveSnapshot, so other users of
PartitionDesc, like user queries, still default to using the
ActiveSnapshot, which doesn't have any known problems either.
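For example, find_inheritance_children() in 0003 simply forwards
ActiveSnapshot (when one is set), so those call sites behave exactly
as before (excerpted from the attached patch):

    List *
    find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
    {
        return find_inheritance_children_extended(parentrelId, true,
                                                  ActiveSnapshotSet() ?
                                                  GetActiveSnapshot() : NULL,
                                                  lockmode, NULL, NULL);
    }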
0001 and 0002 are mostly unchanged in this version, except that I
moved the visibility bug-fix out of 0002 and into 0004 as described
above; it looks better using the interface added by 0003 anyway. I need to
address the main concern that it's still hard to be sure that the
patch in its current form doesn't break any user-level semantics of
these RI check triggers and other concerns about the implementation
that Robert expressed in [1].
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CA+TgmoaiTNj4DgQy42OT9JmTTP1NWcMV+ke0i=+a7=VgnzqGXw@mail.gmail.com
Attachments:
v4-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch
From 049af6bb165d9624004395c6b32ae0ff49314993 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v4 2/4] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. It replaces ri_SqlStringPlanCreate(), used in
RI_FKey_check() and ri_Check_Pk_Match() to create the plan for their
respective checks, with ri_LookupKeyInPkRelPlanCreate(), which installs
ri_LookupKeyInPkRel() as the plan to implement those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
---
src/backend/executor/execPartition.c | 167 +++++++++-
src/backend/executor/nodeLockRows.c | 160 +++++-----
src/backend/utils/adt/ri_triggers.c | 447 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
5 files changed, 610 insertions(+), 179 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..764f2b9f8a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1379,12 +1382,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1591,6 +1594,158 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of the partitioned table 'root_rel' that might
+ * contain the specified primary key tuple, which contains a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index in the thus-found leaf partition that is
+ * recorded as descending from 'root_idxoid' and returns it in '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfebd9c4f2..174e9746ff 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,49 +428,9 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -533,48 +515,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +703,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +803,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +920,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1147,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1956,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because its execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual partition that the
+ * trigger fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1968,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2234,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2274,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2521,276 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by the given foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the unique index
+ * of the constraint described by plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = "ri_LookupKeyInPkRel";
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the new user has permissions to look into the
+ * schema of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ /*
+ * Note that this relies on the latest snapshot having been pushed by
+ * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * that runs as part of this will need to use the snapshot to determine
+ * whether to omit or include any detach-pending partition based on
+ * whether the pg_inherits row that marks it as detach-pending is
+ * visible to it or not, respectively.
+ */
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ /* Pop the error context stack before returning. */
+ error_context_stack = rierrcontext.previous;
+ return 0;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2805,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2872,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2897,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3509,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3568,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of the expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_LookupKeyInPkRel()), in which
+ * case we only need the cast if the right operand value doesn't match
+ * the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..2f415b80ce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.35.3
v4-0004-Teach-ri_LookupKeyInPkRel-to-pass-omit_detached_s.patch
From 0d60e4ef72fc2160c91eba41a3135d59412f511a Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 28 Sep 2022 16:37:55 +0900
Subject: [PATCH v4 4/4] Teach ri_LookupKeyInPkRel() to pass
omit_detached_snapshot
Now that the RI triggers that need to look up PK rows in a
partitioned table can manipulate partitions directly through
ExecGetLeafPartitionForKey(), the snapshot being passed to omit or
include detach-pending partitions can also now be passed explicitly,
rather than using ActiveSnapshot for that purpose.
For the detach-pending partitions to be correctly omitted or included
from the consideration of PK row lookup, the PartitionDesc machinery
needs to see the latest snapshot. Pushing the latest snapshot as the
ActiveSnapshot, as is presently done, means that even scans that
should NOT be using the latest snapshot end up using it to
time-qualify table/partition rows. That leads to incorrect results of
PK lookups over partitioned tables running under REPEATABLE READ
isolation; 00cb86e75d added a test that demonstrates this bug.
To fix, do not force-push the latest snapshot in the cases of PK
lookup over partitioned tables (as was being done by passing
detectNewRows=true to ri_PerformCheck()), but rather make
ri_LookupKeyInPkRel() pass the latest snapshot directly to
PartitionDirectoryLookup() through its new omit_detached_snapshot
parameter.
The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
---
src/backend/executor/execPartition.c | 12 +++++++++++-
src/backend/utils/adt/ri_triggers.c | 16 ++++++----------
src/include/executor/execPartition.h | 1 +
src/test/isolation/expected/fk-snapshot.out | 4 ++--
src/test/isolation/specs/fk-snapshot.spec | 5 +----
5 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c90f07c433..65cd365a8b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1607,6 +1607,14 @@ get_partition_for_tuple(PartitionKey key,
*
* Any intermediate parent tables encountered on the way to finding the leaf
* partition are locked using 'lockmode' when opening.
+ *
+ *
+ * In 'omit_detached_snapshot' a caller can specify the snapshot to pass to
+ * PartitionDirectoryLookup(), which in turn passes it down to the code that
+ * scans the pg_inherits catalog when building the partition descriptor from
+ * scratch. Any detach-pending partition is omitted from this function's
+ * consideration if the DETACH operation appears committed to *this*
+ * snapshot.
*
* Returns NULL if no leaf partition is found for the key.
*
@@ -1624,6 +1632,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid)
{
Relation rel = root_rel;
@@ -1709,7 +1718,8 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
+ partdesc = PartitionDirectoryLookup(partdir, rel,
+ omit_detached_snapshot);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 174e9746ff..49b716a529 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -434,17 +434,11 @@ RI_FKey_check(TriggerData *trigdata)
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -2679,8 +2673,9 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
Oid leaf_idxoid;
/*
- * Note that this relies on the latest snapshot having been pushed by
- * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * Pass the latest snapshot for omit_detached_snapshot so that any
+ * detach-pending partition is correctly omitted from or included in
+ * this lookup. The PartitionDesc machinery
* that runs as part of this will need to use the snapshot to determine
* whether to omit or include any detach-pending partition based on
* whether the pg_inherits row that marks it as detach-pending is
@@ -2690,6 +2685,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
riinfo->pk_attnums,
pk_vals, pk_nulls,
idxoid, RowShareLock,
+ GetLatestSnapshot(),
&leaf_idxoid);
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index cbe1d996e6..18c6b676f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -36,6 +36,7 @@ extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid);
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
v4-0003-Make-omit_detached-logic-independent-of-ActiveSna.patch
From 272ac2988705a96d0b198adae887797de37bf38b Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 15 Sep 2022 16:45:44 +0900
Subject: [PATCH v4 3/4] Make omit_detached logic independent of ActiveSnapshot
In find_inheritance_children_extended() and elsewhere, we use
ActiveSnapshot to determine if a detach-pending partition should
be considered detached or not based on checking if the xmin of
such a partition's pg_inherits row appears committed to that
snapshot or not.
This logic really came in to make the RI queries over partitioned
PK tables running under REPEATABLE READ isolation level work
correctly by appropriately omitting or including the detach-pending
partition from the plan, based on the visibility of the pg_inherits
row of that partition to the latest snapshot. To that end,
RI_FKey_check() was made to force-push the latest snapshot to get
that desired behavior. However, pushing a snapshot this way makes
the results of other scans that use ActiveSnapshot violate the
isolation of the parent transaction; 00cb86e75d added a test that
demonstrates this bug.
So, this commit changes the PartitionDesc interface to allow the
desired snapshot to be passed explicitly as a parameter, rather than
having to scribble on ActiveSnapshot to pass it. A later commit will
change ExecGetLeafPartitionForKey() used by RI PK row lookups to use
this new interface.
Note that the default behavior in the absence of an explicitly
specified snapshot is still to use the ActiveSnapshot, so this commit
changes no behavior for non-RI queries or for sites that call
find_inheritance_children() for purposes other than querying a
partitioned table.
---
src/backend/catalog/pg_inherits.c | 31 +++++----
src/backend/executor/execPartition.c | 7 +-
src/backend/optimizer/util/inherit.c | 2 +-
src/backend/optimizer/util/plancat.c | 2 +-
src/backend/partitioning/partdesc.c | 100 +++++++++++++++++++--------
src/include/catalog/pg_inherits.h | 5 +-
src/include/partitioning/partdesc.h | 4 +-
7 files changed, 100 insertions(+), 51 deletions(-)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 92afbc2f25..f810e5de0d 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,14 +52,18 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*
- * Partitions marked as being detached are omitted; see
+ * A partition marked as being detached is omitted from the result if the
+ * pg_inherits row showing it as detached is visible to ActiveSnapshot
+ * (when one has been pushed); see
* find_inheritance_children_extended for details.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true, lockmode,
- NULL, NULL);
+ return find_inheritance_children_extended(parentrelId, true,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL,
+ lockmode, NULL, NULL);
}
/*
@@ -71,16 +75,17 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and there is an active snapshot (not the same as
- * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
- * marked "detach pending" is visible to that snapshot, then that partition is
- * omitted from the output list. This makes partitions invisible depending on
- * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot. In addition, *detached_xmin (if not null)
- * is set to the xmin of the row of the detached partition.
+ * If omit_detached is true and the caller passed 'omit_detached_snapshot',
+ * the partition whose pg_inherits tuple marks it as "detach pending" is
+ * omitted from the output list if the tuple is visible to that snapshot.
+ * That is, such a partition is omitted from the output list depending on
+ * whether the transaction that marked that partition as detached appears
+ * committed to omit_detached_snapshot. If omitted, *detached_xmin (if
+ * non-NULL) is set to the xmin of that pg_inherits tuple.
*/
List *
find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+ Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
{
@@ -141,15 +146,13 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && ActiveSnapshotSet())
+ if (omit_detached && omit_detached_snapshot)
{
TransactionId xmin;
- Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
- snap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(xmin, snap))
+ if (!XidInMVCCSnapshot(xmin, omit_detached_snapshot))
{
if (detached_xmin)
{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 764f2b9f8a..c90f07c433 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1121,7 +1121,8 @@ ExecInitPartitionDispatchInfo(EState *estate,
rel = table_open(partoid, RowExclusiveLock);
else
rel = proute->partition_root;
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel,
+ NULL);
pd = (PartitionDispatch) palloc(offsetof(PartitionDispatchData, indexes) +
partdesc->nparts * sizeof(int));
@@ -1708,7 +1709,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel);
+ partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
@@ -2085,7 +2086,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partrel, NULL);
/*
* Initialize the subplan_map and subpart_map.
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index cf7691a474..cc4d27ece8 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -317,7 +317,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
Assert(parentrte->inh);
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- parentrel);
+ parentrel, NULL);
/* A partitioned table should always have a partition descriptor. */
Assert(partdesc);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6d5718ee4c..9c6bc5c4a5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2221,7 +2221,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
}
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- relation);
+ relation, NULL);
rel->part_scheme = find_partition_scheme(root, relation);
Assert(partdesc != NULL && rel->part_scheme != NULL);
rel->boundinfo = partdesc->boundinfo;
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 737f0edd89..863b04c17d 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -48,17 +48,24 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached);
+ bool omit_detached,
+ Snapshot omit_detached_snapshot);
/*
- * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ * RelationGetPartitionDescExt
+ * Get the partition descriptor of a partitioned table, building and
+ * caching one if none is cached yet or if the cached one is not
+ * suitable for the given request
*
* We keep two partdescs in relcache: rd_partdesc includes all partitions
- * (even those being concurrently marked detached), while rd_partdesc_nodetach
- * omits (some of) those. We store the pg_inherits.xmin value for the latter,
- * to determine whether it can be validly reused in each case, since that
- * depends on the active snapshot.
+ * (even the one being concurrently marked detached), while
+ * rd_partdesc_nodetached omits the detach-pending partition. If the latter
+ * is present, rd_partdesc_nodetached_xmin will have been set to the xmin of
+ * the detach-pending partition's pg_inherits row, which is used to determine
+ * whether rd_partdesc_nodetached can be validly reused for a given request by
+ * checking if the xmin appears visible to 'omit_detached_snapshot' passed by
+ * the caller.
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
@@ -69,7 +76,8 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDesc(Relation rel, bool omit_detached)
+RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -78,36 +86,52 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*
- * If there is no active snapshot, detached partitions aren't omitted
- * either, so we can use the cached descriptor too in that case.
+ * omit_detached_snapshot being NULL means that the caller doesn't care
+ * that the returned partition descriptor may contain detached partitions,
+ * so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
(!rel->rd_partdesc->detached_exist || !omit_detached ||
- !ActiveSnapshotSet())))
+ omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
/*
- * If we're asked to omit detached partitions, we may be able to use a
- * cached descriptor too. We determine that based on the pg_inherits.xmin
- * that was saved alongside that descriptor: if the xmin that was not in
- * progress for that active snapshot is also not in progress for the
- * current active snapshot, then we can use it. Otherwise build one from
- * scratch.
+ * If we're asked to omit the detached partition, we may be able to use
+ * the other cached descriptor, which has been made to omit the detached
+ * partition. Whether that descriptor can be reused in this case is
+ * determined based on cross-checking the visibility of
+ * rd_partdesc_nodetached_xmin, that is, the pg_inherits.xmin of the
+ * pg_inherits row of the detached partition: the xmin was not in progress
+ * for the snapshot passed when rd_partdesc_nodetached was built, so if it
+ * is also not in progress for the given omit_detached_snapshot, then we can
+ * reuse it. Otherwise we must build one from scratch.
*/
if (omit_detached &&
rel->rd_partdesc_nodetached &&
- ActiveSnapshotSet())
+ omit_detached_snapshot)
{
- Snapshot activesnap;
-
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
- activesnap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+ if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin,
+ omit_detached_snapshot))
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached);
+ return RelationBuildPartitionDesc(rel, omit_detached,
+ omit_detached_snapshot);
+}
+
+/*
+ * RelationGetPartitionDesc
+ * Like RelationGetPartitionDescExt() but for callers that are fine with
+ * ActiveSnapshot being used as omit_detached_snapshot
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel, bool omit_detached)
+{
+ return RelationGetPartitionDescExt(rel, omit_detached,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL);
}
/*
@@ -132,7 +156,8 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached)
+RelationBuildPartitionDesc(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL;
@@ -160,7 +185,9 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached, NoLock,
+ omit_detached,
+ omit_detached_snapshot,
+ NoLock,
&detached_exist,
&detached_xmin);
@@ -322,11 +349,11 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
*
* Note that if a partition was found by the catalog's scan to have been
* detached, but the pg_inherit tuple saying so was not visible to the
- * active snapshot (find_inheritance_children_extended will not have set
- * detached_xmin in that case), we consider there to be no "omittable"
- * detached partitions.
+ * omit_detached_snapshot (find_inheritance_children_extended() will not
+ * have set detached_xmin in that case), we consider there to be no
+ * "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && ActiveSnapshotSet() &&
+ is_omit = omit_detached && detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -411,9 +438,18 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* different views of the catalog state, but any single particular OID
* will always get the same PartitionDesc for as long as the same
* PartitionDirectory is used.
+ *
+ * Callers can specify a snapshot to cross-check the visibility of the
+ * pg_inherits row marking a given partition as detached. Depending on the
+ * result of that visibility check, such a partition is either included in
+ * the returned PartitionDesc, considering it not yet detached, or omitted
+ * from it, considering it detached.
+ * XXX - currently unused, because we don't have any callers of this that
+ * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
-PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
+PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
+ Snapshot omit_detached_snapshot)
{
PartitionDirectoryEntry *pde;
Oid relid = RelationGetRelid(rel);
@@ -428,7 +464,11 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
+ Assert(omit_detached_snapshot == NULL);
+ if (pdir->omit_detached && ActiveSnapshotSet())
+ omit_detached_snapshot = GetActiveSnapshot();
+ pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
+ omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 9221c2ea57..67f148f2bf 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -23,6 +23,7 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
+#include "utils/snapshot.h"
/* ----------------
* pg_inherits definition. cpp turns this into
@@ -50,7 +51,9 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, on pg_inher
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
- LOCKMODE lockmode, bool *detached_exist, TransactionId *detached_xmin);
+ Snapshot omit_detached_snapshot,
+ LOCKMODE lockmode, bool *detached_exist,
+ TransactionId *detached_xmin);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **numparents);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index 7e979433b6..f42d137fc1 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -65,9 +65,11 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot);
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
-extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
+extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation, Snapshot);
extern void DestroyPartitionDirectory(PartitionDirectory pdir);
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
--
2.35.3
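To make the new interface concrete, here is a minimal hypothetical
sketch (not taken from the patch) of a caller that wants the visibility
of detach-pending partitions judged against an explicitly chosen
snapshot; 'rel' stands for some open partitioned relation:

    /* Hypothetical caller of RelationGetPartitionDescExt(). */
    Snapshot    snap = GetLatestSnapshot();
    PartitionDesc partdesc;

    /* Judge detach-pending partitions against 'snap', not ActiveSnapshot. */
    partdesc = RelationGetPartitionDescExt(rel, true /* omit_detached */, snap);

    /* Existing callers keep the old behavior through the wrapper. */
    partdesc = RelationGetPartitionDesc(rel, true /* omit_detached */);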
Attachment: v4-0001-Avoid-using-SPI-in-RI-trigger-functions.patch (application/octet-stream)
From 62d53b827d10de3cfea43187c0dd645dc73bad1d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v4 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
callback function specified by the caller (of ri_PlanCheck(), that is).
For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for the given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for
future commits to replace the current SQL string based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
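To make the callback contract concrete, a plan-create function only has
to fill in the callback pointers and execution argument of struct
RI_Plan (defined in the diff below); a minimal hypothetical sketch, with
the ri_Hardcoded* names made up for illustration:

    /* Hypothetical create callback for a future hard-coded RI check. */
    static void
    ri_HardcodedPlanCreate(RI_Plan *plan, const char *querystr,
                           int nargs, Oid *paramtypes)
    {
        /* Nothing to parse or plan here; a later patch would stash its
         * scan details in plan->plan_exec_arg instead. */
        plan->plan_exec_func = ri_HardcodedPlanExecute;     /* made up */
        plan->plan_exec_arg = NULL;
        plan->plan_is_valid_func = ri_HardcodedPlanIsValid; /* made up */
        plan->plan_free_func = ri_HardcodedPlanFree;        /* made up */
    }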
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is
+ * still valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each query
+ * resulting from the provided query string's rewrite, which is saved in
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource stored in plan->plan_exec_arg, using the given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the last executed statement's value. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller-specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_Plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Lookup for a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
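To summarize the snapshot handling that replaces SPI_execute_snapshot()
in the above patch, the common (read-committed) path of
ri_PerformCheck() now reduces to roughly the following shape, condensed
from the diff rather than quoted verbatim (declarations omitted):

    /* Choose and push the snapshot, then run the cached RI plan. */
    test_snapshot = GetTransactionSnapshot();
    crosscheck_snapshot = InvalidSnapshot;
    PushActiveSnapshot(test_snapshot);
    CommandCounterIncrement();
    UpdateActiveSnapshotCommandId();

    tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
                                      test_snapshot, crosscheck_snapshot,
                                      limit, &last_stmt_cmdtype);
    PopActiveSnapshot();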
On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
Sorry about the delay.
So I came up with such a patch that is attached as 0003.
The main problem I want to fix with it is the need for RI_FKey_check()
to "force"-push the latest snapshot that the PartitionDesc code wants
to use to correctly include or omit a detach-pending partition from
the view of that function's RI query. Scribbling on ActiveSnapshot
that way means that *all* scans involved in the execution of that
query now see a snapshot that they likely shouldn't be seeing; a bug
resulting from this has been demonstrated in a test case added by the
commit 00cb86e75d.

The fix is to make RI_FKey_check(), or really its RI_Plan's execution
function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
snapshot explicitly as a parameter of PartitionDirectoryLookup(),
which passes it down to the PartitionDesc code. No need to manipulate
ActiveSnapshot. The actual fix is in patch 0004, which I extracted
out of 0002 to keep the latter a mere refactoring patch without any
semantic changes (though a bit more on that below). BTW, I don't know
of a way to back-patch a fix like this for the bug, because there is
no way other than ActiveSnapshot to pass the desired snapshot to the
PartitionDesc code if the only way we get to that code is by executing
an SQL query plan.

0003 moves the relevant logic out of
find_inheritance_children_extended() into its callers. That logic
decides which snapshot to use to determine whether a detach-pending
partition should indeed be omitted from a caller's consideration, based
on checking the visibility of the corresponding pg_inherits row against
that snapshot; currently it just uses ActiveSnapshot.
Given the problems with using ActiveSnapshot mentioned above, I think
it is better to make the callers decide the snapshot and pass it using
a parameter named omit_detached_snapshot. Only PartitionDesc code
actually cares about sending anything but the parent query's
ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
has been changed to add the same omit_detached_snapshot parameter.
find_inheritance_children(), the other caller used in many sites that
look at a table's partitions, defaults to using ActiveSnapshot, which
does not seem problematic. Furthermore, only RI_FKey_check() needs to
pass anything other than ActiveSnapshot, so other users of
PartitionDesc, like user queries, still default to using the
ActiveSnapshot, which doesn't have any known problems either.

0001 and 0002 are mostly unchanged in this version, except I took out
the visibility bug-fix from 0002 into 0004 described above, which
looks better using the interface added by 0003 anyway. I need to
address the main concern, namely that it's still hard to be sure that
the patch in its current form doesn't break any user-level semantics of
these RI check triggers, as well as the other concerns about the
implementation that Robert expressed in [1].
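For reference, what the 0003/0004 interface lets ri_LookupKeyInPkRel()
do is roughly the following; this is a sketch written against the new
signatures, not code copied from the patches:

    /* Hypothetical lookup that judges detach-pending partitions against
     * the latest snapshot instead of whatever happens to be active. */
    Snapshot    snap = GetLatestSnapshot();
    PartitionDirectory partdir;
    PartitionDesc partdesc;

    partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
    partdesc = PartitionDirectoryLookup(partdir, rel, snap);
    /* ... locate the partition for the key in partdesc ... */
    DestroyPartitionDirectory(partdir);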
Oops, I apparently posted the wrong 0004, containing a bug that
crashes `make check`.
Fixed version attached.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
Attachment: v5-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch (application/octet-stream)
From 0fd35f8f55f8d8e7f382523d3588900076dc1b08 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v5 2/4] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. It replaces ri_SqlStringPlanCreate(), which
RI_FKey_check() and ri_Check_Pk_Match() used to create the plans for
their respective checks, with ri_LookupKeyInPkRelPlanCreate(), which
installs ri_LookupKeyInPkRel() as the plan implementing those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
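In outline, the direct lookup is an ordinary index scan with equality
keys. A simplified hypothetical sketch for a single int4 key column
(the real ri_LookupKeyInPkRel() in the diff below additionally handles
multi-column keys, arbitrary key types, partitioned PK tables, and row
locking):

    /* Hypothetical single-column probe of the PK's unique index. */
    static bool
    pk_key_exists(Relation pk_rel, Relation idx_rel,
                  Snapshot snap, Datum keyval)
    {
        ScanKeyData skey;
        IndexScanDesc scan;
        TupleTableSlot *slot;
        bool        found;

        /* Equality condition on the first (and only) index column. */
        ScanKeyInit(&skey, 1, BTEqualStrategyNumber, F_INT4EQ, keyval);

        scan = index_beginscan(pk_rel, idx_rel, snap, 1, 0);
        slot = table_slot_create(pk_rel, NULL);
        index_rescan(scan, &skey, 1, NULL, 0);
        found = index_getnext_slot(scan, ForwardScanDirection, slot);
        index_endscan(scan);
        ExecDropSingleTupleTableSlot(slot);

        return found;
    }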
---
src/backend/executor/execPartition.c | 167 +++++++++-
src/backend/executor/nodeLockRows.c | 160 +++++-----
src/backend/utils/adt/ri_triggers.c | 447 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
5 files changed, 610 insertions(+), 179 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..764f2b9f8a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1379,12 +1382,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1591,6 +1594,158 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified primary key tuple containing a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index on the leaf partition thus found that is recorded
+ * as descending from 'root_idxoid' and returns it in '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfebd9c4f2..9894bc4951 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,49 +428,9 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -533,48 +515,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +703,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +803,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +920,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1147,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1956,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partition on which the
+ * trigger fired.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1968,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2234,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2274,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2521,276 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look up a key in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index backing the given foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the key given by pk_vals/pk_nulls exists
+ * in 'pk_rel'. The key is looked up using the unique index backing the
+ * constraint described by plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The key values provided are those of the row being inserted into the
+ * referencing relation (fk_rel) or of the row being deleted from or updated
+ * in the referenced relation (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = "ri_LookupKeyInPkRel";
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should do this? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the current user has permissions to use the
+ * schema of the referenced table and to SELECT from it.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ /*
+ * Note that this relies on the latest snapshot having been pushed by
+ * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * that runs as part of this will need to use the snapshot to determine
+ * whether to omit or include any detach-pending partition based on
+ * whether the pg_inherits row that marks it as detach-pending is
+ * visible to it or not.
+ */
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ return 0;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for a null value. Nulls should not reach this point, because
+ * callers currently handle the null-key cases themselves.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* We store nothing that can be invalidated, so the plan is always valid. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2805,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2872,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2897,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3509,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3568,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the value that will be passed to the operator
+ * is already of the expected operand type. The operator can be
+ * cross-type (such as when called from ri_LookupKeyInPkRel()); in that
+ * case, we only need the cast if the right operand's value doesn't
+ * match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..2f415b80ce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.35.3
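To illustrate how the callback interface above is meant to be extended, here
is a minimal sketch of what another hard-coded check would need to provide;
the ri_MyCheck* names are hypothetical, not part of the patches:

static void
ri_MyCheckPlanCreate(RI_Plan *plan,
                     const char *querystr, int nargs, Oid *paramtypes)
{
    /* Hard-coded checks carry no SQL string to parse and plan. */
    Assert(querystr == NULL);

    plan->plan_exec_func = ri_MyCheckExecute;           /* does the scan */
    plan->plan_exec_arg = NULL;                         /* no cached state */
    plan->plan_is_valid_func = ri_MyCheckPlanIsValid;   /* always true */
    plan->plan_free_func = ri_MyCheckPlanFree;          /* nothing to free */
}

The trigger function would then pass ri_MyCheckPlanCreate to ri_PlanCheck()
in place of ri_SqlStringPlanCreate, with a NULL query string and zero
parameters, just as RI_FKey_check() does with ri_LookupKeyInPkRelPlanCreate()
in this patch.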
[Attachment: v5-0003-Make-omit_detached-logic-independent-of-ActiveSna.patch (application/octet-stream)]
From 4acd07bf1639f06c7f85a4f98289622a3eb2ee6c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 15 Sep 2022 16:45:44 +0900
Subject: [PATCH v5 3/4] Make omit_detached logic independent of ActiveSnapshot
In find_inheritance_children_extended() and elsewhere, we use
ActiveSnapshot to determine if a detach-pending partition should
be considered detached or not based on checking if the xmin of
such a partition's pg_inherits row appears committed to that
snapshot or not.
This logic really came in to make the RI queries over partitioned
PK tables running under REPEATABLE READ isolation level work
correctly by appropriately omitting or including the detach-pending
partition from the plan, based on the visibility of the pg_inherits
row of that partition to the latest snapshot. To that end,
RI_FKey_check() was made to force-push the latest snapshot to get
that desired behavior. However, pushing a snapshot this way makes
the results of other scans that use ActiveSnapshot violate the
isolation of the parent transaction; 00cb86e75d added a test that
demonstrates this bug.
So, this commit changes the PartitionDesc interface to allow the
desired snapshot to be passed explicitly as a parameter, rather than
having to scribble on ActiveSnapshot to pass it. A later commit will
change ExecGetLeafPartitionForKey() used by RI PK row lookups to use
this new interface.
Note that the default behavior in the absence of any explicitly
specified snapshot is still to use the ActiveSnapshot, so there is
no behavior change from this to non-RI queries and sites that call
find_inheritance_children() for purposes other than querying a
partitioned table.
---
src/backend/catalog/pg_inherits.c | 31 +++++----
src/backend/executor/execPartition.c | 7 +-
src/backend/optimizer/util/inherit.c | 2 +-
src/backend/optimizer/util/plancat.c | 2 +-
src/backend/partitioning/partdesc.c | 100 +++++++++++++++++++--------
src/include/catalog/pg_inherits.h | 5 +-
src/include/partitioning/partdesc.h | 4 +-
7 files changed, 100 insertions(+), 51 deletions(-)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 92afbc2f25..f810e5de0d 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,14 +52,18 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*
- * Partitions marked as being detached are omitted; see
+ * A partition marked as being detached is omitted from the result if the
+ * pg_inherits row showing the partition as being detached is visible to
+ * the ActiveSnapshot, provided one has been pushed; see
* find_inheritance_children_extended for details.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true, lockmode,
- NULL, NULL);
+ return find_inheritance_children_extended(parentrelId, true,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL,
+ lockmode, NULL, NULL);
}
/*
@@ -71,16 +75,17 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and there is an active snapshot (not the same as
- * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
- * marked "detach pending" is visible to that snapshot, then that partition is
- * omitted from the output list. This makes partitions invisible depending on
- * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot. In addition, *detached_xmin (if not null)
- * is set to the xmin of the row of the detached partition.
+ * If omit_detached is true and the caller passed 'omit_detached_snapshot',
+ * a partition whose pg_inherits tuple marks it as "detach pending" is
+ * omitted from the output list if that tuple is visible to the snapshot;
+ * that is, the partition is omitted if the transaction that marked it as
+ * detached appears committed to omit_detached_snapshot. If omitted,
+ * *detached_xmin (if non-NULL) is set to the xmin of that pg_inherits
+ * tuple.
*/
List *
find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+ Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
{
@@ -141,15 +146,13 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && ActiveSnapshotSet())
+ if (omit_detached && omit_detached_snapshot)
{
TransactionId xmin;
- Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
- snap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(xmin, snap))
+ if (!XidInMVCCSnapshot(xmin, omit_detached_snapshot))
{
if (detached_xmin)
{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 764f2b9f8a..c90f07c433 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1121,7 +1121,8 @@ ExecInitPartitionDispatchInfo(EState *estate,
rel = table_open(partoid, RowExclusiveLock);
else
rel = proute->partition_root;
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel,
+ NULL);
pd = (PartitionDispatch) palloc(offsetof(PartitionDispatchData, indexes) +
partdesc->nparts * sizeof(int));
@@ -1708,7 +1709,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel);
+ partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
@@ -2085,7 +2086,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partrel, NULL);
/*
* Initialize the subplan_map and subpart_map.
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index cf7691a474..cc4d27ece8 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -317,7 +317,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
Assert(parentrte->inh);
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- parentrel);
+ parentrel, NULL);
/* A partitioned table should always have a partition descriptor. */
Assert(partdesc);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6d5718ee4c..9c6bc5c4a5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2221,7 +2221,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
}
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- relation);
+ relation, NULL);
rel->part_scheme = find_partition_scheme(root, relation);
Assert(partdesc != NULL && rel->part_scheme != NULL);
rel->boundinfo = partdesc->boundinfo;
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 737f0edd89..863b04c17d 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -48,17 +48,24 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached);
+ bool omit_detached,
+ Snapshot omit_detached_snapshot);
/*
- * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ * RelationGetPartitionDescExt
+ * Get the partition descriptor of a partitioned table, building one
+ * and caching it for later use if none has been built yet or if the
+ * cached one is not suitable for a given request
*
* We keep two partdescs in relcache: rd_partdesc includes all partitions
- * (even those being concurrently marked detached), while rd_partdesc_nodetach
- * omits (some of) those. We store the pg_inherits.xmin value for the latter,
- * to determine whether it can be validly reused in each case, since that
- * depends on the active snapshot.
+ * (even the one being concurrently marked detached), while
+ * rd_partdesc_nodetached omits the detach-pending partition. If the latter
+ * is present, rd_partdesc_nodetached_xmin will have been set to the xmin of
+ * the detach-pending partition's pg_inherits row, which is used to determine
+ * whether rd_partdesc_nodetached can be validly reused for a given request
+ * by checking if that xmin appears committed to the 'omit_detached_snapshot'
+ * passed by the caller.
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
@@ -69,7 +76,8 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDesc(Relation rel, bool omit_detached)
+RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -78,36 +86,52 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*
- * If there is no active snapshot, detached partitions aren't omitted
- * either, so we can use the cached descriptor too in that case.
+ * omit_detached_snapshot being NULL means that the caller doesn't care
+ * that the returned partition descriptor may contain detached partitions,
+ * so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
(!rel->rd_partdesc->detached_exist || !omit_detached ||
- !ActiveSnapshotSet())))
+ omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
/*
- * If we're asked to omit detached partitions, we may be able to use a
- * cached descriptor too. We determine that based on the pg_inherits.xmin
- * that was saved alongside that descriptor: if the xmin that was not in
- * progress for that active snapshot is also not in progress for the
- * current active snapshot, then we can use it. Otherwise build one from
- * scratch.
+ * If we're asked to omit the detached partition, we may be able to use
+ * the other cached descriptor, which has been made to omit the detached
+ * partition. Whether that descriptor can be reused in this case is
+ * determined by cross-checking the visibility of
+ * rd_partdesc_nodetached_xmin, that is, the xmin of the detached
+ * partition's pg_inherits row: if that xmin appears committed to the
+ * given omit_detached_snapshot, just as it must have to the snapshot
+ * in effect when rd_partdesc_nodetached was built, then we can reuse
+ * it. Otherwise we must build one from scratch.
*/
if (omit_detached &&
rel->rd_partdesc_nodetached &&
- ActiveSnapshotSet())
+ omit_detached_snapshot)
{
- Snapshot activesnap;
-
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
- activesnap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+ if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin,
+ omit_detached_snapshot))
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached);
+ return RelationBuildPartitionDesc(rel, omit_detached,
+ omit_detached_snapshot);
+}
+
+/*
+ * RelationGetPartitionDesc
+ * Like RelationGetPartitionDescExt() but for callers that are fine with
+ * ActiveSnapshot being used as omit_detached_snapshot
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel, bool omit_detached)
+{
+ return RelationGetPartitionDescExt(rel, omit_detached,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL);
}
/*
@@ -132,7 +156,8 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached)
+RelationBuildPartitionDesc(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL;
@@ -160,7 +185,9 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached, NoLock,
+ omit_detached,
+ omit_detached_snapshot,
+ NoLock,
&detached_exist,
&detached_xmin);
@@ -322,11 +349,11 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
*
* Note that if a partition was found by the catalog's scan to have been
 * detached, but the pg_inherits tuple saying so was not visible to the
- * active snapshot (find_inheritance_children_extended will not have set
- * detached_xmin in that case), we consider there to be no "omittable"
- * detached partitions.
+ * omit_detached_snapshot (find_inheritance_children_extended() will not
+ * have set detached_xmin in that case), we consider there to be no
+ * "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && ActiveSnapshotSet() &&
+ is_omit = omit_detached && detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -411,9 +438,18 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* different views of the catalog state, but any single particular OID
* will always get the same PartitionDesc for as long as the same
* PartitionDirectory is used.
+ *
+ * Callers can specify a snapshot to cross-check the visibility of the
+ * pg_inherits row that marks a given partition as being detached. Depending on the
+ * result of that visibility check, such a partition is either included in
+ * the returned PartitionDesc, considering it not yet detached, or omitted
+ * from it, considering it detached.
+ * XXX - currently unused, because we don't have any callers of this that
+ * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
-PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
+PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
+ Snapshot omit_detached_snapshot)
{
PartitionDirectoryEntry *pde;
Oid relid = RelationGetRelid(rel);
@@ -428,7 +464,11 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
+ Assert(omit_detached_snapshot == NULL);
+ if (pdir->omit_detached && ActiveSnapshotSet())
+ omit_detached_snapshot = GetActiveSnapshot();
+ pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
+ omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 9221c2ea57..67f148f2bf 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -23,6 +23,7 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
+#include "utils/snapshot.h"
/* ----------------
* pg_inherits definition. cpp turns this into
@@ -50,7 +51,9 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, on pg_inher
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
- LOCKMODE lockmode, bool *detached_exist, TransactionId *detached_xmin);
+ Snapshot omit_detached_snapshot,
+ LOCKMODE lockmode, bool *detached_exist,
+ TransactionId *detached_xmin);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **numparents);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index 7e979433b6..f42d137fc1 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -65,9 +65,11 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot);
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
-extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
+extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation, Snapshot);
extern void DestroyPartitionDirectory(PartitionDirectory pdir);
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
--
2.35.3
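To make the new interface of this patch concrete, here is how a caller can
now choose the snapshot that governs the omission of detach-pending
partitions (a sketch only; 'partdir' and 'rel' stand for whatever partition
directory and partitioned table the caller has at hand):

    /* Decide detach-pending visibility with an explicitly chosen snapshot. */
    PartitionDesc pd = PartitionDirectoryLookup(partdir, rel,
                                                GetLatestSnapshot());

    /*
     * Or pass NULL to fall back to the ActiveSnapshot (if one is set),
     * which preserves the pre-patch behavior for existing callers.
     */
    PartitionDesc pd2 = PartitionDirectoryLookup(partdir, rel, NULL);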
[Attachment: v5-0004-Teach-ri_LookupKeyInPkRel-to-pass-omit_detached_s.patch (application/octet-stream)]
From ab4461126f1ae642dab7fc36507ad2d91f88478f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 28 Sep 2022 16:37:55 +0900
Subject: [PATCH v5 4/4] Teach ri_LookupKeyInPkRel() to pass
omit_detached_snapshot
Now that the RI triggers that need to look up PK rows in a
partitioned table can manipulate partitions directly through
ExecGetLeafPartitionForKey(), the snapshot being passed to omit or
include detach-pending partitions can also now be passed explicitly,
rather than using ActiveSnapshot for that purpose.
For the detach-pending partitions to be correctly omitted or included
from the consideration of PK row lookup, the PartitionDesc machinery
needs to see the latest snapshot. Pushing the latest snapshot to be
the ActiveSnapshot as is done presently meant that even the scans that
should NOT be using the latest snapshot also end up using one to
time-qualify table/partition rows. That led to incorrect results of
PK lookups over partitioned tables running under REPEATABLE READ
isolation; 00cb86e75d added a test that demonstrates this bug.
To fix, do not force-push the latest snapshot in the cases of PK
lookup over partitioned tables (as was being done by passing
detectNewRows=true to ri_PerformCheck()), but rather make
ri_LookupKeyInPkRel() pass the latest snapshot directly to
PartitionDirectoryLookup() through its new omit_detached_snapshot
parameter.
The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
---
src/backend/executor/execPartition.c | 12 +++++++++++-
src/backend/partitioning/partdesc.c | 6 ++----
src/backend/utils/adt/ri_triggers.c | 16 ++++++----------
src/include/executor/execPartition.h | 1 +
src/test/isolation/expected/fk-snapshot.out | 4 ++--
src/test/isolation/specs/fk-snapshot.spec | 5 +----
6 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c90f07c433..65cd365a8b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1607,6 +1607,14 @@ get_partition_for_tuple(PartitionKey key,
*
* Any intermediate parent tables encountered on the way to finding the leaf
* partition are locked using 'lockmode' when opening.
+ *
+ * In 'omit_detached_snapshot' a caller can specify the snapshot to pass to
+ * PartitionDirectoryLookup() that in turn passes it down to the code that
+ * scans the pg_inherits catalog when building the partition descriptor from
+ * scratch. Any detach-pending partition is omitted from this function's
+ * consideration if the DETACH operation appears committed to *this*
+ * snapshot.
+ *
*
* Returns NULL if no leaf partition is found for the key.
*
@@ -1624,6 +1632,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid)
{
Relation rel = root_rel;
@@ -1709,7 +1718,8 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
+ partdesc = PartitionDirectoryLookup(partdir, rel,
+ omit_detached_snapshot);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 863b04c17d..4bfa4076bd 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -444,8 +444,6 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* result of that visibility check, such a partition is either included in
* the returned PartitionDesc, considering it not yet detached, or omitted
* from it, considering it detached.
- * XXX - currently unused, because we don't have any callers of this that
- * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
@@ -464,8 +462,8 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- Assert(omit_detached_snapshot == NULL);
- if (pdir->omit_detached && ActiveSnapshotSet())
+ if (pdir->omit_detached &&
+ omit_detached_snapshot == NULL && ActiveSnapshotSet())
omit_detached_snapshot = GetActiveSnapshot();
pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
omit_detached_snapshot);
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 9894bc4951..f0a33df9c9 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -434,17 +434,11 @@ RI_FKey_check(TriggerData *trigdata)
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -2679,8 +2673,9 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
Oid leaf_idxoid;
/*
- * Note that this relies on the latest snapshot having been pushed by
- * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * Pass the latest snapshot for omit_detached_snapshot so that any
+ * detach-pending partition is correctly omitted from or included in
+ * this lookup. The PartitionDesc machinery
* that runs as part of this will need to use the snapshot to determine
 * whether to omit or include any detach-pending partition based on
* whether the pg_inherits row that marks it as detach-pending is
@@ -2690,6 +2685,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
riinfo->pk_attnums,
pk_vals, pk_nulls,
idxoid, RowShareLock,
+ GetLatestSnapshot(),
&leaf_idxoid);
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index cbe1d996e6..18c6b676f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -36,6 +36,7 @@ extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid);
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
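The net effect of this patch on the lookup path can be summarized with the
following fragment, adapted from ri_LookupKeyInPkRel() as patched (only the
surrounding declarations are elided):

    /*
     * The latest snapshot is handed only to the partition-lookup machinery,
     * so the detach-pending check sees up-to-date pg_inherits state, while
     * the index scan that actually looks up the key keeps using the
     * test_snapshot chosen by ri_PerformCheck().
     */
    leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
                                             riinfo->pk_attnums,
                                             pk_vals, pk_nulls,
                                             idxoid, RowShareLock,
                                             GetLatestSnapshot(),
                                             &leaf_idxoid);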
[Attachment: v5-0001-Avoid-using-SPI-in-RI-trigger-functions.patch (application/octet-stream)]
From 62d53b827d10de3cfea43187c0dd645dc73bad1d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v5 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
callback function specified by the caller (of ri_PlanCheck(), that
is). For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for
future commits to replace the current SQL-string-based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each of the
+ * queries produced by rewriting the provided query, which is saved in
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan by creating a CachedPlan, using the given
+ * parameter values, for each CachedPlanSource stored in
+ * plan->plan_exec_arg.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the number of tuples processed by the last statement. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller-specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, rules on the tables mentioned in the
+ * original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_Plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Look up a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
On Thu, Sep 29, 2022 at 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
Sorry about the delay.
So I came up with such a patch that is attached as 0003.
The main problem I want to fix with it is the need for RI_FKey_check()
to "force"-push the latest snapshot that the PartitionDesc code wants
to use to correctly include or omit a detach-pending partition from
the view of that function's RI query. Scribbling on ActiveSnapshot
that way means that *all* scans involved in the execution of that
query now see a snapshot that they shouldn't likely be seeing; a bug
resulting from this has been demonstrated in a test case added by the
commit 00cb86e75d.The fix is to make RI_FKey_check(), or really its RI_Plan's execution
function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
snapshot explicitly as a parameter of PartitionDirectoryLookup(),
which passes it down to the PartitionDesc code. No need to manipulate
ActiveSnapshot. The actual fix is in patch 0004, which I extracted
out of 0002 to keep the latter a mere refactoring patch without any
semantic changes (though a bit more on that below). BTW, I don't know
of a way to back-patch a fix like this for the bug, because there is
no way other than ActiveSnapshot to pass the desired snapshot to the
PartitionDesc code if the only way we get to that code is by executing
an SQL query plan.

0003 moves the relevant logic out of
find_inheritance_children_extended() into its callers. That logic
decides which snapshot to use when checking the visibility of the
corresponding pg_inherits row, and hence whether a detach-pending
partition should be omitted from a given caller's consideration; it
currently just uses ActiveSnapshot.
Given the problems with using ActiveSnapshot mentioned above, I think
it is better to make the callers decide the snapshot and pass it using
a parameter named omit_detached_snapshot. Only PartitionDesc code
actually cares about sending anything but the parent query's
ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
has been changed to add the same omit_detached_snapshot parameter.
find_inheritance_children(), the other caller used in many sites that
look at a table's partitions, defaults to using ActiveSnapshot, which
does not seem problematic. Furthermore, only RI_FKey_check() needs to
pass anything other than ActiveSnapshot, so other users of
PartitionDesc, like user queries, still default to using the
ActiveSnapshot, which doesn't have any known problems either.

0001 and 0002 are mostly unchanged in this version, except I took out
the visibility bug-fix from 0002 into 0004 described above, which
looks better using the interface added by 0003 anyway. I need to
address the main concern that it's still hard to be sure that the
patch in its current form doesn't break any user-level semantics of
these RI check triggers, as well as other concerns about the
implementation that Robert expressed in [1].

Oops, I apparently posted the wrong 0004, containing a bug that
crashes `make check`.

Fixed version attached.
Here's another version that hopefully fixes the crash reported by
Cirrus CI [1]https://cirrus-ci.com/task/4901906421121024, which is not reliably reproducible.
I suspect it may have to do with error_context_stack not being reset
when ri_LookupKeyInPkRel() does an early return; the `return false` in
that case was wrong too:
@@ -2693,7 +2693,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
* looking for.
*/
if (leaf_pk_rel == NULL)
- return false;
+ goto done;
...
+done:
/*
* Pop the error context stack
*/
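In other words, every exit path must now leave through the "done"
label, so that the callback pushed onto error_context_stack is popped
before returning. Roughly this shape (a sketch only, with the lookup
itself elided; "result" stands in for whatever the function ultimately
returns):

	/* Set up error traceback support for ereport(). */
	rierrcontext.callback = _RI_error_callback;
	rierrcontext.arg = &ricallbackarg;
	rierrcontext.previous = error_context_stack;
	error_context_stack = &rierrcontext;

	if (leaf_pk_rel == NULL)
		goto done;		/* was wrongly "return false" */

	/* ... look up the key, setting "result" ... */

done:
	/* Pop the error context stack. */
	error_context_stack = rierrcontext.previous;
	return result;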
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: https://cirrus-ci.com/task/4901906421121024 (permalink?)
Attachments:
v6-0001-Avoid-using-SPI-in-RI-trigger-functions.patch
From 62d53b827d10de3cfea43187c0dd645dc73bad1d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v6 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
caller-specified (caller of ri_PlanCheck() that is) callback
function. For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for the
future commits to replace the current SQL string based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
the SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
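To illustrate the intended pluggability, a hypothetical non-SQL
implementation (all of the ri_Hardcoded* names below are made up for
this example) would only need to supply its own plan-create callback
and pass it to ri_PlanCheck():

	static void
	ri_HardcodedPlanCreate(RI_Plan *plan,
	                       const char *querystr, int nargs, Oid *paramtypes)
	{
	    /* Nothing to parse or plan; just install the callbacks. */
	    plan->plan_exec_func = ri_HardcodedPlanExecute;
	    plan->plan_exec_arg = NULL;
	    plan->plan_is_valid_func = ri_HardcodedPlanIsValid;
	    plan->plan_free_func = ri_HardcodedPlanFree;
	}

	qplan = ri_PlanCheck(ri_HardcodedPlanCreate,
	                     querybuf.data, riinfo->nkeys, queryoids,
	                     &qkey, fk_rel, pk_rel);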
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by an RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each of the
+ * queries produced by rewriting the provided query, which is saved in
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan by creating a CachedPlan, using the given
+ * parameter values, for each CachedPlanSource stored in
+ * plan->plan_exec_arg.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the number of tuples processed by the last statement. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller-specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, rules on the tables mentioned in the
+ * original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_Plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Look up a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
v6-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch
From d87f962277652cbbc7401345003ac2486366ebe0 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v6 2/4] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RI_Plan infrastructure added in the
previous commit. It replaces ri_SqlStringPlanCreate(), which
RI_FKey_check() and ri_Check_Pk_Match() used to create the plans for
their respective checks, with ri_LookupKeyInPkRelPlanCreate(), which
installs ri_LookupKeyInPkRel() as the plan that implements those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
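At its core, the new check is just a scan of the PK index using the key
values taken from the FK row. A minimal sketch, with partition descent,
type coercion, slot setup, and error handling elided (eq_proc[],
pk_vals[], and outslot are assumed to have been prepared by the caller):

	Relation	idxrel = index_open(leaf_idxoid, RowShareLock);
	IndexScanDesc scan;
	ScanKeyData skey[INDEX_MAX_KEYS];
	bool		found;

	for (int i = 0; i < riinfo->nkeys; i++)
	    ScanKeyInit(&skey[i], i + 1, BTEqualStrategyNumber,
	                eq_proc[i],	/* OID of "=" proc for index column i */
	                pk_vals[i]);

	scan = index_beginscan(leaf_pk_rel, idxrel, test_snapshot,
	                       riinfo->nkeys, 0);
	index_rescan(scan, skey, riinfo->nkeys, NULL, 0);
	found = index_getnext_slot(scan, ForwardScanDirection, outslot);
	index_endscan(scan);
	index_close(idxrel, RowShareLock);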
---
src/backend/executor/execPartition.c | 167 +++++++++-
src/backend/executor/nodeLockRows.c | 160 +++++-----
src/backend/utils/adt/ri_triggers.c | 448 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
5 files changed, 611 insertions(+), 179 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..764f2b9f8a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1379,12 +1382,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1591,6 +1594,158 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified primary key tuple, which contains a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index in the thus-found leaf partition that is
+ * recorded as descending from 'root_idxoid' and returns its OID in
+ * '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks the tuple with the specified TID in the given lock mode following
+ * the given wait policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in
+ * 'slot' for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfebd9c4f2..9c52e765fe 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only consider non-null
+ * columns.)
*/
break;
#endif
@@ -406,49 +428,9 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -533,48 +515,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +703,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +803,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +920,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1147,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1956,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to
+ * see the RI_ConstraintInfo of the individual leaf partition that the
+ * trigger fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1968,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2234,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2274,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2521,277 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index backing the supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up by scanning the unique
+ * index backing the constraint described by plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = pstrdup("ri_LookupKeyInPkRel");
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the new user has permissions to look into the
+ * schema of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ /*
+ * Note that this relies on the latest snapshot having been pushed by
+ * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * that runs as part of this will need to use the snapshot to determine
+ * whether to omit or include any detach-pending partition, based on
+ * whether the pg_inherits row that marks it as detach-pending is
+ * visible to it or not, respectively.
+ */
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for a null value. This should not occur, because callers
+ * currently handle null keys before getting here.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2806,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2873,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2898,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3510,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3569,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_LookupKeyInPkRel()), in which
+ * case, we only need the cast if the right operand value doesn't match
+ * the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..2f415b80ce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.35.3
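In terms of user-visible behavior, the direct lookup added by 0002 is
meant to be equivalent to the old SPI query; a minimal sketch (the
table and constraint names here are made up for illustration):

    CREATE TABLE pk (a int PRIMARY KEY);
    CREATE TABLE fk (a int REFERENCES pk);
    INSERT INTO fk VALUES (1);
    ERROR:  insert or update on table "fk" violates foreign key constraint "fk_a_fkey"

The violation is now detected by scanning pk's primary key index
directly in ri_LookupKeyInPkRel() rather than by running a SELECT
through SPI.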
Attachment: v6-0004-Teach-ri_LookupKeyInPkRel-to-pass-omit_detached_s.patch (application/octet-stream)
From 9285f3bf79f4b3996bed839236768f17f0cebfea Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 28 Sep 2022 16:37:55 +0900
Subject: [PATCH v6 4/4] Teach ri_LookupKeyInPkRel() to pass
omit_detached_snapshot
Now that the RI triggers that need to look up PK rows in a
partitioned table can manipulate partitions directly through
ExecGetLeafPartitionForKey(), the snapshot used to decide whether to
omit or include detach-pending partitions can now be passed explicitly,
rather than being conveyed via ActiveSnapshot.
For detach-pending partitions to be correctly omitted from or included
in PK row lookups, the PartitionDesc machinery needs to see the latest
snapshot. Pushing the latest snapshot as the ActiveSnapshot, as is done
presently, means that even scans that should NOT be using the latest
snapshot end up using it to time-qualify table/partition rows. That
leads to incorrect results of PK lookups over partitioned tables
running under REPEATABLE READ isolation; 00cb86e75d added a test that
demonstrates this bug.
To fix, do not force-push the latest snapshot in the cases of PK
lookup over partitioned tables (as was being done by passing
detectNewRows=true to ri_PerformCheck()), but rather make
ri_LookupKeyInPkRel() pass the latest snapshot directly to
PartitionDirectoryLookup() through its new omit_detached_snapshot
parameter.
The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
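To sketch the anomaly this fixes (names here are illustrative, not the
spec's; the authoritative scenario is the fk-snapshot permutation whose
expected output changes below):

    -- session 2
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT * FROM pkp;      -- pkp is a partitioned PK table; this
                            -- establishes the transaction snapshot

    -- session 1
    INSERT INTO pkp VALUES (2);   -- commits a new PK row

    -- session 2
    INSERT INTO fkn VALUES (2);   -- fkn references pkp
    -- the PK row is invisible to session 2's snapshot, so this must
    -- fail with a foreign key violation; under the bug it succeeded,
    -- because the latest snapshot pushed for the partition lookup was
    -- also used by the PK index scan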
---
src/backend/executor/execPartition.c | 12 +++++++++++-
src/backend/partitioning/partdesc.c | 6 ++----
src/backend/utils/adt/ri_triggers.c | 16 ++++++----------
src/include/executor/execPartition.h | 1 +
src/test/isolation/expected/fk-snapshot.out | 4 ++--
src/test/isolation/specs/fk-snapshot.spec | 5 +----
6 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c90f07c433..65cd365a8b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1607,6 +1607,14 @@ get_partition_for_tuple(PartitionKey key,
*
* Any intermediate parent tables encountered on the way to finding the leaf
* partition are locked using 'lockmode' when opening.
+ *
+ * In 'omit_detached_snapshot' a caller can specify the snapshot to pass to
+ * PartitionDirectoryLookup(), which in turn passes it down to the code that
+ * scans the pg_inherits catalog when building the partition descriptor from
+ * scratch. Any detach-pending partition is omitted from this function's
+ * consideration if the DETACH operation appears committed to *this*
+ * snapshot.
+ *
*
* Returns NULL if no leaf partition is found for the key.
*
@@ -1624,6 +1632,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid)
{
Relation rel = root_rel;
@@ -1709,7 +1718,8 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
+ partdesc = PartitionDirectoryLookup(partdir, rel,
+ omit_detached_snapshot);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 863b04c17d..4bfa4076bd 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -444,8 +444,6 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* result of that visibility check, such a partition is either included in
* the returned PartitionDesc, considering it not yet detached, or omitted
* from it, considering it detached.
- * XXX - currently unused, because we don't have any callers of this that
- * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
@@ -464,8 +462,8 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- Assert(omit_detached_snapshot == NULL);
- if (pdir->omit_detached && ActiveSnapshotSet())
+ if (pdir->omit_detached &&
+ omit_detached_snapshot == NULL && ActiveSnapshotSet())
omit_detached_snapshot = GetActiveSnapshot();
pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
omit_detached_snapshot);
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 9c52e765fe..ad08f0e378 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -434,17 +434,11 @@ RI_FKey_check(TriggerData *trigdata)
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -2679,8 +2673,9 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
Oid leaf_idxoid;
/*
- * Note that this relies on the latest snapshot having been pushed by
- * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * Pass the latest snapshot for omit_detached_snapshot so that any
+ * detach-pending partitions are correctly omitted from or included in
+ * this lookup. The PartitionDesc machinery
 * that runs as part of this will need to use the snapshot to determine
 * whether to omit or include any detach-pending partition, based on
 * whether the pg_inherits row that marks it as detach-pending is
@@ -2690,6 +2685,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
riinfo->pk_attnums,
pk_vals, pk_nulls,
idxoid, RowShareLock,
+ GetLatestSnapshot(),
&leaf_idxoid);
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index cbe1d996e6..18c6b676f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -36,6 +36,7 @@ extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid);
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
Attachment: v6-0003-Make-omit_detached-logic-independent-of-ActiveSna.patch (application/octet-stream)
From cd458c4d30cd6c2a13543f8540f10c79e7a61517 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 15 Sep 2022 16:45:44 +0900
Subject: [PATCH v6 3/4] Make omit_detached logic independent of ActiveSnapshot
In find_inheritance_children_extended() and elsewhere, we use
ActiveSnapshot to determine if a detach-pending partition should
be considered detached or not based on checking if the xmin of
such a partition's pg_inherits row appears committed to that
snapshot or not.
This logic really came in to make the RI queries over partitioned
PK tables running under REPEATABLE READ isolation level work
correctly, by appropriately omitting the detach-pending partition from
the plan or including it, based on the visibility of the pg_inherits
row of that partition to the latest snapshot. To that end,
RI_FKey_check() was made to force-push the latest snapshot to get
that desired behavior. However, pushing a snapshot this way makes
the results of other scans that use ActiveSnapshot violate the
isolation of the parent transaction; 00cb86e75d added a test that
demonstrates this bug.
So, this commit changes the PartitionDesc interface to allow the
desired snapshot to be passed explicitly as a parameter, rather than
having to scribble on ActiveSnapshot to pass it. A later commit will
change ExecGetLeafPartitionForKey() used by RI PK row lookups to use
this new interface.
Note that the default behavior in the absence of any explicitly
specified snapshot is still to use the ActiveSnapshot, so this causes
no behavior change for non-RI queries or for sites that call
find_inheritance_children() for purposes other than querying a
partitioned table.
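For context, the detach-pending state that this snapshot decides on
arises from ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY, which
commits a first transaction that merely marks the partition's
pg_inherits row before completing the detach; a sketch (names
hypothetical):

    ALTER TABLE pkp DETACH PARTITION pkp1 CONCURRENTLY;
    -- after the first internal transaction commits, pkp1 is
    -- "detach pending": whether other sessions still see it as a
    -- partition of pkp depends on whether that transaction appears
    -- committed to the snapshot used to build the PartitionDesc

The omit_detached_snapshot parameter makes that deciding snapshot an
explicit input instead of deriving it from ActiveSnapshot.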
---
src/backend/catalog/pg_inherits.c | 31 +++++----
src/backend/executor/execPartition.c | 7 +-
src/backend/optimizer/util/inherit.c | 2 +-
src/backend/optimizer/util/plancat.c | 2 +-
src/backend/partitioning/partdesc.c | 100 +++++++++++++++++++--------
src/include/catalog/pg_inherits.h | 5 +-
src/include/partitioning/partdesc.h | 4 +-
7 files changed, 100 insertions(+), 51 deletions(-)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 92afbc2f25..f810e5de0d 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,14 +52,18 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*
- * Partitions marked as being detached are omitted; see
+ * A partition marked as being detached is omitted from the result if the
+ * pg_inherits row showing the partition as being detached is visible to
+ * ActiveSnapshot, provided one has been pushed; see
* find_inheritance_children_extended for details.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true, lockmode,
- NULL, NULL);
+ return find_inheritance_children_extended(parentrelId, true,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL,
+ lockmode, NULL, NULL);
}
/*
@@ -71,16 +75,17 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and there is an active snapshot (not the same as
- * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
- * marked "detach pending" is visible to that snapshot, then that partition is
- * omitted from the output list. This makes partitions invisible depending on
- * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot. In addition, *detached_xmin (if not null)
- * is set to the xmin of the row of the detached partition.
+ * If omit_detached is true and the caller passed 'omit_detached_snapshot',
+ * the partition whose pg_inherits tuple marks it as "detach pending" is
+ * omitted from the output list if the tuple is visible to that snapshot.
+ * That is, such a partition is omitted from the output list depending on
+ * whether the transaction that marked that partition as detached appears
+ * committed to omit_detached_snapshot. If omitted, *detached_xmin (if
+ * non-NULL) is set to the xmin of that pg_inherits tuple.
*/
List *
find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+ Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
{
@@ -141,15 +146,13 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && ActiveSnapshotSet())
+ if (omit_detached && omit_detached_snapshot)
{
TransactionId xmin;
- Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
- snap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(xmin, snap))
+ if (!XidInMVCCSnapshot(xmin, omit_detached_snapshot))
{
if (detached_xmin)
{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 764f2b9f8a..c90f07c433 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1121,7 +1121,8 @@ ExecInitPartitionDispatchInfo(EState *estate,
rel = table_open(partoid, RowExclusiveLock);
else
rel = proute->partition_root;
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel,
+ NULL);
pd = (PartitionDispatch) palloc(offsetof(PartitionDispatchData, indexes) +
partdesc->nparts * sizeof(int));
@@ -1708,7 +1709,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel);
+ partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
@@ -2085,7 +2086,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partrel, NULL);
/*
* Initialize the subplan_map and subpart_map.
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index cf7691a474..cc4d27ece8 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -317,7 +317,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
Assert(parentrte->inh);
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- parentrel);
+ parentrel, NULL);
/* A partitioned table should always have a partition descriptor. */
Assert(partdesc);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6d5718ee4c..9c6bc5c4a5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2221,7 +2221,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
}
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- relation);
+ relation, NULL);
rel->part_scheme = find_partition_scheme(root, relation);
Assert(partdesc != NULL && rel->part_scheme != NULL);
rel->boundinfo = partdesc->boundinfo;
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 737f0edd89..863b04c17d 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -48,17 +48,24 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached);
+ bool omit_detached,
+ Snapshot omit_detached_snapshot);
/*
- * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ * RelationGetPartitionDescExt
+ * Get the partition descriptor of a partitioned table, building and
+ * caching one for later use if there isn't one already or if the
+ * cached one would not be suitable for the given request
*
* We keep two partdescs in relcache: rd_partdesc includes all partitions
- * (even those being concurrently marked detached), while rd_partdesc_nodetach
- * omits (some of) those. We store the pg_inherits.xmin value for the latter,
- * to determine whether it can be validly reused in each case, since that
- * depends on the active snapshot.
+ * (even the one being concurrently marked detached), while
+ * rd_partdesc_nodetached omits the detach-pending partition. If the latter
+ * is present, rd_partdesc_nodetached_xmin will have been set to the xmin of
+ * the detach-pending partition's pg_inherits row, which is used to determine
+ * whether rd_partdesc_nodetached can be validly reused for a given request,
+ * by checking whether the xmin appears committed to the
+ * 'omit_detached_snapshot' passed by the caller.
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
@@ -69,7 +76,8 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDesc(Relation rel, bool omit_detached)
+RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -78,36 +86,52 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*
- * If there is no active snapshot, detached partitions aren't omitted
- * either, so we can use the cached descriptor too in that case.
+ * omit_detached_snapshot being NULL means that the caller doesn't care
+ * that the returned partition descriptor may contain detached partitions,
+ * so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
(!rel->rd_partdesc->detached_exist || !omit_detached ||
- !ActiveSnapshotSet())))
+ omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
/*
- * If we're asked to omit detached partitions, we may be able to use a
- * cached descriptor too. We determine that based on the pg_inherits.xmin
- * that was saved alongside that descriptor: if the xmin that was not in
- * progress for that active snapshot is also not in progress for the
- * current active snapshot, then we can use it. Otherwise build one from
- * scratch.
+ * If we're asked to omit the detached partition, we may be able to use
+ * the other cached descriptor, which has been made to omit the detached
+ * partition. Whether that descriptor can be reused in this case is
+ * determined based on cross-checking the visibility of
+ * rd_partdesc_nodetached_xmin, that is, the xmin of the pg_inherits row
+ * of the detached partition: if the xmin does not appear in-progress to
+ * the given omit_detached_snapshot (just as it did not to the snapshot
+ * in effect when rd_partdesc_nodetached was built), then we can reuse
+ * it. Otherwise we must build one from scratch.
*/
if (omit_detached &&
rel->rd_partdesc_nodetached &&
- ActiveSnapshotSet())
+ omit_detached_snapshot)
{
- Snapshot activesnap;
-
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
- activesnap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+ if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin,
+ omit_detached_snapshot))
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached);
+ return RelationBuildPartitionDesc(rel, omit_detached,
+ omit_detached_snapshot);
+}
+
+/*
+ * RelationGetPartitionDesc
+ * Like RelationGetPartitionDescExt() but for callers that are fine with
+ * ActiveSnapshot being used as omit_detached_snapshot
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel, bool omit_detached)
+{
+ return RelationGetPartitionDescExt(rel, omit_detached,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL);
}
/*
@@ -132,7 +156,8 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached)
+RelationBuildPartitionDesc(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL;
@@ -160,7 +185,9 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached, NoLock,
+ omit_detached,
+ omit_detached_snapshot,
+ NoLock,
&detached_exist,
&detached_xmin);
@@ -322,11 +349,11 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
*
* Note that if a partition was found by the catalog's scan to have been
* detached, but the pg_inherit tuple saying so was not visible to the
- * active snapshot (find_inheritance_children_extended will not have set
- * detached_xmin in that case), we consider there to be no "omittable"
- * detached partitions.
+ * omit_detached_snapshot (find_inheritance_children_extended() will not
+ * have set detached_xmin in that case), we consider there to be no
+ * "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && ActiveSnapshotSet() &&
+ is_omit = omit_detached && detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -411,9 +438,18 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* different views of the catalog state, but any single particular OID
* will always get the same PartitionDesc for as long as the same
* PartitionDirectory is used.
+ *
+ * Callers can specify a snapshot to cross-check the visibility of the
+ * pg_inherits row marking a given partition as being detached. Depending on the
+ * result of that visibility check, such a partition is either included in
+ * the returned PartitionDesc, considering it not yet detached, or omitted
+ * from it, considering it detached.
+ * XXX - currently unused, because we don't have any callers of this that
+ * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
-PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
+PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
+ Snapshot omit_detached_snapshot)
{
PartitionDirectoryEntry *pde;
Oid relid = RelationGetRelid(rel);
@@ -428,7 +464,11 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
+ Assert(omit_detached_snapshot == NULL);
+ if (pdir->omit_detached && ActiveSnapshotSet())
+ omit_detached_snapshot = GetActiveSnapshot();
+ pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
+ omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 9221c2ea57..67f148f2bf 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -23,6 +23,7 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
+#include "utils/snapshot.h"
/* ----------------
* pg_inherits definition. cpp turns this into
@@ -50,7 +51,9 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, on pg_inher
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
- LOCKMODE lockmode, bool *detached_exist, TransactionId *detached_xmin);
+ Snapshot omit_detached_snapshot,
+ LOCKMODE lockmode, bool *detached_exist,
+ TransactionId *detached_xmin);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **numparents);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index 7e979433b6..f42d137fc1 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -65,9 +65,11 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot);
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
-extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
+extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation, Snapshot);
extern void DestroyPartitionDirectory(PartitionDirectory pdir);
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
--
2.35.3
On Thu, Sep 29, 2022 at 6:09 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 29, 2022 at 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 29, 2022 at 1:46 PM Amit Langote <amitlangote09@gmail.com> wrote:
Sorry about the delay.
So I came up with such a patch that is attached as 0003.
The main problem I want to fix with it is the need for RI_FKey_check()
to "force"-push the latest snapshot that the PartitionDesc code wants
to use to correctly include or omit a detach-pending partition from
the view of that function's RI query. Scribbling on ActiveSnapshot
that way means that *all* scans involved in the execution of that
query now see a snapshot that they likely shouldn't be seeing; a bug
resulting from this has been demonstrated in a test case added by the
commit 00cb86e75d.

The fix is to make RI_FKey_check(), or really its RI_Plan's execution
function ri_LookupKeyInPkRel() added by patch 0002, pass the latest
snapshot explicitly as a parameter of PartitionDirectoryLookup(),
which passes it down to the PartitionDesc code. No need to manipulate
ActiveSnapshot. The actual fix is in patch 0004, which I extracted
out of 0002 to keep the latter a mere refactoring patch without any
semantic changes (though a bit more on that below). BTW, I don't know
of a way to back-patch a fix like this for the bug, because there is
no way other than ActiveSnapshot to pass the desired snapshot to the
PartitionDesc code if the only way we get to that code is by executing
an SQL query plan.
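
To illustrate, here is a rough before/after sketch of the lookup site
in ri_LookupKeyInPkRel() (function names as in the attached patches;
the "before" is schematic, since the push really came from passing
detectNewRows=true to ri_PerformCheck()):

    /* Before: correctness relied on the caller having force-pushed
     * the latest snapshot as the ActiveSnapshot. */
    leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
                                             riinfo->pk_attnums,
                                             pk_vals, pk_nulls,
                                             idxoid, RowShareLock,
                                             &leaf_idxoid);

    /* After: hand the latest snapshot over explicitly instead. */
    leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
                                             riinfo->pk_attnums,
                                             pk_vals, pk_nulls,
                                             idxoid, RowShareLock,
                                             GetLatestSnapshot(),
                                             &leaf_idxoid);
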
0003 moves the relevant logic out of
find_inheritance_children_extended() into its callers: the logic that
decides which snapshot to use to determine whether a detach-pending
partition should indeed be omitted from a caller's consideration,
based on whether the corresponding pg_inherits row is visible to that
snapshot; it just uses ActiveSnapshot now.
Given the problems with using ActiveSnapshot mentioned above, I think
it is better to make the callers decide the snapshot and pass it using
a parameter named omit_detached_snapshot. Only PartitionDesc code
actually cares about sending anything but the parent query's
ActiveSnapshot, so the PartitionDesc and PartitionDirectory interface
has been changed to add the same omit_detached_snapshot parameter.
find_inheritance_children(), the other caller used in many sites that
look at a table's partitions, defaults to using ActiveSnapshot, which
does not seem problematic. Furthermore, only RI_FKey_check() needs to
pass anything other than ActiveSnapshot, so other users of
PartitionDesc, like user queries, still default to using the
ActiveSnapshot, which doesn't have any known problems either.
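
For reference, the fallback that keeps those existing callers on the
ActiveSnapshot is just the following bit in PartitionDirectoryLookup()
(quoted from the attached 0004):

    if (pdir->omit_detached &&
        omit_detached_snapshot == NULL && ActiveSnapshotSet())
        omit_detached_snapshot = GetActiveSnapshot();
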
0001 and 0002 are mostly unchanged in this version, except I took out
the visibility bug-fix from 0002 into 0004 described above, which
looks better using the interface added by 0003 anyway. I need to
address the main concern that it's still hard to be sure that the
patch in its current form doesn't break any user-level semantics of
these RI check triggers and other concerns about the implementation
that Robert expressed in [1].

Oops, I apparently posted the wrong 0004, containing a bug that
crashes `make check`. Fixed version attached.
Here's another version that hopefully fixes the crash reported by
Cirrus CI [1], which is not reliably reproducible.
And cfbot #1, which failed a bit after the above one, is not happy
with my failing to include utils/snapshot.h in the partdesc.h to which I
added:
@@ -65,9 +66,11 @@ typedef struct PartitionDescData
 extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+                                                 Snapshot omit_detached_snapshot);
 extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
-extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
+extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation, Snapshot);
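
The fix is the matching one-line include that pg_inherits.h already
gained earlier in the series:

    #include "utils/snapshot.h"
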
So, here's a final revision for today. Sorry for the noise.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v7-0004-Teach-ri_LookupKeyInPkRel-to-pass-omit_detached_s.patch
From 86c1179fa07fb919aa6008022d51d3edd59a073f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 28 Sep 2022 16:37:55 +0900
Subject: [PATCH v7 4/4] Teach ri_LookupKeyInPkRel() to pass
omit_detached_snapshot
Now that the RI triggers that need to look up PK rows in a
partitioned table can manipulate partitions directly through
ExecGetLeafPartitionForKey(), the snapshot used to decide whether to
omit or include detach-pending partitions can now be passed
explicitly, rather than being conveyed through ActiveSnapshot.
For detach-pending partitions to be correctly omitted from or
included in the consideration of a PK row lookup, the PartitionDesc
machinery needs to see the latest snapshot. Pushing the latest
snapshot as the ActiveSnapshot, as is done presently, means that even
the scans that should NOT be using the latest snapshot end up using
it to time-qualify table/partition rows. That led to incorrect
results of PK lookups over partitioned tables running under
REPEATABLE READ isolation; 00cb86e75d added a test that demonstrates
this bug.
To fix, do not force-push the latest snapshot in the cases of PK
lookup over partitioned tables (as was being done by passing
detectNewRows=true to ri_PerformCheck()), but rather make
ri_LookupKeyInPkRel() pass the latest snapshot directly to
PartitionDirectoryLookup() through its new omit_detached_snapshot
parameter.
The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
---
src/backend/executor/execPartition.c | 12 +++++++++++-
src/backend/partitioning/partdesc.c | 6 ++----
src/backend/utils/adt/ri_triggers.c | 16 ++++++----------
src/include/executor/execPartition.h | 1 +
src/test/isolation/expected/fk-snapshot.out | 4 ++--
src/test/isolation/specs/fk-snapshot.spec | 5 +----
6 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index c90f07c433..65cd365a8b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1607,6 +1607,14 @@ get_partition_for_tuple(PartitionKey key,
*
* Any intermediate parent tables encountered on the way to finding the leaf
* partition are locked using 'lockmode' when opening.
+ *
+ * In 'omit_detached_snapshot', a caller can specify the snapshot to pass to
+ * PartitionDirectoryLookup(), which in turn passes it down to the code that
+ * scans the pg_inherits catalog when building the partition descriptor from
+ * scratch. Any detach-pending partition is omitted from this function's
+ * consideration if the DETACH operation appears committed as of *this*
+ * snapshot.
+ *
*
* Returns NULL if no leaf partition is found for the key.
*
@@ -1624,6 +1632,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid)
{
Relation rel = root_rel;
@@ -1709,7 +1718,8 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
+ partdesc = PartitionDirectoryLookup(partdir, rel,
+ omit_detached_snapshot);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 863b04c17d..4bfa4076bd 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -444,8 +444,6 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* result of that visibility check, such a partition is either included in
* the returned PartitionDesc, considering it not yet detached, or omitted
* from it, considering it detached.
- * XXX - currently unused, because we don't have any callers of this that
- * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
@@ -464,8 +462,8 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- Assert(omit_detached_snapshot == NULL);
- if (pdir->omit_detached && ActiveSnapshotSet())
+ if (pdir->omit_detached &&
+ omit_detached_snapshot == NULL && ActiveSnapshotSet())
omit_detached_snapshot = GetActiveSnapshot();
pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
omit_detached_snapshot);
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 9c52e765fe..ad08f0e378 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -434,17 +434,11 @@ RI_FKey_check(TriggerData *trigdata)
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -2679,8 +2673,9 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
Oid leaf_idxoid;
/*
- * Note that this relies on the latest snapshot having been pushed by
- * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * Pass the latest snapshot for omit_detached_snapshot so that any
+ * detach-pending partition is correctly omitted from or included in
+ * this lookup's consideration. The PartitionDesc machinery
 * that runs as part of this will need to use the snapshot to determine
 * whether to omit or include any detach-pending partition based on
 * whether the pg_inherits row that marks it as detach-pending is
@@ -2690,6 +2685,7 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
riinfo->pk_attnums,
pk_vals, pk_nulls,
idxoid, RowShareLock,
+ GetLatestSnapshot(),
&leaf_idxoid);
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index cbe1d996e6..18c6b676f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -36,6 +36,7 @@ extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
const AttrNumber *key_attnums,
Datum *key_vals, char *key_nulls,
Oid root_idxoid, int lockmode,
+ Snapshot omit_detached_snapshot,
Oid *leaf_idxoid);
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
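
To recap the snapshot flow that 0004 sets up, with the surrounding
code elided (fragments quoted from the patch above):

    /* ri_triggers.c, ri_LookupKeyInPkRel(): */
    leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
                                             riinfo->pk_attnums,
                                             pk_vals, pk_nulls,
                                             idxoid, RowShareLock,
                                             GetLatestSnapshot(),
                                             &leaf_idxoid);

    /* execPartition.c, ExecGetLeafPartitionForKey(): */
    partdesc = PartitionDirectoryLookup(partdir, rel,
                                        omit_detached_snapshot);

    /* partdesc.c, PartitionDirectoryLookup(): */
    pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
                                          omit_detached_snapshot);
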
v7-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch
From d87f962277652cbbc7401345003ac2486366ebe0 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v7 2/4] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. In RI_FKey_check() and ri_Check_Pk_Match(), it
replaces ri_SqlStringPlanCreate(), which created the plan for their
respective checks, with ri_LookupKeyInPkRelPlanCreate(), which installs
ri_LookupKeyInPkRel() as the plan implementing those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
---
src/backend/executor/execPartition.c | 167 +++++++++-
src/backend/executor/nodeLockRows.c | 160 +++++-----
src/backend/utils/adt/ri_triggers.c | 448 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
5 files changed, 611 insertions(+), 179 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..764f2b9f8a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1379,12 +1382,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1591,6 +1594,158 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified primary key tuple, which contains a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index in the leaf partition thus found that is recorded
+ * descending from 'root_idxoid' and returns it in '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must
+ * map any non-root table's partition key attribute numbers to the
+ * root table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf; otherwise, look for the key's
+ * partition within it in the next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in the
+ * slot for EvalPlanQual testing, if asked for by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfebd9c4f2..9c52e765fe 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,49 +428,9 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -533,48 +515,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +703,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +803,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +920,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1147,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1956,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partition that the
+ * query fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1968,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2234,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2274,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2521,277 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by a supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the constraint's
+ * index given in plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided key values come from the tuple either being inserted into
+ * the referencing relation (fk_rel) or being deleted from the referenced
+ * relation (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = pstrdup("ri_LookupKeyInPkRel");
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the new user has permissions to look into the
+ * schema of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ /*
+ * Note that this relies on the latest snapshot having been pushed by
+ * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * that runs as part of this will need to use the snapshot to determine
+ * whether to omit or include any detach-pending partition based on
+ * whether the pg_inherits row that marks it as detach-pending is
+ * visible to it or not, respectively.
+ */
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for a null value. Nulls should not occur here, because
+ * callers currently take care of the cases in which they do.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2806,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2873,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2898,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3510,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3569,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the values that will be passed to the
+ * operator are already of the expected operand type(s). The operator
+ * can be cross-type (such as when called by ri_LookupKeyInPkRel()), in
+ * which case we only need the cast if the right operand value doesn't
+ * match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..2f415b80ce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.35.3
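
As a recap of how a hard-coded check plugs into the RI_Plan machinery
that 0001 (below) introduces: the plan-create callback just wires up
the implementation callbacks, and callers pick the implementation when
preparing the plan (both fragments quoted from 0002 above):

    static void
    ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
                                  const char *querystr, int nargs,
                                  Oid *paramtypes)
    {
        Assert(querystr == NULL);   /* nothing to parse and plan */
        plan->plan_exec_func = ri_LookupKeyInPkRel;
        plan->plan_exec_arg = NULL;
        plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
        plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
    }

    qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
                         NULL, 0 /* nargs */, NULL /* argtypes */,
                         &qkey, fk_rel, pk_rel);
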
v7-0001-Avoid-using-SPI-in-RI-trigger-functions.patch
From 62d53b827d10de3cfea43187c0dd645dc73bad1d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v7 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
caller-specified (the caller of ri_PlanCheck(), that is) callback
function. For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for the given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for
future commits to replace the current SQL-string-based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use
the SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..1d5d7d0383 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, and free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by an RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each query
+ * resulting from the provided query's rewrite, which is saved into
+ * plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource stored in plan->plan_exec_arg, using the given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the last executed statement's value. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller-specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is later reparented to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_Plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Look up a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
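For reviewers skimming the diff, the callback contract the patch above introduces can be summarized with a minimal sketch. The ri_HardcodedPlan* names below are hypothetical, shown only to illustrate how a future non-SQL-string implementation would plug into RI_Plan; the patch itself provides only the ri_SqlStringPlan* callbacks:

/* Hypothetical callbacks, assuming the RI_Plan interface from the patch. */
static int
ri_HardcodedPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
			Datum *param_vals, char *param_isnulls,
			Snapshot test_snapshot,
			Snapshot crosscheck_snapshot,
			int limit, CmdType *last_stmt_cmdtype)
{
	/* A real implementation would scan the PK index directly here. */
	*last_stmt_cmdtype = CMD_SELECT;
	return 0;		/* number of matching rows found */
}

static bool
ri_HardcodedPlanIsValid(RI_Plan *plan)
{
	return true;		/* nothing cached here that can go stale */
}

static void
ri_HardcodedPlanFree(RI_Plan *plan)
{
	/* nothing to release beyond plan->plancxt itself */
}

static void
ri_HardcodedPlanCreate(RI_Plan *plan,
		       const char *querystr, int nargs, Oid *paramtypes)
{
	/* No parsing or planning needed; just wire up the callbacks. */
	plan->plan_exec_func = ri_HardcodedPlanExecute;
	plan->plan_exec_arg = NULL;
	plan->plan_is_valid_func = ri_HardcodedPlanIsValid;
	plan->plan_free_func = ri_HardcodedPlanFree;
}

A trigger function would then pass ri_HardcodedPlanCreate instead of ri_SqlStringPlanCreate to ri_PlanCheck(); the query-key hashing, caching, and invalidation done by ri_HashPreparedPlan() and ri_FetchPreparedPlan() stay exactly the same.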
v7-0003-Make-omit_detached-logic-independent-of-ActiveSna.patch
From 1ed5317be7450f42a7211e0593fcf0c7557c5f3f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 15 Sep 2022 16:45:44 +0900
Subject: [PATCH v7 3/4] Make omit_detached logic independent of ActiveSnapshot
In find_inheritance_children_extended() and elsewhere, we use
ActiveSnapshot to determine if a detach-pending partition should
be considered detached, based on checking whether the xmin of
such a partition's pg_inherits row appears committed to that
snapshot.
This logic really came in to make the RI queries over partitioned
PK tables running under REPEATABLE READ isolation level work
correctly by appropriately omitting or including the detach-pending
partition from the plan, based on the visibility of the pg_inherits
row of that partition to the latest snapshot. To that end,
RI_FKey_check() was made to force-push the latest snapshot to get
that desired behavior. However, pushing a snapshot this way makes
the results of other scans that use ActiveSnapshot violate the
isolation of the parent transaction; 00cb86e75d added a test that
demonstrates this bug.
So, this commit changes the PartitionDesc interface to allow the
desired snapshot to be passed explicitly as a parameter, rather than
having to scribble on ActiveSnapshot to pass it. A later commit will
change ExecGetLeafPartitionForKey() used by RI PK row lookups to use
this new interface.
Note that the default behavior in the absence of any explicitly
specified snapshot is still to use the ActiveSnapshot, so there is
no behavior change from this commit for non-RI queries and sites that call
find_inheritance_children() for purposes other than querying a
partitioned table.
---
src/backend/catalog/pg_inherits.c | 31 +++++----
src/backend/executor/execPartition.c | 7 +-
src/backend/optimizer/util/inherit.c | 2 +-
src/backend/optimizer/util/plancat.c | 2 +-
src/backend/partitioning/partdesc.c | 100 +++++++++++++++++++--------
src/include/catalog/pg_inherits.h | 5 +-
src/include/partitioning/partdesc.h | 5 +-
7 files changed, 101 insertions(+), 51 deletions(-)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 92afbc2f25..f810e5de0d 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,14 +52,18 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*
- * Partitions marked as being detached are omitted; see
+ * A partition marked as being detached is omitted from the result if the
+ * pg_inherits row showing the partition as being detached is visible to
+ * the ActiveSnapshot (which requires one to have been pushed); see
* find_inheritance_children_extended for details.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true, lockmode,
- NULL, NULL);
+ return find_inheritance_children_extended(parentrelId, true,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL,
+ lockmode, NULL, NULL);
}
/*
@@ -71,16 +75,17 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and there is an active snapshot (not the same as
- * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
- * marked "detach pending" is visible to that snapshot, then that partition is
- * omitted from the output list. This makes partitions invisible depending on
- * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot. In addition, *detached_xmin (if not null)
- * is set to the xmin of the row of the detached partition.
+ * If omit_detached is true and the caller passed 'omit_detached_snapshot',
+ * the partition whose pg_inherits tuple marks it as "detach pending" is
+ * omitted from the output list if the tuple is visible to that snapshot.
+ * That is, such a partition is omitted from the output list depending on
+ * whether the transaction that marked that partition as detached appears
+ * committed to omit_detached_snapshot. If omitted, *detached_xmin (if non
+ * NULL) is set to the xmin of that pg_inherits tuple.
*/
List *
find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+ Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
{
@@ -141,15 +146,13 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && ActiveSnapshotSet())
+ if (omit_detached && omit_detached_snapshot)
{
TransactionId xmin;
- Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
- snap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(xmin, snap))
+ if (!XidInMVCCSnapshot(xmin, omit_detached_snapshot))
{
if (detached_xmin)
{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 764f2b9f8a..c90f07c433 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1121,7 +1121,8 @@ ExecInitPartitionDispatchInfo(EState *estate,
rel = table_open(partoid, RowExclusiveLock);
else
rel = proute->partition_root;
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel);
+ partdesc = PartitionDirectoryLookup(estate->es_partition_directory, rel,
+ NULL);
pd = (PartitionDispatch) palloc(offsetof(PartitionDispatchData, indexes) +
partdesc->nparts * sizeof(int));
@@ -1708,7 +1709,7 @@ ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
/* Get the PartitionDesc using the partition directory machinery. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
- partdesc = PartitionDirectoryLookup(partdir, rel);
+ partdesc = PartitionDirectoryLookup(partdir, rel, NULL);
/* Find the partition for the key. */
partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
@@ -2085,7 +2086,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partrel, NULL);
/*
* Initialize the subplan_map and subpart_map.
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index cf7691a474..cc4d27ece8 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -317,7 +317,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
Assert(parentrte->inh);
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- parentrel);
+ parentrel, NULL);
/* A partitioned table should always have a partition descriptor. */
Assert(partdesc);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6d5718ee4c..9c6bc5c4a5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2221,7 +2221,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
}
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
- relation);
+ relation, NULL);
rel->part_scheme = find_partition_scheme(root, relation);
Assert(partdesc != NULL && rel->part_scheme != NULL);
rel->boundinfo = partdesc->boundinfo;
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 737f0edd89..863b04c17d 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -48,17 +48,24 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached);
+ bool omit_detached,
+ Snapshot omit_detached_snapshot);
/*
- * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ * RelationGetPartitionDescExt
+ * Get the partition descriptor of a partitioned table, building and
+ * caching one for later use if it is not already cached or if the
+ * cached one would not be suitable for a given request
*
* We keep two partdescs in relcache: rd_partdesc includes all partitions
- * (even those being concurrently marked detached), while rd_partdesc_nodetach
- * omits (some of) those. We store the pg_inherits.xmin value for the latter,
- * to determine whether it can be validly reused in each case, since that
- * depends on the active snapshot.
+ * (even the one being concurrently marked detached), while
+ * rd_partdesc_nodetached omits the detach-pending partition. If the latter one
+ * is present, rd_partdesc_nodetached_xmin would have been set to the xmin of
+ * the detach-pending partition's pg_inherits row, which is used to determine
+ * whether rd_partdesc_nodetached can be validly reused for a given request by
+ * checking if the xmin appears visible to 'omit_detached_snapshot' passed by
+ * the caller.
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
@@ -69,7 +76,8 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDesc(Relation rel, bool omit_detached)
+RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -78,36 +86,52 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*
- * If there is no active snapshot, detached partitions aren't omitted
- * either, so we can use the cached descriptor too in that case.
+ * omit_detached_snapshot being NULL means that the caller doesn't care
+ * that the returned partition descriptor may contain detached partitions,
+ * so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
(!rel->rd_partdesc->detached_exist || !omit_detached ||
- !ActiveSnapshotSet())))
+ omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
/*
- * If we're asked to omit detached partitions, we may be able to use a
- * cached descriptor too. We determine that based on the pg_inherits.xmin
- * that was saved alongside that descriptor: if the xmin that was not in
- * progress for that active snapshot is also not in progress for the
- * current active snapshot, then we can use it. Otherwise build one from
- * scratch.
+ * If we're asked to omit the detached partition, we may be able to use
+ * the other cached descriptor, which has been made to omit the detached
+ * partition. Whether that descriptor can be reused in this case is
+ * determined by cross-checking the visibility of
+ * rd_partdesc_nodetached_xmin, that is, the xmin of the detached
+ * partition's pg_inherits row: if that xmin appears committed (not
+ * in-progress) to the given omit_detached_snapshot, just as it must have
+ * to the snapshot passed when rd_partdesc_nodetached was built, then we
+ * can reuse it. Otherwise we must build one from scratch.
*/
if (omit_detached &&
rel->rd_partdesc_nodetached &&
- ActiveSnapshotSet())
+ omit_detached_snapshot)
{
- Snapshot activesnap;
-
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
- activesnap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+ if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin,
+ omit_detached_snapshot))
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached);
+ return RelationBuildPartitionDesc(rel, omit_detached,
+ omit_detached_snapshot);
+}
+
+/*
+ * RelationGetPartitionDesc
+ * Like RelationGetPartitionDescExt() but for callers that are fine with
+ * ActiveSnapshot being used as omit_detached_snapshot
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel, bool omit_detached)
+{
+ return RelationGetPartitionDescExt(rel, omit_detached,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL);
}
/*
@@ -132,7 +156,8 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached)
+RelationBuildPartitionDesc(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL;
@@ -160,7 +185,9 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached, NoLock,
+ omit_detached,
+ omit_detached_snapshot,
+ NoLock,
&detached_exist,
&detached_xmin);
@@ -322,11 +349,11 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
*
* Note that if a partition was found by the catalog's scan to have been
* detached, but the pg_inherit tuple saying so was not visible to the
- * active snapshot (find_inheritance_children_extended will not have set
- * detached_xmin in that case), we consider there to be no "omittable"
- * detached partitions.
+ * omit_detached_snapshot (find_inheritance_children_extended() will not
+ * have set detached_xmin in that case), we consider there to be no
+ * "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && ActiveSnapshotSet() &&
+ is_omit = omit_detached && detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -411,9 +438,18 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
* different views of the catalog state, but any single particular OID
* will always get the same PartitionDesc for as long as the same
* PartitionDirectory is used.
+ *
+ * Callers can specify a snapshot to cross-check the visibility of the
+ * pg_inherits row marking a given partition as being detached. Depending on the
+ * result of that visibility check, such a partition is either included in
+ * the returned PartitionDesc, considering it not yet detached, or omitted
+ * from it, considering it detached.
+ * XXX - currently unused, because we don't have any callers of this that
+ * would like to pass a snapshot that is not ActiveSnapshot.
*/
PartitionDesc
-PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
+PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
+ Snapshot omit_detached_snapshot)
{
PartitionDirectoryEntry *pde;
Oid relid = RelationGetRelid(rel);
@@ -428,7 +464,11 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
+ Assert(omit_detached_snapshot == NULL);
+ if (pdir->omit_detached && ActiveSnapshotSet())
+ omit_detached_snapshot = GetActiveSnapshot();
+ pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
+ omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 9221c2ea57..67f148f2bf 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -23,6 +23,7 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
+#include "utils/snapshot.h"
/* ----------------
* pg_inherits definition. cpp turns this into
@@ -50,7 +51,9 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, on pg_inher
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
- LOCKMODE lockmode, bool *detached_exist, TransactionId *detached_xmin);
+ Snapshot omit_detached_snapshot,
+ LOCKMODE lockmode, bool *detached_exist,
+ TransactionId *detached_xmin);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **numparents);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index 7e979433b6..efd0e4a7eb 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -14,6 +14,7 @@
#include "partitioning/partdefs.h"
#include "utils/relcache.h"
+#include "utils/snapshot.h"
/*
* Information about partitions of a partitioned table.
@@ -65,9 +66,11 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+ Snapshot omit_detached_snapshot);
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
-extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
+extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation, Snapshot);
extern void DestroyPartitionDirectory(PartitionDirectory pdir);
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
--
2.35.3
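To make the intended use of the new omit_detached_snapshot parameter concrete, here is a minimal sketch of what a call site in the RI code could look like. This is an assumption about a later patch in the series, not something this patch adds (per the XXX note, it has no such caller yet), and pk_rel here stands for an already-opened partitioned PK table:

/* Hypothetical future caller of the extended interface. */
Snapshot	snap = GetLatestSnapshot();
PartitionDesc	partdesc;

/*
 * A detach-pending partition is omitted from the result iff the
 * pg_inherits row recording the detach appears committed to 'snap',
 * independently of whatever ActiveSnapshot happens to be.
 */
partdesc = RelationGetPartitionDescExt(pk_rel, true /* omit_detached */, snap);

The design point is that the visibility decision is tied to a snapshot the caller chooses explicitly, so RI_FKey_check() would no longer have to force-push the latest snapshot onto the active snapshot stack and thereby leak it into unrelated scans.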
Hi,
On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
So, here's a final revision for today. Sorry for the noise.
This appears to fail on 32bit systems. Seems the new test is indeed
worthwhile...
https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
[19:12:24.452] Summary of Failures:
[19:12:24.452]
[19:12:24.452] 2/243 postgresql:main / main/regress FAIL 45.08s (exit status 1)
[19:12:24.452] 4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR 71.96s
[19:12:24.452] 32/243 postgresql:recovery / recovery/027_stream_regress ERROR 45.84s
Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
succeeded in capturing the test files of the 32bit build (and perhaps broke it
for 64bit builds as well?), so I can't see the regression.diffs contents.
[19:12:24.387] alter_table ... FAILED 4546 ms
...
[19:12:24.387] ========================
[19:12:24.387] 1 of 211 tests failed.
[19:12:24.387] ========================
[19:12:24.387]
...
Greetings,
Andres Freund
Hi,
On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
So, here's a final revision for today. Sorry for the noise.
This appears to fail on 32bit systems. Seems the new test is indeed
worthwhile...
https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
[19:12:24.452] Summary of Failures:
[19:12:24.452]
[19:12:24.452] 2/243 postgresql:main / main/regress FAIL 45.08s (exit status 1)
[19:12:24.452] 4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR 71.96s
[19:12:24.452] 32/243 postgresql:recovery / recovery/027_stream_regress ERROR 45.84s
Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
succeeded in capturing the test files of the 32bit build (and perhaps broke it
for 64bit builds as well?), so I can't see the regression.diffs contents.
Oh, that appears to have been an issue on the CI side (*), while uploading the
logs. The previous run did catch the error:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out 2022-09-30 15:05:49.930613669 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out 2022-09-30 15:11:21.050383258 +0000
@@ -672,6 +672,8 @@
ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
-- Check it actually works
INSERT INTO FKTABLE VALUES(42); -- should succeed
+ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
+DETAIL: Key (ftest1)=(42) is not present in table "pktable".
INSERT INTO FKTABLE VALUES(43); -- should fail
ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
DETAIL: Key (ftest1)=(43) is not present in table "pktable".
Greetings,
Andres Freund
* Error from upload stream: rpc error: code = Unknown desc =
On Sun, Oct 2, 2022 at 10:24 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
So, here's a final revision for today. Sorry for the noise.
This appears to fail on 32bit systems. Seems the new test is indeed
worthwhile...
https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
[19:12:24.452] Summary of Failures:
[19:12:24.452]
[19:12:24.452] 2/243 postgresql:main / main/regress FAIL 45.08s (exit status 1)
[19:12:24.452] 4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR 71.96s
[19:12:24.452] 32/243 postgresql:recovery / recovery/027_stream_regress ERROR 45.84s
Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems to not have
succeeded in capturing the test files of the 32bit build (and perhaps broke it
for 64bit builds as well?), so I can't see the regression.diffs contents.
Oh, that appears to have been an issue on the CI side (*), while uploading the
logs. The previous run did catch the error:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out 2022-09-30 15:05:49.930613669 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out 2022-09-30 15:11:21.050383258 +0000
@@ -672,6 +672,8 @@
ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
-- Check it actually works
INSERT INTO FKTABLE VALUES(42); -- should succeed
+ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
+DETAIL: Key (ftest1)=(42) is not present in table "pktable".
INSERT INTO FKTABLE VALUES(43); -- should fail
ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
DETAIL: Key (ftest1)=(43) is not present in table "pktable".
Thanks for the heads up. Hmm, I am not sure how to reproduce this on
my own, so I am currently left second-guessing which of the 4 patches
may be going wrong on 32-bit machines.
For now, I'll just post 0001, which I am claiming has no semantic
changes (proof pending), to rule that one out as the culprit.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v6-0001-Avoid-using-SPI-in-RI-trigger-functions.patch
From 363e7539afea9b5ef287865b5176395818e880df Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v6 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
caller-specified (caller of ri_PlanCheck() that is) callback
function. For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for the
future commits to replace the current SQL string based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..a30553ea67 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by an RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each query
+ * resulting from the rewrite of the provided query; the list is saved
+ * in plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource stored in plan->plan_exec_arg, using the given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the number of tuples processed by the last statement. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext;
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than 1 executable
+ * statements if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid, i.e., may we continue caching it?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Look up a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
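To see the shape of the result, here is a minimal sketch of how an RI
trigger function prepares and runs its check once the above applies. This
is not code from the patch: the query construction is elided, and
example_ri_restrict is a made-up name, though the helpers, constants, and
signatures it uses are the ones the patch defines.

static bool
example_ri_restrict(const RI_ConstraintInfo *riinfo,
                    Relation fk_rel, Relation pk_rel,
                    TupleTableSlot *oldslot)
{
    RI_QueryKey qkey;
    RI_Plan    *qplan;

    /* Fetch or prepare a saved plan, just as ri_restrict() does. */
    ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT);
    if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
    {
        StringInfoData querybuf;
        Oid         queryoids[RI_MAX_NUMKEYS];

        initStringInfo(&querybuf);
        /*
         * Build the "SELECT 1 FROM ONLY <fktable> x WHERE ... FOR KEY
         * SHARE OF x" string and fill queryoids with the key column
         * types (elided here).
         */

        /* The plan-create callback is the one new ingredient. */
        qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
                             querybuf.data, riinfo->nkeys, queryoids,
                             &qkey, fk_rel, pk_rel);
    }

    /*
     * No SPI_connect()/SPI_finish() bracketing is needed anymore;
     * snapshot setup now happens inside ri_PerformCheck().
     */
    return ri_PerformCheck(riinfo, &qkey, qplan,
                           fk_rel, pk_rel,
                           oldslot, NULL,
                           true,    /* must detect new rows */
                           CMD_SELECT);
}

The only structural changes from the SPI version are the plan-create
callback argument to ri_PlanCheck() and the dropped SPI bracketing.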
On Fri, Oct 7, 2022 at 6:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Sun, Oct 2, 2022 at 10:24 AM Andres Freund <andres@anarazel.de> wrote:
On 2022-10-01 18:21:15 -0700, Andres Freund wrote:
On 2022-09-29 18:18:10 +0900, Amit Langote wrote:
So, here's a final revision for today. Sorry for the noise.
This appears to fail on 32bit systems. Seems the new test is indeed
worthwhile...
https://cirrus-ci.com/task/6581521615159296?logs=test_world_32#L406
[19:12:24.452] Summary of Failures:
[19:12:24.452]
[19:12:24.452] 2/243 postgresql:main / main/regress FAIL 45.08s (exit status 1)
[19:12:24.452] 4/243 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade ERROR 71.96s
[19:12:24.452] 32/243 postgresql:recovery / recovery/027_stream_regress ERROR 45.84s
Unfortunately ccf36ea2580f66abbc37f27d8c296861ffaad9bf seems not to have
succeeded in capturing the test files of the 32bit build (and perhaps broke it
for 64bit builds as well?), so I can't see the regression.diffs contents.
Oh, that appears to have been an issue on the CI side (*), while uploading the
logs. The previous run did catch the error:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/alter_table.out	2022-09-30 15:05:49.930613669 +0000
+++ /tmp/cirrus-ci-build/build-32/testrun/main/regress/results/alter_table.out	2022-09-30 15:11:21.050383258 +0000
@@ -672,6 +672,8 @@
 ALTER TABLE FKTABLE ADD FOREIGN KEY(ftest1) references pktable;
 -- Check it actually works
 INSERT INTO FKTABLE VALUES(42); -- should succeed
+ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
+DETAIL: Key (ftest1)=(42) is not present in table "pktable".
 INSERT INTO FKTABLE VALUES(43); -- should fail
 ERROR: insert or update on table "fktable" violates foreign key constraint "fktable_ftest1_fkey"
 DETAIL: Key (ftest1)=(43) is not present in table "pktable".
Thanks for the heads up. Hmm, this I am not sure how to reproduce on
my own, so I am currently left with second-guessing what may be going
wrong on 32 bit machines with whichever of the 4 patches.
For now, I'll just post 0001, which I am claiming has no semantic
changes (proof pending), to rule out that that one's responsible.
Nope, not 0001. Here's 0001+0002.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On 2022-Oct-07, Amit Langote wrote:
Thanks for the heads up. Hmm, this I am not sure how to reproduce on
my own, so I am currently left with second-guessing what may be going
wrong on 32 bit machines with whichever of the 4 patches.
For now, I'll just post 0001, which I am claiming has no semantic
changes (proof pending), to rule out that that one's responsible.
Nope, not 0001. Here's 0001+0002.
Please note that you can set up a github repository so that cirrus-ci
tests whatever patches you like, without having to post them to
pg-hackers. See src/tools/ci/README, it takes three minutes if you
already have the account and repository.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On Fri, Oct 7, 2022 at 19:15 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Oct-07, Amit Langote wrote:
Thanks for the heads up. Hmm, this I am not sure how to reproduce on
my own, so I am currently left with second-guessing what may be going
wrong on 32 bit machines with whichever of the 4 patches.
For now, I'll just post 0001, which I am claiming has no semantic
changes (proof pending), to rule out that that one's responsible.
Nope, not 0001. Here's 0001+0002.
Please note that you can set up a github repository so that cirrus-ci
tests whatever patches you like, without having to post them to
pg-hackers. See src/tools/ci/README, it takes three minutes if you
already have the account and repository.
Ah, that’s right. Will do so, thanks for the suggestion.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Oct 7, 2022 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Oct 7, 2022 at 19:15 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Oct-07, Amit Langote wrote:
Thanks for the heads up. Hmm, this I am not sure how to reproduce on
my own, so I am currently left with second-guessing what may be going
wrong on 32 bit machines with whichever of the 4 patches.
For now, I'll just post 0001, which I am claiming has no semantic
changes (proof pending), to rule out that that one's responsible.
Nope, not 0001. Here's 0001+0002.
I had forgotten to actually attach anything with that email.
Please note that you can set up a github repository so that cirrus-ci
tests whatever patches you like, without having to post them to
pg-hackers. See src/tools/ci/README, it takes three minutes if you
already have the account and repository.
Ah, that’s right. Will do so, thanks for the suggestion.
I'm waiting to hear from GitHub Support to resolve an error I'm facing
trying to add Cirrus CI to my account.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v6-0001-Avoid-using-SPI-in-RI-trigger-functions.patch (application/x-patch)
From 363e7539afea9b5ef287865b5176395818e880df Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v6 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
callback function specified by the caller (the caller of ri_PlanCheck(),
that is). For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable such that it will be possible for
future commits to replace the current SQL string based implementation
of some RI checks with something as simple as a C function to directly
scan the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
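To make the pluggability described above concrete, here is a hypothetical
sketch of a non-SQL implementation wired into the same four callbacks.
Only the callback signatures come from the patch; the my_* names are
invented and the scan body is a stub.

/* Hypothetical execution callback; the actual index scan is stubbed out. */
static int
my_DirectCheckPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
                          Datum *param_vals, char *param_isnulls,
                          Snapshot test_snapshot,
                          Snapshot crosscheck_snapshot,
                          int limit, CmdType *last_stmt_cmdtype)
{
    bool        found = false;

    *last_stmt_cmdtype = CMD_SELECT;
    /* ... scan pk_rel's unique index here instead of running SQL ... */
    return found ? 1 : 0;
}

static bool
my_DirectCheckPlanIsValid(RI_Plan *plan)
{
    return true;        /* nothing cached that can go stale */
}

static void
my_DirectCheckPlanFree(RI_Plan *plan)
{
    /* no resources beyond plan->plancxt, which ri_FreePlan() deletes */
}

/* The plan-create callback a trigger function would hand to ri_PlanCheck(). */
static void
my_DirectCheckPlanCreate(RI_Plan *plan,
                         const char *querystr, int nargs, Oid *paramtypes)
{
    plan->plan_exec_func = my_DirectCheckPlanExecute;
    plan->plan_exec_arg = NULL;
    plan->plan_is_valid_func = my_DirectCheckPlanIsValid;
    plan->plan_free_func = my_DirectCheckPlanFree;
}

A trigger function would opt in by passing my_DirectCheckPlanCreate to
ri_PlanCheck() in place of ri_SqlStringPlanCreate; the follow-up patch
does exactly this with ri_LookupKeyInPkRelPlanCreate().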
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..a30553ea67 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details such as the callback functions
+ * to create, validate, free a plan, and also the arguments necessary for
+ * the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+ RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan still
+ * valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list of CachedPlanSources, one for each query
+ * resulting from the rewrite of the provided query; the list is saved
+ * in plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan after creating a CachedPlan for each
+ * CachedPlanSource stored in plan->plan_exec_arg, using the given
+ * parameter values.
+ *
+ * Return value is the number of tuples returned by the "last" CachedPlan.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+ ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the number of tuples processed by the last statement. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+ * Create a memory context for the plan underneath CurrentMemoryContext,
+ * which is reparented later to be underneath CacheMemoryContext;
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than 1 executable
+ * statements if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid, i.e., may we continue caching it?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Look up a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
v6-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch (application/x-patch)
From 0d8fc5f14da3fbd0234db46616cae3752dc919a5 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v6 2/4] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. It replaces ri_SqlStringPlanCreate() used in
RI_FKey_check() and ri_Check_Pk_Match() for creating the plan for their
respective checks by ri_LookupKeyInPkRelPlanCreate(), which installs
ri_LookupKeyInPkRel() as the plan to implement those checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
---
src/backend/executor/execPartition.c | 167 +++++++++-
src/backend/executor/nodeLockRows.c | 160 +++++-----
src/backend/utils/adt/ri_triggers.c | 448 +++++++++++++++++++++------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
5 files changed, 611 insertions(+), 179 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..764f2b9f8a 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1379,12 +1382,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1591,6 +1594,158 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified primary key tuple, which contains a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index in the thus-found leaf partition that is recorded as
+ * descending from 'root_idxoid' and returns it in '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Done using the partition directory. */
+ DestroyPartitionDirectory(partdir);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else descend into it in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks the tuple with the specified TID in the given lock mode following
+ * the given wait policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in the
+ * slot for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfebd9c4f2..9c52e765fe 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_LookupKeyInPkRel() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -406,49 +428,9 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -533,48 +515,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +703,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +803,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +920,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1147,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1956,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to see
+ * the RI_ConstraintInfo of the individual leaf partition that the
+ * query fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1968,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2234,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2274,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2521,277 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by a supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the unique index
+ * of the constraint described by plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The key values provided come either from the tuple being inserted into
+ * the referencing relation (fk_rel) or from the one being deleted from the
+ * referenced relation (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ AclResult aclresult;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = pstrdup("ri_LookupKeyInPkRel");
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should do this? */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+ {
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Must explicitly check that the current user has permission to look into
+ * the schema of, and SELECT from, the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ /*
+ * Note that this relies on the latest snapshot having been pushed by
+ * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * that runs as part of this will need to use the snapshot to determine
+ * whether to omit or include any detach-pending partition based on
+ * whether the pg_inherits row that marks it as detach-pending is
+ * visible to it or not, respectively.
+ */
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither does the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for a null value. It should not occur here, because callers
+ * currently take care of the cases in which nulls do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2806,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2873,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2898,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3510,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3569,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_LookupKeyInPkRel()), in which
+ * case, we only need the cast if the right operand value doesn't match
+ * the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..cbe1d996e6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..2f415b80ce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.35.3
On Thu, Sep 29, 2022 at 12:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
[ patches ]
While looking over this thread I came across this code:
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
CreatePartitionDirectory(estate->es_query_cxt, false);
But CreatePartitionDirectory is declared like this:
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
bool omit_detached);
So the comment seems to say the opposite of what the code does. The
code seems to match the explanation in the commit message for
71f4c8c6f74ba021e55d35b1128d22fb8c6e1629, so I am guessing that
perhaps s/always/never/ is needed here.
I also noticed that ExecCreatePartitionPruneState no longer exists in
the code but is still referenced in
src/test/modules/delay_execution/specs/partition-addition.spec
Regarding 0003, it seems unfortunate that
find_inheritance_children_extended() will now have 6 arguments, 4 of
which have to do with detached partition handling. That is a lot of
detached partition handling, and it's hard to reason about. I don't
see an obvious way of simplifying things very much, but I wonder if we
could at least have the new omit_detached_snapshot snapshot replace
the existing bool omit_detached flag. Something like the attached
incremental patch.
Probably we need to go further than the attached, though. I don't
think that PartitionDirectoryLookup() should be getting any new
arguments. The whole point of that function is that it's supposed to
ensure that the returned value is stable, and the comments say so. But
with these changes it isn't any more, because it depends on the
snapshot you pass. It seems fine to specify when you create the
partition directory that you want it to show a different, still-stable
view of the world, but as written, it seems to me to undermine the
idea that the return value is expected to be stable at all. Is there a
way we can avoid that?
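To spell out the worry with a hypothetical snippet (not taken from your
patch), suppose two lookups against the same directory pass different
snapshots:

    /*
     * Hypothetical illustration only: with a per-call snapshot, the
     * result is no longer a function of the directory alone.
     */
    PartitionDesc pd1 = PartitionDirectoryLookup(pdir, rel, snapshot_a);
    PartitionDesc pd2 = PartitionDirectoryLookup(pdir, rel, snapshot_b);

    /*
     * Unless the directory caches the first answer, pd1 and pd2 can
     * disagree about a detach-pending partition, which is at odds with
     * the documented promise that repeated lookups return a stable
     * descriptor.
     */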
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
fewer-arguments.txt (text/plain; charset=US-ASCII)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index f810e5de0d..eb5377e7c0 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -60,7 +60,7 @@ typedef struct SeenRelsEntry
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true,
+ return find_inheritance_children_extended(parentrelId,
ActiveSnapshotSet() ?
GetActiveSnapshot() : NULL,
lockmode, NULL, NULL);
@@ -75,7 +75,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and the caller passed 'omit_detached_snapshot',
+ * If the caller passed 'omit_detached_snapshot',
* the partition whose pg_inherits tuple marks it as "detach pending" is
* omitted from the output list if the tuple is visible to that snapshot.
* That is, such a partition is omitted from the output list depending on
@@ -84,7 +84,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* NULL) is set to the xmin of that pg_inherits tuple.
*/
List *
-find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+find_inheritance_children_extended(Oid parentrelId,
Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
@@ -146,7 +146,7 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && omit_detached_snapshot)
+ if (omit_detached_snapshot)
{
TransactionId xmin;
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 863b04c17d..23f1334dbc 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -48,7 +48,6 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached,
Snapshot omit_detached_snapshot);
@@ -76,8 +75,7 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDescExt(Relation rel, bool omit_detached,
- Snapshot omit_detached_snapshot)
+RelationGetPartitionDescExt(Relation rel, Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -91,7 +89,7 @@ RelationGetPartitionDescExt(Relation rel, bool omit_detached,
* so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
- (!rel->rd_partdesc->detached_exist || !omit_detached ||
+ (!rel->rd_partdesc->detached_exist ||
omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
@@ -106,9 +104,7 @@ RelationGetPartitionDescExt(Relation rel, bool omit_detached,
* have been passed when rd_partdesc_nodetached was built, then we can
* reuse it. Otherwise we must build one from scratch.
*/
- if (omit_detached &&
- rel->rd_partdesc_nodetached &&
- omit_detached_snapshot)
+ if (rel->rd_partdesc_nodetached && omit_detached_snapshot)
{
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
@@ -117,8 +113,7 @@ RelationGetPartitionDescExt(Relation rel, bool omit_detached,
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached,
- omit_detached_snapshot);
+ return RelationBuildPartitionDesc(rel, omit_detached_snapshot);
}
/*
@@ -129,9 +124,11 @@ RelationGetPartitionDescExt(Relation rel, bool omit_detached,
PartitionDesc
RelationGetPartitionDesc(Relation rel, bool omit_detached)
{
- return RelationGetPartitionDescExt(rel, omit_detached,
- ActiveSnapshotSet() ?
- GetActiveSnapshot() : NULL);
+ Snapshot snapshot = NULL;
+
+ if (omit_detached && ActiveSnapshotSet())
+ snapshot = GetActiveSnapshot();
+ return RelationGetPartitionDescExt(rel, snapshot);
}
/*
@@ -156,7 +153,7 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached,
+RelationBuildPartitionDesc(Relation rel,
Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
@@ -185,7 +182,6 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached,
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached,
omit_detached_snapshot,
NoLock,
&detached_exist,
@@ -353,7 +349,7 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached,
* have set detached_xmin in that case), we consider there to be no
* "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && omit_detached_snapshot &&
+ is_omit = detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -467,8 +463,7 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel,
Assert(omit_detached_snapshot == NULL);
if (pdir->omit_detached && ActiveSnapshotSet())
omit_detached_snapshot = GetActiveSnapshot();
- pde->pd = RelationGetPartitionDescExt(rel, pdir->omit_detached,
- omit_detached_snapshot);
+ pde->pd = RelationGetPartitionDescExt(rel, omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 67f148f2bf..14515d74d1 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -50,7 +50,7 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, on pg_inher
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
-extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+extern List *find_inheritance_children_extended(Oid parentrelId,
Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index f42d137fc1..f3d701d5b4 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -65,7 +65,7 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
-extern PartitionDesc RelationGetPartitionDescExt(Relation rel, bool omit_detached,
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel,
Snapshot omit_detached_snapshot);
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
On Wed, Oct 12, 2022 at 2:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 29, 2022 at 12:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
[ patches ]
While looking over this thread I came across this code:
Thanks for looking.
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
CreatePartitionDirectory(estate->es_query_cxt, false);
But CreatePartitionDirectory is declared like this:
extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
bool omit_detached);
So the comment seems to say the opposite of what the code does. The
code seems to match the explanation in the commit message for
71f4c8c6f74ba021e55d35b1128d22fb8c6e1629, so I am guessing that
perhaps s/always/never/ is needed here.
I think you are right. In commit 8aba9322511 that fixed a bug in this
area, we have this hunk:
- /* Executor must always include detached partitions */
+ /* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, true);
+ CreatePartitionDirectory(estate->es_query_cxt, false);
The same commit also renamed the include_detached parameter of
CreatePartitionDirectory() to omit_detached but the comment change
didn't quite match with that.
I will fix this and other related comments to be consistent about
using the word "omit". Will include them in the updated 0003.
I also noticed that ExecCreatePartitionPruneState no longer exists in
the code but is still referenced in
src/test/modules/delay_execution/specs/partition-addition.spec
It looks like we missed that reference in commit 297daa9d435 wherein
we renamed it to just CreatePartitionPruneState().
I have posted a patch to fix this.
Regarding 0003, it seems unfortunate that
find_inheritance_children_extended() will now have 6 arguments 4 of
which have to do with detached partition handling. That is a lot of
detached partition handling, and it's hard to reason about. I don't
see an obvious way of simplifying things very much, but I wonder if we
could at least have the new omit_detached_snapshot snapshot replace
the existing bool omit_detached flag. Something like the attached
incremental patch.
Yeah, I was wondering the same too and don't see a reason why we
couldn't do it that way.
I have merged your incremental patch into 0003.
Probably we need to go further than the attached, though. I don't
think that PartitionDirectoryLookup() should be getting any new
arguments. The whole point of that function is that it's supposed to
ensure that the returned value is stable, and the comments say so. But
with these changes it isn't any more, because it depends on the
snapshot you pass. It seems fine to specify when you create the
partition directory that you want it to show a different, still-stable
view of the world, but as written, it seems to me to undermine the
idea that the return value is expected to be stable at all. Is there a
way we can avoid that?
Ok, I think it makes sense to have CreatePartitionDirectory take in
the snapshot and store it in PartitionDirectoryData for use during
each subsequent PartitionDirectoryLookup(). So we'll be replacing the
current omit_detached flag in PartitionDirectoryData, just as we are
doing for the interface functions. Done that way in 0003.
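To make the shape of that change concrete, here is a rough sketch of the
resulting call site in ExecInitPartitionDispatchInfo() (simplified from
the 0003 patch attached below; see the patch for the real thing):

    Snapshot    omit_detached_snapshot = NULL;

    /*
     * Decide once, at directory creation time, which snapshot (if any)
     * governs the visibility of detach-pending partitions; subsequent
     * PartitionDirectoryLookup() calls then stay stable.
     */
    if (!IsolationUsesXactSnapshot())
        omit_detached_snapshot = GetActiveSnapshot();

    estate->es_partition_directory =
        CreatePartitionDirectory(estate->es_query_cxt,
                                 omit_detached_snapshot);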
Regarding 0002, which introduces ri_LookupKeyInPkRel(), I realized
that it may have been initializing the ScanKeys wrongly. It was using
ScanKeyInit(), which uses InvalidOid for sk_subtype, causing the index
AM / btree code to use the wrong comparison functions when PK and FK
column types don't match. That may have been a reason for 32-bit
machine failures pointed out by Andres upthread. I've fixed it by
using ScanKeyEntryInitialize() to pass the opfamily-specified right
argument (FK column) type OID.
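Concretely, the ScanKey setup now goes through something like the
following (a sketch, not the exact v7 hunk; variable names follow the v6
patch above):

    Oid         operator = eq_oprs[i];
    Oid         opfamily = idxrel->rd_opfamily[i];
    int         strat;
    Oid         lefttype;
    Oid         righttype;      /* FK-side type, to become sk_subtype */

    /* Fetch the strategy and the declared right argument type together. */
    get_op_opfamily_properties(operator, opfamily, false,
                               &strat, &lefttype, &righttype);

    ScanKeyEntryInitialize(&skey[i],
                           pk_nulls[i] == 'n' ? SK_ISNULL : 0,
                           i + 1,       /* index column number */
                           strat,
                           righttype,   /* instead of InvalidOid */
                           idxrel->rd_indcollation[i],
                           get_opcode(operator),
                           pk_vals[i]);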
Attached updated patches.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v7-0004-Teach-ri_LookupKeyInPkRel-to-pass-omit_detached_s.patch (application/x-patch)
From 2abdbf2d3cecf193f9f6dbb154a1bf0c36afae04 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Wed, 28 Sep 2022 16:37:55 +0900
Subject: [PATCH v7 4/4] Teach ri_LookupKeyInPkRel() to pass
omit_detached_snapshot
Now that the RI triggers that need to look up PK rows in a
partitioned table can manipulate partitions directly through
ExecGetLeafPartitionForKey(), the snapshot used to decide whether to
omit or include detach-pending partitions can now be passed explicitly,
rather than using ActiveSnapshot for that purpose.
For detach-pending partitions to be correctly omitted from or included
in the consideration of a PK row lookup, the PartitionDesc machinery
needs to see the latest snapshot. Pushing the latest snapshot as the
ActiveSnapshot, as is done presently, means that even scans that should
NOT be using the latest snapshot end up using one to time-qualify
table/partition rows. That leads to incorrect results for PK lookups
over partitioned tables running under REPEATABLE READ isolation;
00cb86e75d added a test that demonstrates this bug.
To fix, do not force-push the latest snapshot in the case of PK
lookups over partitioned tables (as was being done by passing
detectNewRows=true to ri_PerformCheck()), but rather make
ri_LookupKeyInPkRel() pass the latest snapshot directly to
CreatePartitionDirectory() through its new omit_detached_snapshot
parameter.
The buggy output in src/test/isolation/expected/fk-snapshot.out
of the relevant test case that was added by 00cb86e75d has been
changed to the correct output.
---
src/backend/utils/adt/ri_triggers.c | 18 ++++++------------
src/test/isolation/expected/fk-snapshot.out | 4 ++--
src/test/isolation/specs/fk-snapshot.spec | 5 +----
3 files changed, 9 insertions(+), 18 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d7fa2f36ce..eb00125657 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -434,17 +434,11 @@ RI_FKey_check(TriggerData *trigdata)
&qkey, fk_rel, pk_rel);
}
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
+ /* Now check that foreign key exists in PK table */
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
+ false,
CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -2715,16 +2709,16 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
PartitionDirectory partdir;
/*
- * Note that this relies on the latest snapshot having been pushed by
- * the caller to be the ActiveSnapshot. The PartitionDesc machinery
+ * Pass the latest snapshot for omit_detached_snapshot so that any
+ * detach-pending partitions are correctly omitted from or included in
+ * the consideration of this lookup. The PartitionDesc machinery
* that runs as part of this will need to use the snapshot to determine
* whether to omit or include any detach-pending partition based on
* whether the pg_inherits row that marks it as detach-pending is
* visible to it or not, respectively.
*/
- Assert(ActiveSnapshotSet());
partdir = CreatePartitionDirectory(CurrentMemoryContext,
- GetActiveSnapshot());
+ GetLatestSnapshot());
leaf_pk_rel = ExecGetLeafPartitionForKey(partdir,
pk_rel, riinfo->nkeys,
riinfo->pk_attnums,
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.35.3
v7-0003-Make-omit_detached-logic-independent-of-ActiveSna.patch (application/x-patch)
From 661be8eae5b31876e4f998179a0b32bb2dd45727 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 15 Sep 2022 16:45:44 +0900
Subject: [PATCH v7 3/4] Make omit_detached logic independent of ActiveSnapshot
In find_inheritance_children_extended() and elsewhere, we use
ActiveSnapshot to determine if a detach-pending partition should
be considered detached, by checking whether the xmin of such a
partition's pg_inherits row appears committed to that snapshot.
This logic was introduced to make RI queries over partitioned
PK tables running under REPEATABLE READ isolation level work
correctly by appropriately omitting or including the detach-pending
partition from the plan, based on the visibility of the pg_inherits
row of that partition to the latest snapshot. To that end,
RI_FKey_check() was made to force-push the latest snapshot.
However, pushing a snapshot this way makes
the results of other scans that use ActiveSnapshot violate the
isolation of the parent transaction; 00cb86e75d added a test that
demonstrates this bug.
So, this commit changes the PartitionDesc interface to allow the
desired snapshot to be passed explicitly as a parameter, rather than
having to scribble on ActiveSnapshot to pass it. A later commit will
change ExecGetLeafPartitionForKey() used by RI PK row lookups to use
this new interface.
Note that the default behavior in the absence of any explicitly
specified snapshot is still to use the ActiveSnapshot, so there is
no behavior change from this to non-RI queries and sites that call
find_inheritance_children() for purposes other than querying a
partitioned table.
---
src/backend/catalog/pg_inherits.c | 33 +++++-----
src/backend/executor/execPartition.c | 20 ++++--
src/backend/optimizer/util/plancat.c | 6 +-
src/backend/partitioning/partdesc.c | 94 +++++++++++++++++-----------
src/backend/utils/adt/ri_triggers.c | 4 +-
src/include/catalog/pg_inherits.h | 7 ++-
src/include/partitioning/partdesc.h | 6 +-
7 files changed, 109 insertions(+), 61 deletions(-)
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 92afbc2f25..ba9ffba5f5 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -52,14 +52,18 @@ typedef struct SeenRelsEntry
* then no locks are acquired, but caller must beware of race conditions
* against possible DROPs of child relations.
*
- * Partitions marked as being detached are omitted; see
+ * A partition marked as being detached is omitted from the result if the
+ * pg_inherits row showing the partition as being detached is visible to
+ * ActiveSnapshot, which is consulted only when one has been pushed; see
* find_inheritance_children_extended for details.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
{
- return find_inheritance_children_extended(parentrelId, true, lockmode,
- NULL, NULL);
+ return find_inheritance_children_extended(parentrelId,
+ ActiveSnapshotSet() ?
+ GetActiveSnapshot() : NULL,
+ lockmode, NULL, NULL);
}
/*
@@ -71,16 +75,17 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* If a partition's pg_inherits row is marked "detach pending",
* *detached_exist (if not null) is set true.
*
- * If omit_detached is true and there is an active snapshot (not the same as
- * the catalog snapshot used to scan pg_inherits!) and a pg_inherits tuple
- * marked "detach pending" is visible to that snapshot, then that partition is
- * omitted from the output list. This makes partitions invisible depending on
- * whether the transaction that marked those partitions as detached appears
- * committed to the active snapshot. In addition, *detached_xmin (if not null)
- * is set to the xmin of the row of the detached partition.
+ * If the caller passed 'omit_detached_snapshot', the partition whose
+ * pg_inherits tuple marks it as "detach pending" is omitted from the output
+ * list if the tuple is visible to that snapshot. That is, such a partition
+ * is omitted from the output list depending on whether the transaction that
+ * marked that partition as detached appears committed to
+ * omit_detached_snapshot. If omitted, *detached_xmin (if non-NULL) is set
+ * to the xmin of that pg_inherits tuple.
*/
List *
-find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
+find_inheritance_children_extended(Oid parentrelId,
+ Snapshot omit_detached_snapshot,
LOCKMODE lockmode, bool *detached_exist,
TransactionId *detached_xmin)
{
@@ -141,15 +146,13 @@ find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
if (detached_exist)
*detached_exist = true;
- if (omit_detached && ActiveSnapshotSet())
+ if (omit_detached_snapshot)
{
TransactionId xmin;
- Snapshot snap;
xmin = HeapTupleHeaderGetXmin(inheritsTuple->t_data);
- snap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(xmin, snap))
+ if (!XidInMVCCSnapshot(xmin, omit_detached_snapshot))
{
if (detached_xmin)
{
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index affed94f19..553c594510 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -34,6 +34,7 @@
#include "utils/partcache.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
/*-----------------------
@@ -1098,17 +1099,24 @@ ExecInitPartitionDispatchInfo(EState *estate,
MemoryContext oldcxt;
/*
- * For data modification, it is better that executor does not include
- * partitions being detached, except when running in snapshot-isolation
- * mode. This means that a read-committed transaction immediately gets a
+ * For data modification, it is better that executor omits the partitions
+ * being detached, except when running in snapshot-isolation mode. This
+ * means that a read-committed transaction immediately gets a
* "no partition for tuple" error when a tuple is inserted into a
* partition that's being detached concurrently, but a transaction in
* repeatable-read mode can still use such a partition.
*/
if (estate->es_partition_directory == NULL)
+ {
+ Snapshot omit_detached_snapshot = NULL;
+
+ Assert(ActiveSnapshotSet());
+ if (!IsolationUsesXactSnapshot())
+ omit_detached_snapshot = GetActiveSnapshot();
estate->es_partition_directory =
CreatePartitionDirectory(estate->es_query_cxt,
- !IsolationUsesXactSnapshot());
+ omit_detached_snapshot);
+ }
oldcxt = MemoryContextSwitchTo(proute->memcxt);
@@ -2018,10 +2026,10 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
int i;
ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
+ /* For data reading, executor never omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ CreatePartitionDirectory(estate->es_query_cxt, NULL);
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6d5718ee4c..acae926829 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -2213,11 +2213,15 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
/*
* Create the PartitionDirectory infrastructure if we didn't already.
+ * Note that the planner always omits the partitions being detached
+ * concurrently.
*/
if (root->glob->partition_directory == NULL)
{
+ Assert(ActiveSnapshotSet());
root->glob->partition_directory =
- CreatePartitionDirectory(CurrentMemoryContext, true);
+ CreatePartitionDirectory(CurrentMemoryContext,
+ GetActiveSnapshot());
}
partdesc = PartitionDirectoryLookup(root->glob->partition_directory,
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index 737f0edd89..96e76b6ec9 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -37,7 +37,7 @@ typedef struct PartitionDirectoryData
{
MemoryContext pdir_mcxt;
HTAB *pdir_hash;
- bool omit_detached;
+ Snapshot omit_detached_snapshot;
} PartitionDirectoryData;
typedef struct PartitionDirectoryEntry
@@ -48,17 +48,23 @@ typedef struct PartitionDirectoryEntry
} PartitionDirectoryEntry;
static PartitionDesc RelationBuildPartitionDesc(Relation rel,
- bool omit_detached);
+ Snapshot omit_detached_snapshot);
/*
- * RelationGetPartitionDesc -- get partition descriptor, if relation is partitioned
+ * RelationGetPartitionDescExt
+ *		Get the partition descriptor of a partitioned table, building and
+ *		caching one for later use if none is cached yet or if the cached
+ *		one is not suitable for the given request
*
* We keep two partdescs in relcache: rd_partdesc includes all partitions
- * (even those being concurrently marked detached), while rd_partdesc_nodetach
- * omits (some of) those. We store the pg_inherits.xmin value for the latter,
- * to determine whether it can be validly reused in each case, since that
- * depends on the active snapshot.
+ * (even the one being concurrently marked detached), while
+ * rd_partdesc_nodetached omits the detach-pending partition. If the latter
+ * is present, rd_partdesc_nodetached_xmin will have been set to the xmin of
+ * the detach-pending partition's pg_inherits row, which is used to determine
+ * whether rd_partdesc_nodetached can be validly reused for a given request,
+ * by checking whether that xmin appears committed to the
+ * 'omit_detached_snapshot' passed by the caller.
*
* Note: we arrange for partition descriptors to not get freed until the
* relcache entry's refcount goes to zero (see hacks in RelationClose,
@@ -69,7 +75,7 @@ static PartitionDesc RelationBuildPartitionDesc(Relation rel,
* that the data doesn't become stale.
*/
PartitionDesc
-RelationGetPartitionDesc(Relation rel, bool omit_detached)
+RelationGetPartitionDescExt(Relation rel, Snapshot omit_detached_snapshot)
{
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
@@ -78,36 +84,51 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* do so when we are asked to include all partitions including detached;
* and also when we know that there are no detached partitions.
*
- * If there is no active snapshot, detached partitions aren't omitted
- * either, so we can use the cached descriptor too in that case.
+ * omit_detached_snapshot being NULL means that the caller doesn't care
+ * that the returned partition descriptor may contain detached partitions,
+	 * so we can use the cached descriptor in that case too.
*/
if (likely(rel->rd_partdesc &&
- (!rel->rd_partdesc->detached_exist || !omit_detached ||
- !ActiveSnapshotSet())))
+ (!rel->rd_partdesc->detached_exist ||
+ omit_detached_snapshot == NULL)))
return rel->rd_partdesc;
/*
- * If we're asked to omit detached partitions, we may be able to use a
- * cached descriptor too. We determine that based on the pg_inherits.xmin
- * that was saved alongside that descriptor: if the xmin that was not in
- * progress for that active snapshot is also not in progress for the
- * current active snapshot, then we can use it. Otherwise build one from
- * scratch.
+	 * If we're asked to omit the detached partition, we may be able to use
+	 * the other cached descriptor, which was built to omit the detached
+	 * partition. Whether that descriptor can be reused is determined by
+	 * checking the visibility of rd_partdesc_nodetached_xmin, that is, the
+	 * xmin of the detached partition's pg_inherits row: that xmin appeared
+	 * committed to the snapshot that was passed when rd_partdesc_nodetached
+	 * was built, so if it also appears committed (not in-progress) to the
+	 * given omit_detached_snapshot, then we can reuse the descriptor.
+	 * Otherwise we must build one from scratch.
*/
- if (omit_detached &&
- rel->rd_partdesc_nodetached &&
- ActiveSnapshotSet())
+ if (rel->rd_partdesc_nodetached && omit_detached_snapshot)
{
- Snapshot activesnap;
-
Assert(TransactionIdIsValid(rel->rd_partdesc_nodetached_xmin));
- activesnap = GetActiveSnapshot();
- if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin, activesnap))
+ if (!XidInMVCCSnapshot(rel->rd_partdesc_nodetached_xmin,
+ omit_detached_snapshot))
return rel->rd_partdesc_nodetached;
}
- return RelationBuildPartitionDesc(rel, omit_detached);
+ return RelationBuildPartitionDesc(rel, omit_detached_snapshot);
+}
+
+/*
+ * RelationGetPartitionDesc
+ *		Like RelationGetPartitionDescExt(), but for callers that are fine
+ *		with the active snapshot, if any, being used as omit_detached_snapshot
+ */
+PartitionDesc
+RelationGetPartitionDesc(Relation rel, bool omit_detached)
+{
+ Snapshot snapshot = NULL;
+
+ if (omit_detached && ActiveSnapshotSet())
+ snapshot = GetActiveSnapshot();
+ return RelationGetPartitionDescExt(rel, snapshot);
}
/*
@@ -132,7 +153,8 @@ RelationGetPartitionDesc(Relation rel, bool omit_detached)
* for them.
*/
static PartitionDesc
-RelationBuildPartitionDesc(Relation rel, bool omit_detached)
+RelationBuildPartitionDesc(Relation rel,
+ Snapshot omit_detached_snapshot)
{
PartitionDesc partdesc;
PartitionBoundInfo boundinfo = NULL;
@@ -160,7 +182,8 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
detached_exist = false;
detached_xmin = InvalidTransactionId;
inhoids = find_inheritance_children_extended(RelationGetRelid(rel),
- omit_detached, NoLock,
+ omit_detached_snapshot,
+ NoLock,
&detached_exist,
&detached_xmin);
@@ -322,11 +345,11 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
*
* Note that if a partition was found by the catalog's scan to have been
* detached, but the pg_inherit tuple saying so was not visible to the
- * active snapshot (find_inheritance_children_extended will not have set
- * detached_xmin in that case), we consider there to be no "omittable"
- * detached partitions.
+ * detached, but the pg_inherits tuple saying so was not visible to the
+ * omit_detached_snapshot (find_inheritance_children_extended() will not
+ * have set detached_xmin in that case), we consider there to be no
+ * "omittable" detached partitions.
*/
- is_omit = omit_detached && detached_exist && ActiveSnapshotSet() &&
+ is_omit = detached_exist && omit_detached_snapshot &&
TransactionIdIsValid(detached_xmin);
/*
@@ -380,7 +403,7 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
* Create a new partition directory object.
*/
PartitionDirectory
-CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
+CreatePartitionDirectory(MemoryContext mcxt, Snapshot omit_detached_snapshot)
{
MemoryContext oldcontext = MemoryContextSwitchTo(mcxt);
PartitionDirectory pdir;
@@ -395,7 +418,7 @@ CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached)
pdir->pdir_hash = hash_create("partition directory", 256, &ctl,
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
- pdir->omit_detached = omit_detached;
+ pdir->omit_detached_snapshot = omit_detached_snapshot;
MemoryContextSwitchTo(oldcontext);
return pdir;
@@ -428,7 +451,8 @@ PartitionDirectoryLookup(PartitionDirectory pdir, Relation rel)
*/
RelationIncrementReferenceCount(rel);
pde->rel = rel;
- pde->pd = RelationGetPartitionDesc(rel, pdir->omit_detached);
+ pde->pd = RelationGetPartitionDescExt(rel,
+ pdir->omit_detached_snapshot);
Assert(pde->pd != NULL);
}
return pde->pd;
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 84d994f6cf..d7fa2f36ce 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2722,7 +2722,9 @@ ri_LookupKeyInPkRel(struct RI_Plan *plan,
* whether the pg_inherits row that marks it as detach-pending
* is visible to it or not, respectively.
*/
- partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ Assert(ActiveSnapshotSet());
+ partdir = CreatePartitionDirectory(CurrentMemoryContext,
+ GetActiveSnapshot());
leaf_pk_rel = ExecGetLeafPartitionForKey(partdir,
pk_rel, riinfo->nkeys,
riinfo->pk_attnums,
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 9221c2ea57..14515d74d1 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -23,6 +23,7 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
+#include "utils/snapshot.h"
/* ----------------
* pg_inherits definition. cpp turns this into
@@ -49,8 +50,10 @@ DECLARE_INDEX(pg_inherits_parent_index, 2187, InheritsParentIndexId, on pg_inher
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
-extern List *find_inheritance_children_extended(Oid parentrelId, bool omit_detached,
- LOCKMODE lockmode, bool *detached_exist, TransactionId *detached_xmin);
+extern List *find_inheritance_children_extended(Oid parentrelId,
+ Snapshot omit_detached_snapshot,
+ LOCKMODE lockmode, bool *detached_exist,
+ TransactionId *detached_xmin);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
List **numparents);
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index 7e979433b6..51947b276b 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -14,6 +14,7 @@
#include "partitioning/partdefs.h"
#include "utils/relcache.h"
+#include "utils/snapshot.h"
/*
* Information about partitions of a partitioned table.
@@ -65,8 +66,11 @@ typedef struct PartitionDescData
extern PartitionDesc RelationGetPartitionDesc(Relation rel, bool omit_detached);
+extern PartitionDesc RelationGetPartitionDescExt(Relation rel,
+ Snapshot omit_detached_snapshot);
-extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt, bool omit_detached);
+extern PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
+ Snapshot omit_detached_snapshot);
extern PartitionDesc PartitionDirectoryLookup(PartitionDirectory, Relation);
extern void DestroyPartitionDirectory(PartitionDirectory pdir);
--
2.35.3
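
To make the interface change above concrete, here is a minimal usage
sketch of how a caller now chooses the snapshot that decides whether
detach-pending partitions are omitted. Only CreatePartitionDirectory(),
PartitionDirectoryLookup(), and the snapmgr calls are from the patch;
the surrounding variables ('rel' is an open Relation for the
partitioned table) are illustrative:

/*
 * Omit detach-pending partitions except in snapshot-isolation mode,
 * where such partitions must remain usable; passing NULL means the
 * directory will not omit them.
 */
PartitionDirectory pdir;
PartitionDesc partdesc;
Snapshot	omit_detached_snapshot = NULL;

if (ActiveSnapshotSet() && !IsolationUsesXactSnapshot())
	omit_detached_snapshot = GetActiveSnapshot();

pdir = CreatePartitionDirectory(CurrentMemoryContext,
								omit_detached_snapshot);
partdesc = PartitionDirectoryLookup(pdir, rel);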
Attachment: v7-0001-Avoid-using-SPI-in-RI-trigger-functions.patch (application/x-patch)
From 0623d524c1e445c47d6e3c91e1f438fbd7a3548f Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 28 Jun 2022 17:15:51 +0900
Subject: [PATCH v7 1/4] Avoid using SPI in RI trigger functions
Currently, ri_PlanCheck() uses SPI_prepare() to get an "SPI plan"
containing a CachedPlanSource for the SQL query that a given RI
trigger function uses to implement an RI check. Furthermore,
ri_PerformCheck() calls SPI_execute_snapshot() on the "SPI plan"
to execute the query for a given snapshot.
This commit invents ri_PlanCreate() and ri_PlanExecute() to take
the place of SPI_prepare() and SPI_execute_snapshot(), respectively.
ri_PlanCreate() will create an "RI plan" for a given query, using a
caller-specified (caller of ri_PlanCheck() that is) callback
function. For example, the callback ri_SqlStringPlanCreate() will
produce a CachedPlanSource for the input SQL string, just as
SPI_prepare() would.
ri_PlanExecute() will execute the "RI plan" by calling a
caller-specific callback function whose pointer is saved within the
"RI Plan" data structure (struct RIPlan). For example, the callback
ri_SqlStringPlanExecute() will fetch a CachedPlan for given
CachedPlanSource found in the "RI plan" and execute its PlannedStmt
by invoking the executor, just as SPI_execute_snapshot() would.
Details such as which snapshot to use are now fully controlled by
ri_PerformCheck(), whereas the previous arrangement relied on the
SPI logic for snapshot management.
ri_PlanCreate(), ri_PlanExecute(), and the "RI plan" data structure
they manipulate are pluggable, such that future commits will be able
to replace the current SQL-string-based implementation of some RI
checks with something as simple as a C function that directly scans
the underlying table/index of the referencing or the referenced
table.
NB: RI_Initial_Check() and RI_PartitionRemove_Check() still use the
SPI_prepare()/SPI_execute_snapshot() combination, because I
haven't yet added a proper DestReceiver in ri_SqlStringPlanExecute()
to receive and process the tuples that the execution would produce,
which those RI_* functions will need.
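
To illustrate the pluggability, a hypothetical plan-create callback for
a future hard-coded check could look like the sketch below. The
ri_HardCodedPlan* names are invented for illustration only (such a
callback and its companions would be defined together); only the
RI_Plan fields and ri_PlanCheck() come from this patch:

/* Hypothetical create callback for a hard-coded (non-SQL) RI check. */
static void
ri_HardCodedPlanCreate(RI_Plan *plan,
					   const char *querystr, int nargs, Oid *paramtypes)
{
	/* Nothing to parse or plan; just install the plan's callbacks. */
	plan->plan_exec_func = ri_HardCodedPlanExecute;
	plan->plan_exec_arg = NULL;	/* no CachedPlanSource list needed */
	plan->plan_is_valid_func = ri_HardCodedPlanIsValid;
	plan->plan_free_func = ri_HardCodedPlanFree;
}

A trigger function would then prepare its plan with, say:

	qplan = ri_PlanCheck(ri_HardCodedPlanCreate, NULL /* no SQL */,
						 riinfo->nkeys, queryoids,
						 &qkey, fk_rel, pk_rel);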
---
src/backend/executor/spi.c | 2 +-
src/backend/utils/adt/ri_triggers.c | 600 +++++++++++++++++++++++-----
2 files changed, 490 insertions(+), 112 deletions(-)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..a30553ea67 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -762,7 +762,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
* end of the command.
*
* This is currently not documented in spi.sgml because it is only intended
- * for use by RI triggers.
+ * for use by some functions in ri_triggers.c.
*
* Passing snapshot == InvalidSnapshot will select the normal behavior of
* fetching a new snapshot for each query.
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 1d503e7e01..cfebd9c4f2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -9,7 +9,7 @@
* across query and transaction boundaries, in fact they live as long as
* the backend does. This works because the hashtable structures
* themselves are allocated by dynahash.c in its permanent DynaHashCxt,
- * and the SPI plans they point to are saved using SPI_keepplan().
+ * and the CachedPlanSources they point to are saved in CacheMemoryContext.
* There is not currently any provision for throwing away a no-longer-needed
* plan --- consider improving this someday.
*
@@ -40,6 +40,8 @@
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
#include "storage/bufmgr.h"
+#include "tcop/pquery.h"
+#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/datum.h"
@@ -127,10 +129,55 @@ typedef struct RI_ConstraintInfo
dlist_node valid_link; /* Link in list of valid entries */
} RI_ConstraintInfo;
+/* RI plan callback functions */
+struct RI_Plan;
+typedef void (*RI_PlanCreateFunc_type) (struct RI_Plan *plan, const char *querystr, int nargs, Oid *paramtypes);
+typedef int (*RI_PlanExecFunc_type) (struct RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *params_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+typedef bool (*RI_PlanIsValidFunc_type) (struct RI_Plan *plan);
+typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
+
+/*
+ * RI_Plan
+ *
+ * Information related to the implementation of a plan for a given RI query.
+ * ri_PlanCheck() makes and stores these in ri_query_cache. The callers of
+ * ri_PlanCheck() specify a RI_PlanCreateFunc_type function to fill in the
+ * caller-specific implementation details, such as the callbacks to
+ * execute, validate, and free the plan, and also the argument necessary
+ * for the execution of the plan.
+ */
+typedef struct RI_Plan
+{
+ /*
+ * Context under which this struct and its subsidiary data gets allocated.
+ * It is made a child of CacheMemoryContext.
+ */
+ MemoryContext plancxt;
+
+ /* Query parameter types. */
+ int nargs;
+ Oid *paramtypes;
+
+ /*
+ * Set of functions specified by a RI trigger function to implement
+ * the plan for the trigger's RI query.
+ */
+ RI_PlanExecFunc_type plan_exec_func; /* execute the plan */
+ void *plan_exec_arg; /* execution argument, such as
+ * a List of CachedPlanSource */
+	RI_PlanIsValidFunc_type plan_is_valid_func; /* check if the plan is
+										 * still valid for ri_query_cache
+ * to continue caching it */
+ RI_PlanFreeFunc_type plan_free_func; /* release plan resources */
+} RI_Plan;
+
/*
* RI_QueryKey
*
- * The key identifying a prepared SPI plan in our query hashtable
+ * The key identifying a plan in our query hashtable
*/
typedef struct RI_QueryKey
{
@@ -144,7 +191,7 @@ typedef struct RI_QueryKey
typedef struct RI_QueryHashEntry
{
RI_QueryKey key;
- SPIPlanPtr plan;
+ RI_Plan *plan;
} RI_QueryHashEntry;
/*
@@ -208,8 +255,8 @@ static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
static void ri_InitHashTables(void);
static void InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue);
-static SPIPlanPtr ri_FetchPreparedPlan(RI_QueryKey *key);
-static void ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan);
+static RI_Plan *ri_FetchPreparedPlan(RI_QueryKey *key);
+static void ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan);
static RI_CompareHashEntry *ri_HashCompareOp(Oid eq_opr, Oid typeid);
static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
@@ -218,13 +265,14 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
- RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
+ RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK);
+ bool detectNewRows, int expected_cmdtype);
static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
const RI_ConstraintInfo *riinfo, bool rel_is_pk,
Datum *vals, char *nulls);
@@ -232,6 +280,15 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static void ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static bool ri_SqlStringPlanIsValid(RI_Plan *plan);
+static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *vals, char *nulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static void ri_SqlStringPlanFree(RI_Plan *plan);
/*
@@ -247,7 +304,7 @@ RI_FKey_check(TriggerData *trigdata)
Relation pk_rel;
TupleTableSlot *newslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -344,9 +401,6 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the real check */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
@@ -392,8 +446,9 @@ RI_FKey_check(TriggerData *trigdata)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -408,10 +463,7 @@ RI_FKey_check(TriggerData *trigdata)
fk_rel, pk_rel,
NULL, newslot,
pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(pk_rel, RowShareLock);
@@ -466,16 +518,13 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
RI_QueryKey qkey;
bool result;
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for checking PK table with values coming
* from a PK row
@@ -523,8 +572,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -535,10 +585,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
fk_rel, pk_rel,
oldslot, NULL,
true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
return result;
}
@@ -632,7 +679,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
@@ -660,9 +707,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the restrict lookup (it's the same
* query for delete and update cases)
@@ -715,8 +759,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
}
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -727,10 +772,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_SELECT);
table_close(fk_rel, RowShareLock);
@@ -752,7 +794,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -770,9 +812,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded delete */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONDELETE);
@@ -820,8 +859,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -833,10 +873,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_DELETE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_DELETE);
table_close(fk_rel, RowExclusiveLock);
@@ -859,7 +896,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
TupleTableSlot *newslot;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
/* Check that this is a valid trigger call on the right time and event. */
ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -879,9 +916,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
newslot = trigdata->tg_newslot;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/* Fetch or prepare a saved plan for the cascaded update */
ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_ONUPDATE);
@@ -942,8 +976,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
}
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -954,10 +989,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
fk_rel, pk_rel,
oldslot, newslot,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1039,7 +1071,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
Relation pk_rel;
TupleTableSlot *oldslot;
RI_QueryKey qkey;
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
int32 queryno;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
@@ -1055,9 +1087,6 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
pk_rel = trigdata->tg_relation;
oldslot = trigdata->tg_trigslot;
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
/*
* Fetch or prepare a saved plan for the trigger.
*/
@@ -1174,8 +1203,9 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
queryoids[i] = pk_type;
}
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
+ qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1186,10 +1216,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
fk_rel, pk_rel,
oldslot, NULL,
true, /* must detect new rows */
- SPI_OK_UPDATE);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ CMD_UPDATE);
table_close(fk_rel, RowExclusiveLock);
@@ -1382,7 +1409,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
int save_nestlevel;
char workmembuf[32];
int spi_result;
- SPIPlanPtr qplan;
+ SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
@@ -1963,7 +1990,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
/* ----------
* ri_BuildQueryKey -
*
- * Construct a hashtable key for a prepared SPI plan of an FK constraint.
+ * Construct a hashtable key for a plan of an FK constraint.
*
* key: output argument, *key is filled in based on the other arguments
* riinfo: info derived from pg_constraint entry
@@ -1982,9 +2009,9 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* the FK constraint (i.e., not the table on which the trigger has been
* fired), and so it will be the same for all members of the inheritance
* tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * than the constraint's own OID. This avoids creating duplicate plans,
+ * saving lots of work and memory when there are many partitions with
+ * similar FK constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
@@ -2258,15 +2285,368 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+/* Query string or an equivalent name to show in the error CONTEXT. */
+typedef struct RIErrorCallbackArg
+{
+ const char *query;
+} RIErrorCallbackArg;
+
+/*
+ * _RI_error_callback
+ *
+ * Add context information when a query being processed with ri_PlanCreate()
+ * or ri_PlanExecute() fails.
+ */
+static void
+_RI_error_callback(void *arg)
+{
+ RIErrorCallbackArg *carg = (RIErrorCallbackArg *) arg;
+ const char *query = carg->query;
+ int syntaxerrposition;
+
+ Assert(query != NULL);
+
+ /*
+ * If there is a syntax error position, convert to internal syntax error;
+ * otherwise treat the query as an item of context stack
+ */
+ syntaxerrposition = geterrposition();
+ if (syntaxerrposition > 0)
+ {
+ errposition(0);
+ internalerrposition(syntaxerrposition);
+ internalerrquery(query);
+ }
+ else
+ errcontext("SQL statement \"%s\"", query);
+}
+
+/*
+ * This creates a plan for a query written in SQL.
+ *
+ * The main product is a list with one CachedPlanSource for each of the
+ * queries resulting from rewriting the provided query; the list is saved
+ * in plan->plan_exec_arg.
+ */
+static void
+ri_SqlStringPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ List *raw_parsetree_list;
+ List *plancache_list = NIL;
+ ListCell *list_item;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(querystr != NULL);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = querystr;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Parse the request string into a list of raw parse trees.
+ */
+ raw_parsetree_list = raw_parser(querystr, RAW_PARSE_DEFAULT);
+
+ /*
+ * Do parse analysis and rule rewrite for each raw parsetree, storing the
+ * results into unsaved plancache entries.
+ */
+ plancache_list = NIL;
+
+ foreach(list_item, raw_parsetree_list)
+ {
+ RawStmt *parsetree = lfirst_node(RawStmt, list_item);
+ List *stmt_list;
+ CachedPlanSource *plansource;
+
+ /*
+ * Create the CachedPlanSource before we do parse analysis, since it
+ * needs to see the unmodified raw parse tree.
+ */
+ plansource = CreateCachedPlan(parsetree, querystr,
+ CreateCommandTag(parsetree->stmt));
+
+ stmt_list = pg_analyze_and_rewrite_fixedparams(parsetree, querystr,
+ paramtypes, nargs,
+ NULL);
+
+ /* Finish filling in the CachedPlanSource */
+ CompleteCachedPlan(plansource,
+ stmt_list,
+ NULL,
+ paramtypes, nargs,
+ NULL, NULL, 0,
+ false); /* not fixed result */
+
+ SaveCachedPlan(plansource);
+ plancache_list = lappend(plancache_list, plansource);
+ }
+
+ plan->plan_exec_func = ri_SqlStringPlanExecute;
+ plan->plan_exec_arg = (void *) plancache_list;
+ plan->plan_is_valid_func = ri_SqlStringPlanIsValid;
+ plan->plan_free_func = ri_SqlStringPlanFree;
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+}
+
+/*
+ * This executes the plan by creating a CachedPlan for each CachedPlanSource
+ * stored in plan->plan_exec_arg, using the given parameter values, and by
+ * running each resulting statement through the executor.
+ *
+ * The return value is the number of tuples processed by the "last" statement.
+ */
+static int
+ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot,
+ Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+ CachedPlan *cplan;
+ ResourceOwner plan_owner;
+ int tuples_processed = 0; /* appease compiler */
+ ParamListInfo paramLI;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ Assert(list_length(plancache_list) > 0);
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = NULL; /* will be filled below */
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /*
+ * Convert the parameters into a format that the planner and the executor
+ * expect them to be in.
+ */
+ if (plan->nargs > 0)
+ {
+ paramLI = makeParamList(plan->nargs);
+
+ for (int i = 0; i < plan->nargs; i++)
+ {
+			ParamExternData *prm = &paramLI->params[i];
+
+ prm->value = param_vals[i];
+ prm->isnull = (param_isnulls && param_isnulls[i] == 'n');
+ prm->pflags = PARAM_FLAG_CONST;
+ prm->ptype = plan->paramtypes[i];
+ }
+ }
+ else
+ paramLI = NULL;
+
+ plan_owner = CurrentResourceOwner; /* XXX - why? */
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+ List *stmt_list;
+ ListCell *lc2;
+
+ ricallbackarg.query = plansource->query_string;
+
+ /*
+ * Replan if needed, and increment plan refcount. If it's a saved
+ * plan, the refcount must be backed by the plan_owner.
+ */
+ cplan = GetCachedPlan(plansource, paramLI, plan_owner, NULL);
+
+ stmt_list = cplan->stmt_list;
+
+ foreach(lc2, stmt_list)
+ {
+ PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ DestReceiver *dest;
+ QueryDesc *qdesc;
+ int eflags;
+
+ *last_stmt_cmdtype = stmt->commandType;
+
+ /*
+ * Advance the command counter before each command and update the
+ * snapshot.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ dest = CreateDestReceiver(DestNone);
+ qdesc = CreateQueryDesc(stmt, plansource->query_string,
+ test_snapshot, crosscheck_snapshot,
+ dest, paramLI, NULL, 0);
+
+ /* Select execution options */
+ eflags = EXEC_FLAG_SKIP_TRIGGERS;
+ ExecutorStart(qdesc, eflags);
+ ExecutorRun(qdesc, ForwardScanDirection, limit, true);
+
+ /* We return the last executed statement's value. */
+ tuples_processed = qdesc->estate->es_processed;
+
+ ExecutorFinish(qdesc);
+ ExecutorEnd(qdesc);
+ FreeQueryDesc(qdesc);
+ }
+
+ /* Done with this plan, so release refcount */
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ cplan = NULL;
+ }
+
+ Assert(cplan == NULL);
+
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+/*
+ * Have any of the CachedPlanSources been invalidated since being created?
+ */
+static bool
+ri_SqlStringPlanIsValid(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ if (!CachedPlanIsValid(plansource))
+ return false;
+ }
+ return true;
+}
+
+/* Release CachedPlanSources and associated CachedPlans, if any. */
+static void
+ri_SqlStringPlanFree(RI_Plan *plan)
+{
+ List *plancache_list = (List *) plan->plan_exec_arg;
+ ListCell *lc;
+
+ foreach(lc, plancache_list)
+ {
+ CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc);
+
+ DropCachedPlan(plansource);
+ }
+}
+
+/*
+ * Create an RI_Plan for a given RI check query and initialize the
+ * plan callbacks and execution argument using the caller specified
+ * function.
+ */
+static RI_Plan *
+ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ RI_Plan *plan;
+ MemoryContext plancxt,
+ oldcxt;
+
+ /*
+	 * Create a memory context for the plan underneath CurrentMemoryContext;
+	 * it is reparented later to be underneath CacheMemoryContext.
+ */
+ plancxt = AllocSetContextCreate(CurrentMemoryContext,
+ "RI Plan",
+ ALLOCSET_SMALL_SIZES);
+ oldcxt = MemoryContextSwitchTo(plancxt);
+ plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->plancxt = plancxt;
+ plan->nargs = nargs;
+ if (plan->nargs > 0)
+ {
+ plan->paramtypes = (Oid *) palloc(plan->nargs * sizeof(Oid));
+ memcpy(plan->paramtypes, paramtypes, plan->nargs * sizeof(Oid));
+ }
+
+ plan_create_func(plan, querystr, nargs, paramtypes);
+
+ MemoryContextSetParent(plan->plancxt, CacheMemoryContext);
+ MemoryContextSwitchTo(oldcxt);
+
+ return plan;
+}
+
+/*
+ * Execute the plan by calling plan_exec_func().
+ *
+ * Returns the number of tuples obtained by executing the plan; the caller
+ * typically wants to check whether at least one row was returned.
+ *
+ * *last_stmt_cmdtype is set to the CmdType of the last operation performed
+ * by executing the plan, which may consist of more than one executable
+ * statement if, for example, any rules belonging to the tables mentioned in
+ * the original query added additional operations.
+ */
+static int
+ri_PlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_rel,
+ Datum *param_vals, char *param_isnulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ Assert(test_snapshot != NULL && ActiveSnapshotSet());
+ return plan->plan_exec_func(plan, fk_rel, pk_rel,
+ param_vals, param_isnulls,
+ test_snapshot,
+ crosscheck_snapshot,
+ limit, last_stmt_cmdtype);
+}
+
+/*
+ * Is the plan still valid to continue caching?
+ */
+static bool
+ri_PlanIsValid(RI_Plan *plan)
+{
+ return plan->plan_is_valid_func(plan);
+}
+
+/* Release plan resources. */
+static void
+ri_FreePlan(RI_Plan *plan)
+{
+ /* First call the implementation specific release function. */
+ plan->plan_free_func(plan);
+
+ /* Now get rid of the RI_plan and subsidiary data in its plancxt */
+ MemoryContextDelete(plan->plancxt);
+}
/*
* Prepare execution plan for a query to enforce an RI restriction
*/
-static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
+static RI_Plan *
+ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
- SPIPlanPtr qplan;
+ RI_Plan *qplan;
Relation query_rel;
Oid save_userid;
int save_sec_context;
@@ -2285,18 +2665,12 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
-
/* Create the plan */
- qplan = SPI_prepare(querystr, nargs, argtypes);
-
- if (qplan == NULL)
- elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
+ qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Save the plan */
- SPI_keepplan(qplan);
ri_HashPreparedPlan(qkey, qplan);
return qplan;
@@ -2307,10 +2681,10 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
*/
static bool
ri_PerformCheck(const RI_ConstraintInfo *riinfo,
- RI_QueryKey *qkey, SPIPlanPtr qplan,
+ RI_QueryKey *qkey, RI_Plan *qplan,
Relation fk_rel, Relation pk_rel,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
- bool detectNewRows, int expect_OK)
+ bool detectNewRows, int expected_cmdtype)
{
Relation query_rel,
source_rel;
@@ -2318,11 +2692,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
- int spi_result;
+ int tuples_processed;
Oid save_userid;
int save_sec_context;
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
+ CmdType last_stmt_cmdtype;
/*
* Use the query type code to determine whether the query is run against
@@ -2373,30 +2748,36 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
* the caller passes detectNewRows == false then it's okay to do the query
* with the transaction snapshot; otherwise we use a current snapshot, and
* tell the executor to error out if it finds any rows under the current
- * snapshot that wouldn't be visible per the transaction snapshot. Note
- * that SPI_execute_snapshot will register the snapshots, so we don't need
- * to bother here.
+ * snapshot that wouldn't be visible per the transaction snapshot.
+ *
+ * Also push the chosen snapshot so that anyplace that wants to use it
+ * can get it by calling GetActiveSnapshot().
*/
if (IsolationUsesXactSnapshot() && detectNewRows)
{
- CommandCounterIncrement(); /* be sure all my own work is visible */
test_snapshot = GetLatestSnapshot();
crosscheck_snapshot = GetTransactionSnapshot();
+ /* Make sure we have a private copy of the snapshot to modify. */
+ PushCopiedSnapshot(test_snapshot);
}
else
{
- /* the default SPI behavior is okay */
- test_snapshot = InvalidSnapshot;
+ test_snapshot = GetTransactionSnapshot();
crosscheck_snapshot = InvalidSnapshot;
+ PushActiveSnapshot(test_snapshot);
}
+ /* Also advance the command counter and update the snapshot. */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
/*
* If this is a select query (e.g., for a 'no action' or 'restrict'
* trigger), we only need to see if there is a single row in the table,
* matching the key. Otherwise, limit = 0 - because we want the query to
* affect ALL the matching rows.
*/
- limit = (expect_OK == SPI_OK_SELECT) ? 1 : 0;
+ limit = (expected_cmdtype == CMD_SELECT) ? 1 : 0;
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
@@ -2405,19 +2786,16 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
SECURITY_NOFORCE_RLS);
/* Finally we can run the query. */
- spi_result = SPI_execute_snapshot(qplan,
- vals, nulls,
+ tuples_processed = ri_PlanExecute(qplan, fk_rel, pk_rel, vals, nulls,
test_snapshot, crosscheck_snapshot,
- false, false, limit);
+ limit, &last_stmt_cmdtype);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Check result */
- if (spi_result < 0)
- elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+ PopActiveSnapshot();
- if (expect_OK >= 0 && spi_result != expect_OK)
+ if (last_stmt_cmdtype != expected_cmdtype)
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
@@ -2428,15 +2806,15 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ expected_cmdtype == CMD_SELECT &&
+ (tuples_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
qkey->constr_queryno, false);
- return SPI_processed != 0;
+ return tuples_processed != 0;
}
/*
@@ -2699,14 +3077,14 @@ ri_InitHashTables(void)
/*
* ri_FetchPreparedPlan -
*
- * Lookup for a query key in our private hash table of prepared
- * and saved SPI execution plans. Return the plan if found or NULL.
+ * Look up a query key in our private hash table of saved RI plans.
+ * Return the plan if found or NULL.
*/
-static SPIPlanPtr
+static RI_Plan *
ri_FetchPreparedPlan(RI_QueryKey *key)
{
RI_QueryHashEntry *entry;
- SPIPlanPtr plan;
+ RI_Plan *plan;
/*
* On the first call initialize the hashtable
@@ -2734,7 +3112,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* locked both FK and PK rels.
*/
plan = entry->plan;
- if (plan && SPI_plan_is_valid(plan))
+ if (plan && ri_PlanIsValid(plan))
return plan;
/*
@@ -2743,7 +3121,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
*/
entry->plan = NULL;
if (plan)
- SPI_freeplan(plan);
+ ri_FreePlan(plan);
return NULL;
}
@@ -2755,7 +3133,7 @@ ri_FetchPreparedPlan(RI_QueryKey *key)
* Add another plan to our private SPI query plan hashtable.
*/
static void
-ri_HashPreparedPlan(RI_QueryKey *key, SPIPlanPtr plan)
+ri_HashPreparedPlan(RI_QueryKey *key, RI_Plan *plan)
{
RI_QueryHashEntry *entry;
bool found;
--
2.35.3
Attachment: v7-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch (application/x-patch)
From 4985eb70321e7d823f57e266c9915020a1f0ff5e Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v7 2/4] Avoid using an SQL query for some RI checks
For RI triggers that want to check if a given referenced value exists
in the referenced relation, it suffices to simply scan the foreign key
constraint's unique index, instead of issuing an SQL query to do the
same thing.
To do so, this commit builds on the RIPlan infrastructure added in the
previous commit. In RI_FKey_check() and ri_Check_Pk_Match(), it
replaces ri_SqlStringPlanCreate(), previously used to create the plan
for their respective checks, with ri_LookupKeyInPkRelPlanCreate(),
which installs ri_LookupKeyInPkRel() as the plan implementing those
checks.
ri_LookupKeyInPkRel() contains the logic to directly scan the unique
key associated with the foreign key constraint.
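
For illustration, the core of such a direct scan boils down to roughly
the following sketch. It only approximates what ri_LookupKeyInPkRel()
does per the description above: variable names such as pk_idxoid,
eq_proc, pk_vals, nkeys, and test_snapshot are placeholders, and
details like partition routing and tuple locking are omitted.

	Relation	pk_idx = index_open(pk_idxoid, RowShareLock);
	TupleTableSlot *slot = table_slot_create(pk_rel, NULL);
	ScanKeyData skeys[INDEX_MAX_KEYS];
	IndexScanDesc scan;
	bool		found;

	/* One equality scan key per referenced key column. */
	for (int i = 0; i < nkeys; i++)
		ScanKeyInit(&skeys[i], i + 1, BTEqualStrategyNumber,
					eq_proc[i], pk_vals[i]);

	scan = index_beginscan(pk_rel, pk_idx, test_snapshot, nkeys, 0);
	index_rescan(scan, skeys, nkeys, NULL, 0);

	/* The referenced key's index is unique, so one visible match suffices. */
	found = index_getnext_slot(scan, ForwardScanDirection, slot);

	index_endscan(scan);
	ExecDropSingleTupleTableSlot(slot);
	index_close(pk_idx, RowShareLock);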
---
src/backend/executor/execPartition.c | 166 ++++++++-
src/backend/executor/nodeLockRows.c | 160 +++++----
src/backend/utils/adt/ri_triggers.c | 492 +++++++++++++++++++++------
src/include/executor/execPartition.h | 7 +
src/include/executor/executor.h | 9 +
5 files changed, 655 insertions(+), 179 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..affed94f19 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1379,12 +1382,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset = -1;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/*
@@ -1591,6 +1594,157 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ *		Finds the leaf partition of a partitioned table 'root_rel' that
+ *		might contain the specified primary key tuple, which contains a
+ *		subset of the table's columns (including all partition key columns)
+ * 'key_natts' specifies the number columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Partition descriptors for tuple routing are obtained by referring to the
+ * caller-specified partition directory.
+ *
+ * Any intermediate parent tables encountered on the way to finding the leaf
+ * partition are locked using 'lockmode' when opening.
+ *
+ * Returns NULL if no leaf partition is found for the key.
+ *
+ * This also finds the index in the thus-found leaf partition that is
+ * recorded as descending from 'root_idxoid', returning it in '*leaf_idxoid'.
+ *
+ * Caller must close the returned relation, if any.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(PartitionDirectory partdir,
+ Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, bool *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+		 * Because we only have the root table's copy of pk_attnums, we
+		 * must map any non-root table's partition key attribute numbers
+		 * to the root table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = key_nulls[k];
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc, partkey_vals,
+ partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* Close any intermediate parents we opened, but keep the lock. */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index a74813c7aa..352cacd70b 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ *		Locks the tuple with the specified TID in the given lock mode,
+ *		following the given wait policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index cfebd9c4f2..84d994f6cf 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
@@ -50,6 +55,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -151,6 +157,12 @@ typedef void (*RI_PlanFreeFunc_type) (struct RI_Plan *plan);
*/
typedef struct RI_Plan
{
+ /* Constraint for this plan. */
+ const RI_ConstraintInfo *riinfo;
+
+ /* RI query type code. */
+ int constr_queryno;
+
/*
* Context under which this struct and its subsidiary data gets allocated.
* It is made a child of CacheMemoryContext.
@@ -265,7 +277,8 @@ static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
Relation trig_rel, bool rel_is_pk);
static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
static Oid get_ri_constraint_root(Oid constrOid);
-static RI_Plan *ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+static RI_Plan *ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
@@ -289,6 +302,15 @@ static int ri_SqlStringPlanExecute(RI_Plan *plan, Relation fk_rel, Relation pk_r
Snapshot crosscheck_snapshot,
int limit, CmdType *last_stmt_cmdtype);
static void ri_SqlStringPlanFree(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes);
+static int ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype);
+static bool ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan);
+static void ri_LookupKeyInPkRelPlanFree(RI_Plan *plan);
/*
@@ -384,9 +406,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented; it could be done by modifying
+ * ri_LookupKeyInPkRel() to only include the non-null
+ * columns.)
*/
break;
#endif
@@ -406,49 +428,9 @@ RI_FKey_check(TriggerData *trigdata)
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -533,48 +515,9 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
- querybuf.data, riinfo->nkeys, queryoids,
+ /* Prepare and save the plan using ri_LookupKeyInPkRelPlanCreate(). */
+ qplan = ri_PlanCheck(riinfo, ri_LookupKeyInPkRelPlanCreate,
+ NULL, 0 /* nargs */, NULL /* argtypes */,
&qkey, fk_rel, pk_rel);
}
@@ -760,7 +703,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -860,7 +803,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -977,7 +920,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys * 2, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -1204,7 +1147,7 @@ ri_set(TriggerData *trigdata, bool is_set_null, int tgkind)
}
/* Prepare and save the plan using ri_SqlStringPlanCreate(). */
- qplan = ri_PlanCheck(ri_SqlStringPlanCreate,
+ qplan = ri_PlanCheck(riinfo, ri_SqlStringPlanCreate,
querybuf.data, riinfo->nkeys, queryoids,
&qkey, fk_rel, pk_rel);
}
@@ -2013,6 +1956,11 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* saving lots of work and memory when there are many partitions with
* similar FK constraints.
*
+ * We must not share the plan for RI_PLAN_CHECK_LOOKUPPK queries either,
+ * because their execution function (ri_LookupKeyInPkRel()) expects to
+ * see the RI_ConstraintInfo of the individual leaf partition that the
+ * trigger fired on.
+ *
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
@@ -2020,7 +1968,8 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
+ constr_queryno != RI_PLAN_CHECK_LOOKUPPK)
key->constr_id = riinfo->constraint_root_id;
else
key->constr_id = riinfo->constraint_id;
@@ -2285,10 +2234,17 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
}
}
+typedef enum RI_Plantype
+{
+ RI_PLAN_SQL = 0,
+ RI_PLAN_CHECK_FUNCTION
+} RI_Plantype;
+
/* Query string or an equivalent name to show in the error CONTEXT. */
typedef struct RIErrorCallbackArg
{
const char *query;
+ RI_Plantype plantype;
} RIErrorCallbackArg;
/*
@@ -2318,7 +2274,17 @@ _RI_error_callback(void *arg)
internalerrquery(query);
}
else
- errcontext("SQL statement \"%s\"", query);
+ {
+ switch (carg->plantype)
+ {
+ case RI_PLAN_SQL:
+ errcontext("SQL statement \"%s\"", query);
+ break;
+ case RI_PLAN_CHECK_FUNCTION:
+ errcontext("RI check function \"%s\"", query);
+ break;
+ }
+ }
}
/*
@@ -2555,14 +2521,321 @@ ri_SqlStringPlanFree(RI_Plan *plan)
}
}
+/*
+ * Creates an RI_Plan to look a key up in the PK table.
+ *
+ * Not much to do besides initializing the expected callback members, because
+ * there is no query string to parse and plan.
+ */
+static void
+ri_LookupKeyInPkRelPlanCreate(RI_Plan *plan,
+ const char *querystr, int nargs, Oid *paramtypes)
+{
+ Assert(querystr == NULL);
+ plan->plan_exec_func = ri_LookupKeyInPkRel;
+ plan->plan_exec_arg = NULL;
+ plan->plan_is_valid_func = ri_LookupKeyInPkRelPlanIsValid;
+ plan->plan_free_func = ri_LookupKeyInPkRelPlanFree;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by the given foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
+
+/*
+ * ri_CheckPermissions
+ * Check that the current user has permission to access the schema of,
+ * and SELECT from, 'query_rel'
+ *
+ * Provided for non-SQL implementations of an RI_Plan.
+ */
+static void
+ri_CheckPermissions(Relation query_rel)
+{
+ AclResult aclresult;
+
+ /* USAGE on schema. */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(query_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(query_rel)));
+
+ /* SELECT on relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(query_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(query_rel));
+}
+
+/*
+ * Checks whether a tuple containing the unique key given by pk_vals and
+ * pk_nulls exists in 'pk_rel'. The key is looked up using the index of
+ * the constraint given in plan->riinfo.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation (fk_rel) or the one being deleted from the referenced relation
+ * (pk_rel).
+ */
+static int
+ri_LookupKeyInPkRel(struct RI_Plan *plan,
+ Relation fk_rel, Relation pk_rel,
+ Datum *pk_vals, char *pk_nulls,
+ Snapshot test_snapshot, Snapshot crosscheck_snapshot,
+ int limit, CmdType *last_stmt_cmdtype)
+{
+ const RI_ConstraintInfo *riinfo = plan->riinfo;
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ int tuples_processed = 0;
+ const Oid *eq_oprs;
+ Datum pk_values[INDEX_MAX_KEYS];
+ bool pk_isnulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ RIErrorCallbackArg ricallbackarg;
+ ErrorContextCallback rierrcontext;
+
+ /* We're effectively doing a CMD_SELECT below. */
+ *last_stmt_cmdtype = CMD_SELECT;
+
+ /*
+ * Setup error traceback support for ereport()
+ */
+ ricallbackarg.query = pstrdup("ri_LookupKeyInPkRel");
+ ricallbackarg.plantype = RI_PLAN_CHECK_FUNCTION;
+ rierrcontext.callback = _RI_error_callback;
+ rierrcontext.arg = &ricallbackarg;
+ rierrcontext.previous = error_context_stack;
+ error_context_stack = &rierrcontext;
+
+ /* XXX Maybe afterTriggerInvokeEvents() / AfterTriggerExecute() should do this? */
+ CHECK_FOR_INTERRUPTS();
+
+ ri_CheckPermissions(pk_rel);
+
+ /*
+ * Choose the equality operators to use when scanning the PK index below.
+ *
+ * May need to cast the foreign key value (of the FK column's type) to
+ * the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ if (plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
+ {
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ pk_isnulls[i] = false;
+ pk_values[i] = pk_vals[i];
+ }
+ else
+ {
+ Assert(false);
+ }
+ }
+ }
+ else
+ {
+ Assert(plan->constr_queryno == RI_PLAN_CHECK_LOOKUPPK);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ pk_isnulls[i] = false;
+ pk_values[i] = pk_vals[i];
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ {
+ pk_values[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ else
+ {
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ PartitionDirectory partdir;
+
+ /*
+ * Note that this relies on the latest snapshot having been pushed by
+ * the caller to be the ActiveSnapshot.  The PartitionDesc machinery
+ * that runs as part of this will need to use that snapshot to
+ * determine whether to include or omit any detach-pending partition,
+ * based on whether the pg_inherits row that marks it as
+ * detach-pending is visible to the snapshot.
+ */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ leaf_pk_rel = ExecGetLeafPartitionForKey(partdir,
+ pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_values, pk_isnulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+
+ /*
+ * XXX - Would be nice if this could be saved across calls. Problem
+ * with just putting it in RI_Plan.plan_exec_arg is that the RI_Plan
+ * is cached for the session duration, whereas the PartitionDirectory
+ * can't last past the transaction.
+ */
+ DestroyPartitionDirectory(partdir);
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking
+ * for cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /*
+ * Set up ScanKeys for the index scan. This is essentially how
+ * ExecIndexBuildScanKeys() sets them up.
+ */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid lefttype,
+ righttype;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ int strat;
+ RegProcedure regop = get_opcode(operator);
+
+ Assert(!pk_isnulls[i]);
+ get_op_opfamily_properties(operator, opfamily, false, &strat,
+ &lefttype, &righttype);
+ ScanKeyEntryInitialize(&skey[i], 0, pkattno, strat, righttype,
+ idxrel->rd_indcollation[i], regop,
+ pk_values[i]);
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, test_snapshot, num_pk, 0);
+
+ /* Install the ScanKeys. */
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ test_snapshot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL))
+ tuples_processed = 1;
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /*
+ * Pop the error context stack
+ */
+ error_context_stack = rierrcontext.previous;
+
+ return tuples_processed;
+}
+
+static bool
+ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
+{
+ /* Never store anything that can be invalidated. */
+ return true;
+}
+
+static void
+ri_LookupKeyInPkRelPlanFree(RI_Plan *plan)
+{
+ /* Nothing to free. */
+}
+
/*
* Create an RI_Plan for a given RI check query and initialize the
* plan callbacks and execution argument using the caller specified
* function.
*/
static RI_Plan *
-ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
- const char *querystr, int nargs, Oid *paramtypes)
+ri_PlanCreate(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
+ const char *querystr, int nargs, Oid *paramtypes,
+ int constr_queryno)
{
RI_Plan *plan;
MemoryContext plancxt,
@@ -2577,6 +2850,8 @@ ri_PlanCreate(RI_PlanCreateFunc_type plan_create_func,
ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(plancxt);
plan = (RI_Plan *) palloc0(sizeof(*plan));
+ plan->riinfo = riinfo;
+ plan->constr_queryno = constr_queryno;
plan->plancxt = plancxt;
plan->nargs = nargs;
if (plan->nargs > 0)
@@ -2642,7 +2917,8 @@ ri_FreePlan(RI_Plan *plan)
* Prepare execution plan for a query to enforce an RI restriction
*/
static RI_Plan *
-ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
+ri_PlanCheck(const RI_ConstraintInfo *riinfo,
+ RI_PlanCreateFunc_type plan_create_func,
const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
@@ -2666,7 +2942,8 @@ ri_PlanCheck(RI_PlanCreateFunc_type plan_create_func,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
/* Create the plan */
- qplan = ri_PlanCreate(plan_create_func, querystr, nargs, argtypes);
+ qplan = ri_PlanCreate(riinfo, plan_create_func, querystr, nargs,
+ argtypes, qkey->constr_queryno);
/* Restore UID and security context */
SetUserIdAndSecContext(save_userid, save_sec_context);
@@ -3277,7 +3554,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID the caller passes) before passing it to the
+ * equality function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -3333,8 +3613,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the values that will be passed to the
+ * operator are already of the expected operand type(s).  The
+ * operator can be cross-type (such as when called from
+ * ri_LookupKeyInPkRel()), in which case we only need a cast if
+ * the right operand's value doesn't match the type expected by
+ * the operator.
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..621cefb7ff 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,6 +31,13 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
+extern Relation ExecGetLeafPartitionForKey(PartitionDirectory partdir,
+ Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, bool *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..2f415b80ce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.35.3
On Sat, Oct 15, 2022 at 1:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
I have merged your incremental patch into 0003.
Note that if someone goes to commit 0003, they would have no idea that
I contributed to the effort. You should probably try to keep a running
list of co-authors, reviewers, or other people that need to be
acknowledged in your draft commit messages. On that note, I think that
the commit messages for 0001 and to some extent 0002 need some more
work. In particular, it seems like the commit message for 0001 is
entirely concerned with what the patch does and says nothing about why
it's a good idea. In my opinion, a good commit message needs to do
both, ideally but not always in less space than this patch takes to do
only one of those things. 0002 has the same problem to a lesser
degree, since it is perhaps not so hard to infer that the reason for
avoiding the SQL query is performance.
I am wondering if the ordering for this patch series needs to be
rethought. The commit message for 0004 reads as if it is fixing a bug
introduced by earlier patches in the series. If that is not correct,
maybe it can be made clearer. If it is correct, then that's not good,
because we don't want to commit buggy patches and then make follow-up
commits to remove the bugs. If a planned commit needs new
infrastructure to avoid being buggy, the commits adding that
infrastructure should happen first.
But I think the bigger problem for this patch set is that the
design-level feedback from
/messages/by-id/CA+TgmoaiTNj4DgQy42OT9JmTTP1NWcMV+ke0i=+a7=VgnzqGXw@mail.gmail.com
hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
is still trivial in v7, and that still seems wrong to me. And I still
don't know how we're going to avoid changing the semantics in ways
that are undesirable, or even knowing precisely what we did change. If
we don't have answers to those questions, then I suspect that this
patch set isn't going anywhere.
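To be concrete about the validity point: a callback that can never
return false means the cached lookup plan can never be invalidated,
no matter what DDL happens to the objects it depends on. I'd expect
the plan to remember something that can actually go stale. Purely as
a sketch (the idxoid field is invented; something would have to
arrange to reset it on invalidation):

static bool
ri_LookupKeyInPkRelPlanIsValid(RI_Plan *plan)
{
    /*
     * Hypothetical: plan->idxoid is recorded at plan creation time
     * and reset to InvalidOid by an invalidation callback when the
     * constraint's index is altered or dropped.
     */
    return OidIsValid(plan->idxoid);
}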
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2022-10-15 14:47:05 +0900, Amit Langote wrote:
Attached updated patches.
These started to fail to build recently:
[04:43:33.046] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage -Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/executor_execPartition.c.o -MF src/backend/postgres_lib.a.p/executor_execPartition.c.o.d -o src/backend/postgres_lib.a.p/executor_execPartition.c.o -c ../src/backend/executor/execPartition.c
[04:43:33.046] ../src/backend/executor/execPartition.c: In function ‘ExecGetLeafPartitionForKey’:
[04:43:33.046] ../src/backend/executor/execPartition.c:1679:19: error: too few arguments to function ‘build_attrmap_by_name_if_req’
[04:43:33.046] 1679 | AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
[04:43:33.046] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[04:43:33.046] In file included from ../src/include/access/tupconvert.h:17,
[04:43:33.046] from ../src/include/nodes/execnodes.h:32,
[04:43:33.046] from ../src/include/executor/execPartition.h:16,
[04:43:33.046] from ../src/backend/executor/execPartition.c:21:
[04:43:33.046] ../src/include/access/attmap.h:47:17: note: declared here
[04:43:33.046] 47 | extern AttrMap *build_attrmap_by_name_if_req(TupleDesc indesc,
[04:43:33.046] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
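Presumably the call just needs the new missing_ok argument that
build_attrmap_by_name_if_req() has grown since the patch was posted,
i.e., something like the following (untested; leaf_rel stands in for
whatever the existing second descriptor argument is, and I'm guessing
false is what's wanted, to keep erroring out on a missing attribute):

    AttrMap    *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
                                                   RelationGetDescr(leaf_rel),
                                                   false);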
Regards,
Andres
On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Oct 15, 2022 at 1:47 AM Amit Langote <amitlangote09@gmail.com> wrote:
But I think the bigger problem for this patch set is that the
design-level feedback from
/messages/by-id/CA+TgmoaiTNj4DgQy42OT9JmTTP1NWcMV+ke0i=+a7=VgnzqGXw@mail.gmail.com
hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
is still trivial in v7, and that still seems wrong to me. And I still
don't know how we're going to avoid changing the semantics in ways
that are undesirable, or even knowing precisely what we did change. If
we don't have answers to those questions, then I suspect that this
patch set isn't going anywhere.
Amit, do you plan to work on this patch for this commitfest (and
therefore this release?). And do you think it has a realistic chance
of being ready for commit this month?
It looks to me like you have some good feedback and can progress and
are unlikely to finish this patch for this release. In which case
maybe we can move it forward to the next release?
--
Gregory Stark
As Commitfest Manager
Hi Greg,
On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM)
<stark.cfm@gmail.com> wrote:
On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
But I think the bigger problem for this patch set is that the
design-level feedback from
/messages/by-id/CA+TgmoaiTNj4DgQy42OT9JmTTP1NWcMV+ke0i=+a7=VgnzqGXw@mail.gmail.com
hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
is still trivial in v7, and that still seems wrong to me. And I still
don't know how we're going to avoid changing the semantics in ways
that are undesirable, or even knowing precisely what we did change. If
we don't have answers to those questions, then I suspect that this
patch set isn't going anywhere.

Amit, do you plan to work on this patch for this commitfest (and
therefore this release?). And do you think it has a realistic chance
of being ready for commit this month?
Unfortunately, I don't think so.
It looks to me like you have some good feedback and can progress and
are unlikely to finish this patch for this release. In which case
maybe we can move it forward to the next release?
Yes, that's what I am thinking too at this point.
I agree with Robert's point that changing the implementation from an
SQL query plan to a hand-rolled C function is going to change the
semantics in some known and perhaps many unknown ways. Until I have
enumerated all those semantic changes, it's hard to judge whether the
hand-rolled implementation is correct to begin with. I had started
doing that a few months back but couldn't keep up due to some other
work.
An example I had found of a thing that would be broken by taking the
executor out of the equation, as the patch does, is the behavior of
an update under READ COMMITTED isolation, whereby a PK tuple being
checked for existence is concurrently updated and thus needs to be
rechecked to see whether it still satisfies the RI query's conditions. The
executor has the EvalPlanQual() mechanism to do that, but while the
hand-rolled implementation did refactor ExecLockRows() to allow doing
the tuple-locking without a PlanState, it gave no consideration to
handling rechecking under READ COMMITTED isolation.
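To illustrate, the lookup in ri_LookupKeyInPkRel() passes NULL for
ExecLockTableTuple()'s epq_needed argument, so it never learns that
the lock followed an update chain to a newer tuple version. At a
minimum, it would need to do something like the following (an
untested sketch; KeyStillMatches() is a made-up helper, not something
the patch provides):

    bool        epq_needed = false;

    if (index_getnext_slot(scan, ForwardScanDirection, outslot))
    {
        if (ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
                               test_snapshot, GetCurrentCommandId(false),
                               LockTupleKeyShare, LockWaitBlock,
                               &epq_needed))
        {
            /*
             * If locking traversed to a newer tuple version, that
             * version may no longer contain the key we searched for,
             * so recheck it before counting it as a match; this is
             * the recheck that EvalPlanQual() would otherwise do.
             */
            if (!epq_needed || KeyStillMatches(outslot, riinfo, pk_values))
                tuples_processed = 1;
        }
    }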
There may be other such things and I think I'd better look for them
carefully in the next cycle than in the next couple of weeks for this
release. My apologies that I didn't withdraw the patch sooner.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
On 21 Mar 2023, at 06:03, Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
But I think the bigger problem for this patch set is that the
design-level feedback from
/messages/by-id/CA+TgmoaiTNj4DgQy42OT9JmTTP1NWcMV+ke0i=+a7=VgnzqGXw@mail.gmail.com
hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
is still trivial in v7, and that still seems wrong to me. And I still
don't know how we're going to avoid changing the semantics in ways
that are undesirable, or even knowing precisely what we did change. If
we don't have answers to those questions, then I suspect that this
patch set isn't going anywhere.

Amit, do you plan to work on this patch for this commitfest (and
therefore this release?). And do you think it has a realistic chance
of being ready for commit this month?

Unfortunately, I don't think so.
This thread has stalled with the patch not building and/or applying for a
while, so I am going to mark this Returned with Feedback. Please feel free to
resubmit to a future CF when there is renewed interest/time to work on this.
--
Daniel Gustafsson
On Mon, Jul 10, 2023 at 5:27 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 21 Mar 2023, at 06:03, Amit Langote <amitlangote09@gmail.com> wrote:
On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM) <stark.cfm@gmail.com> wrote:
On Mon, 17 Oct 2022 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
But I think the bigger problem for this patch set is that the
design-level feedback from
/messages/by-id/CA+TgmoaiTNj4DgQy42OT9JmTTP1NWcMV+ke0i=+a7=VgnzqGXw@mail.gmail.com
hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
is still trivial in v7, and that still seems wrong to me. And I still
don't know how we're going to avoid changing the semantics in ways
that are undesirable, or even knowing precisely what we did change. If
we don't have answers to those questions, then I suspect that this
patch set isn't going anywhere.

Amit, do you plan to work on this patch for this commitfest (and
therefore this release?). And do you think it has a realistic chance
of being ready for commit this month?

Unfortunately, I don't think so.
This thread has stalled with the patch not building and/or applying for a
while, so I am going to mark this Returned with Feedback.
Agreed, I was about to do so myself.
I'll give this another try later in the cycle.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com